Bryan Parkoff wrote:

> Clock Bits On Disk Question
>
> I read both Beneath Apple DOS and Beneath Apple ProDOS.  Chapter 3 of
> Beneath Apple DOS says that the head arm reads clock bits before it
> begins to read data.
>
> Now, it looks like a conflict: Beneath Apple ProDOS says not to use
> clock bits.  It shows the first page of Appendix D -- The Logic State
> Sequencer.

The use of clock bits on Apple II 5.25" disks varies depending on the type of data being accessed.

The address field for each sector is encoded using a 4-and-4 method, where every eight raw bits on the disk encode four actual data bits. The other four bits are clock bits (always "1"). This is the same bit-level encoding as the FM technique.

The data field for each sector is encoded using a 6-and-2 method (or 5-and-3 for DOS 3.2.1 and earlier), which does not use clock bits. With 6-and-2 encoding, every eight raw bits on the disk encode six actual data bits, using a lookup table to convert between valid bit patterns on the disk and the corresponding data values.

The 6-and-2 name comes from the way each data byte is broken up before being written to disk: six of the bits are converted and written as an 8-bit value on the disk, and the remaining two bits are combined with the corresponding bits from two other bytes to form another six-bit value, which goes through the same conversion process and is written as another 8-bit value. The end result is that three data bytes are written as four disk bytes (a 33% increase in size).

The earlier 5-and-3 technique worked on the same principle, but only used 32 distinct disk byte values instead of 64, which means it could only encode 5 data bits per 8-bit value written to the disk. I don't recall the exact method by which the other three bits are grouped with other bytes, but the end result is that five data bytes are written as eight disk bytes (a 60% increase in size).
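The three-bytes-in, four-values-out arithmetic of 6-and-2 can be sketched in a few lines of Python. This is illustrative only: the bit ordering DOS actually uses differs in detail, and each 6-bit value would then be translated to a valid disk byte through a 64-entry lookup table.

```python
def split_6and2(b0, b1, b2):
    """Split three data bytes into four 6-bit values (a sketch; the
    real DOS bit order and translate table differ in detail)."""
    # the high six bits of each byte become one 6-bit value apiece
    hi = [b0 >> 2, b1 >> 2, b2 >> 2]
    # the three leftover 2-bit remainders pack into a fourth 6-bit value
    lo = ((b0 & 3) << 4) | ((b1 & 3) << 2) | (b2 & 3)
    return hi + [lo]

def join_6and2(v0, v1, v2, v3):
    """Inverse: rebuild the three data bytes from the four 6-bit values."""
    return ((v0 << 2) | ((v3 >> 4) & 3),
            (v1 << 2) | ((v3 >> 2) & 3),
            (v2 << 2) | (v3 & 3))
```

Applying this grouping across a whole 256-byte sector is how it ends up as 342 six-bit disk bytes (plus a checksum) in the real format.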
This is still better than the 4-and-4 method, which generates two disk bytes for every data byte (a 100% increase in size). If a single-sided, single-density 5.25" disk were written using 4-and-4, it would only be able to hold about 10 sectors per track, at 256 bytes per sector (87.5 KB per disk, assuming 35 tracks). The 5-and-3 method used by early versions of DOS allowed 13 sectors per track (113.75 KB per disk). The 6-and-2 method used by later versions of DOS (as well as ProDOS, Pascal and CP/M) allows 16 sectors per track (140 KB per disk).

The reason the address field uses 4-and-4 encoding is that address fields need to be decoded very quickly (so that the correct data field can be located for reading or writing a sector), and it is much faster for software to decode 4-and-4 data than 5-and-3 or 6-and-2. (Decoding two 4-and-4 bytes takes two instructions: rotate one of the bytes left with the carry set, then AND it with the other to produce the data byte.) Not much space is wasted, because the address fields are very small compared to the data fields.

> It says that Disk II Drivers can write data bytes without using clock
> bits.  Please explain what it means.

The general principle of writing data to a floppy disk is that the disk records "flux reversals", i.e. inversions in the magnetic field. In the raw data on the disk, a flux reversal represents a "1" bit, and the absence of a flux reversal represents a "0" bit, with fixed timing for each bit cell (4 microseconds for the single-density 5.25" drive, 2 microseconds for the double-density 3.5" drive, 1 microsecond for the high-density 3.5" drive).

The problem is that the disk can reliably reproduce consecutive flux reversals when read back, but a prolonged absence of flux reversals produces unreliable data readback. In other words, a raw disk "1" bit is 100% reliable, but there is a limit to the number of consecutive raw disk "0" bits which can be read back again.
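The 4-and-4 encoding and its two-instruction decode described above are simple enough to sketch, with Python standing in for the 6502's rotate-and-AND sequence:

```python
def encode_44(data):
    """Encode one data byte as two 4-and-4 disk bytes.

    Bits 7,5,3,1 of the data go in the first byte, bits 6,4,2,0 in
    the second; the interleaved clock bits are always 1 (the 0xAA mask).
    """
    return (data >> 1) | 0xAA, data | 0xAA

def decode_44(first, second):
    """Decode two 4-and-4 bytes.

    This mirrors the 6502 sequence: rotate the first byte left with
    the carry set, then AND with the second byte.
    """
    return ((first << 1) | 1) & second & 0xFF
```

Because every second bit of each encoded byte is a clock "1", neither disk byte can ever contain two consecutive "0" bits.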
A single "0" bit immediately after a "1" bit is OK. If the disk drive hardware is of good enough quality, it should be possible to read two consecutive "0" bits after a "1" bit (as long as they don't occur too close to other "00" pairs), but three or more "0" bits are unreliable.

FM was the earliest solution to this. It is typically used on single-density disks, e.g. the standard single-density 8 inch floppy format used by CP/M is encoded using FM. With FM, data bits are interleaved with clock bits. Clock bits are always written as "1" and data bits may be either "0" or "1". This means there is never more than a single "0" bit after a "1" bit, and the disk will read back reliably. (There are some special exceptions, e.g. a clock bit written as a "0" is used as a timing reference.) Eight data bits require sixteen bit-times on the disk, consisting of eight clock bits interleaved with the eight data bits.

With the original GCR technique (5-and-3), a raw byte on the disk is not interpreted as clock and data bits. Since the disk can reliably reproduce single "0" bits, 5-and-3 allows "0" bits to appear in any of the lower seven bits of each disk byte, as long as they are never adjacent. Using this rule, there are at least 32 unique disk byte values available, so a table lookup method can be used to encode 5 data bits in 8 raw bits on the disk.

Here is a comparison with 4-and-4 (FM). The valid 4-and-4 codes are:

  10101010  10101011  10101110  10101111
  10111010  10111011  10111110  10111111
  11101010  11101011  11101110  11101111
  11111010  11111011  11111110  11111111

Note that the leftmost bit and every second subsequent bit is a "1". These are the clock bits.
All of the above are valid 5-and-3 codes, but if we also allow "0" bits to appear in the third, fifth and seventh columns while still avoiding two consecutive "0" bits, the following codes are also valid:

  10101101  10110101  10110110  10110111
  10111101  11010101  11010110  11010111
  11011010  11011011  11011101  11011110
  11011111  11101101  11110101  11110110
  11110111  11111101

That is 18 additional codes, for 34 in all. Two of them (11010101 = D5 and 10101010 = AA) were reserved for use as unique bytes in the sector prologues, leaving 32, which is sufficient to encode 5 data bits.

The later GCR technique (6-and-2) also allows a pair of consecutive "0" bits to appear in the byte, but only once, and not immediately after the leading "1" bit (this keeps pairs of "0" bits well separated). I won't list out the values, but there are enough additional codes (around another 30) to encode 6 data bits in 8 disk bits.

All of these techniques make no change to the timing of the bits on the disk (four microseconds per bit cell).

MFM uses a different technique. It is based on FM (using clock and data bits), but the raw disk bits are written at twice the speed (2 microseconds per bit cell on a 5.25" double-density disk). The disk isn't actually able to record flux reversals that close together, so MFM writes a "0" instead of a "1" for any clock bit adjacent to a "1" data bit. The data written to the disk looks like this, assuming the preceding data bit was a "0":

  Data  Disk
  0000  10101010
  0001  10101001
  0010  10100100
  0011  10100101
  0100  10010010
  0101  10010001
  0110  10010100
  0111  10010101
  1000  01001010
  1001  01001001
  1010  01000100
  1011  01000101
  1100  01010010
  1101  01010001
  1110  01010100
  1111  01010101

Note that in some cases there are three consecutive zero bits, but this is only six microseconds on the disk, which is shorter than the two consecutive zero bits written by GCR (eight microseconds).
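The MFM rule behind the table can be expressed in a few lines of Python (a sketch of the encoding rule, not of a real controller):

```python
def mfm_encode(data_bits, prev=0):
    """MFM-encode a list of data bits into a string of raw bit cells.

    A clock cell is written as 1 only when the data bits on both sides
    of it are 0; otherwise it is suppressed to 0.  `prev` is the data
    bit that preceded this group (0 in the table above).
    """
    cells = []
    for d in data_bits:
        cells.append(1 if (prev == 0 and d == 0) else 0)  # clock cell
        cells.append(d)                                   # data cell
        prev = d
    return "".join(str(c) for c in cells)
```

For example, mfm_encode([0, 0, 1, 0]) gives "10100100", matching the 0010 row of the table.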
MFM can store twice as much data on the same disk as FM, so it is better than GCR's 6-and-2 method (but it requires more complex disk controller hardware).

FM and MFM are reasonably simple to encode and decode, and this is typically done in hardware using a dedicated floppy disk controller chip. The original chips of this type used in microcomputers were made by Intel, and were very expensive. Steve Wozniak's design of the Apple II disk controller card was a major breakthrough because it was considerably cheaper than a traditional disk controller, and it allowed greater disk capacity than FM encoding (while not being as good as MFM). GCR typically requires encoding and decoding to be done in software, so it imposes more overhead on the host machine. (It can be done in hardware using complex logic circuitry, as was done in the IIc+, for example.)

> Do MFM disks have address fields and data fields like GCR disks?

Yes, though the details are somewhat different. The sector formats used on FM and MFM encoded 5.25" and 8" disks were established by IBM. Apple's GCR sector format is based on the same principles.

> Where can I find more information on how MFM disks are encoded?

You could try looking for data sheets on FM or MFM floppy disk controller chips, or for documentation on the IBM floppy disk formats.

-- 
David Empson
dempson@actrix.gen.nz


David Empson wrote:

Again, a lovely and complete description of disk encoding--this should really find its way into the FAQ. It certainly comes up regularly. ;-)

>The problem is that the disk can reliably reproduce consecutive flux
>reversals when read back, but prolonged absence of a flux reversal
>produces unreliable data readback.

It may be useful to say why the prolonged absence of flux reversals is an issue. Magnetic coatings are applied to the substrate by a mechanical process that can produce nonuniform thickness and properties.
As a result, the amplitude of the signal read back from a disk may vary considerably with rotational position, and these variations may be relatively short-term. The drive electronics must be able to cope with these rapid amplitude variations in order to correctly recover the magnetic transitions. The method of coping is to vary the read-signal gain so as to keep the readback levels approximately constant in the face of wide variation in read signal levels.

This Automatic Gain Control (AGC) works by looking at the short-term average of the head signal amplitude. If the amplitude drops, the AGC increases the gain to compensate, with a time constant that depends on the anticipated data rates and media characteristics. In the Shugart drive which was the basis of the Apple Disk ][, the AGC time constant was chosen so that the gain would rise or fall significantly within several bit cells.

As a result, when there are no magnetic transitions for three or four bit times, or whenever the number of transitions within a half dozen bit times falls below the average, the AGC turns up the gain enough that read-signal noise can begin to look like an actual transition--a false 1. This is why both the number of consecutive 0 bits and the frequency of occurrence of multiple 0 bits must be limited to ensure reliable recovery of the transitions written to the disk. The AGC must be kept supplied with a signal so that it keeps the gain properly adjusted.

There is a design tradeoff between the quality of the anticipated disk media and the time constant of the AGC. The choices that Shugart Associates made were appropriate for the rather low uniformity of media prevalent in the mid-70s, but coating technology and composition have improved somewhat since then. It might be interesting to increase the capacitance of the AGC filter on the Disk ][ analog card, which would make it less tolerant of short-term read-level fluctuations, but more stable in dealing with longer strings of 0 bits.
The controller would still need the high bit set as a "start" bit, but the remaining seven bits might be usable without further encoding--providing a 7-and-1 encoding scheme. ;-)

-michael

Check out 8-bit Apple sound that will amaze you on my Home page:
http://members.aol.com/MJMahon/