The official Csa2 (comp.sys.apple2) Usenet newsgroup Apple II FAQs originate
from the Ground Apple II site. Ground Apple II administrator: Steve Nelson

Csa2 FAQs-on-Ground Resource file: R011SNDFMTS.htm
 
 

 AUDIO FILE FORMAT RESOURCE GUIDE (Version 1.1)

                  by Dave Huizing
 
 

1 TABLE OF CONTENTS

2 GENERAL INFORMATION
2.1 Foreword
2.2 Printed Version
2.3 Copyrights
2.4 Disclaimer
2.5 Contributors
3 TX WAVE FORMAT
4 YAMAHA TYPHOON WAVE FILE FORMAT
4.1 DWVW v1.2 compression
4.2 DWVW sample delta bit frame
5 D00
5.1 The D00 header
5.2 The Instrument data
5.3 The SpFX data
5.4 The Arrangement data
5.5 The Sequence data
6 MIDI SAMPLE DUMP STANDARD
6.1 INTRODUCTION
6.2 SPEC: SAMPLE DUMP FORMATS
6.3 SPEC: SAMPLE DUMP MESSAGES
6.4 HANDSHAKING MESSAGES:
6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)
6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)
6.7 SDS OVERVIEW
7 ROL
7.1 Structure of .ROL files
7.2 Notes
8 8SVX
8.1 FORMblock [VHDR]
8.2 FORMblock [BODY]
9 AIFF
10 AU
11 FSM
12 GF1 PATCH
13 S3I
14 UWF
15 WAVE
15.1 RiffBLOCK [data]
15.2 RiffBLOCK [fmt ]
15.3 RiffBLOCK [loop]
16 ZYXEL
17 CREATIVE LABS FILE FORMATS
17.1 Sound Blaster Instrument File Format (SBI)
17.2 Creative Music File Format (CMF)
17.3 The CMF Instrument Block
17.4 The CMF Music Block
17.5 Sound Blaster Instrument Bank File Format (IBK)
18 CREATIVE VOICE (VOC) FILE FORMAT
19 REVISION HISTORY
 

2 General information

2.1 Foreword

I started to compile this document because I thought there was a need for
it. By surfing all around the web I collected these descriptions and
brought them together in this document. I plan to keep this document
updated, so if there's any file format description that's not in this
document, or you have any comments on this document, please send me an
email message at: stallion@worldonline.nl.

Happy developing,

Dave Huizing
 
 

2.2 Printed Version

If you need a printed version send an email.
 

2.3 Copyrights

Only the title and the compilation are copyrighted by Dave Huizing. As far
as I know, all this information is free for use. See the disclaimer part
for more details. All trademarks, technical information and file
extensions belong to their respective owners.
 

2.4 Disclaimer

This document is provided on an "as is" basis. The information has been
verified as far as possible, but I cannot be held responsible for any
problems caused by use or misuse of the information. Although I think it
won't happen, I am also not responsible for any damage to any kind of
computer system during or after using parts of this documentation. Use
this document at your own risk.
 

2.5 Contributors

Dave Huizing, stallion@worldonline.nl
DJ, Producer, DTP designer, etc

muki pakesch, mpakesch@t0.or.at
Maintainer of the TX16W mailing list

Markus Jönsson, f93-maj@nada.kth.se
Author of the Awave sample converter
 

3 TX Wave Format

The file consists of a 32-byte header followed by the actual waveform (the
first 16 bytes only identify the file type). In C syntax the header
would look like this:

  struct tx_header {
      char filetype[6];     /* "LM8953" */
      char nulls[10];
      char dummy_aeg[6];    /* space for the AEG (never mind this) */
      char format;          /* 0x49 = looped, 0xC9 = non-looped */
      char sample_rate;     /* 1 = 33 kHz, 2 = 50 kHz, 3 = 16 kHz */
      char atc_length[3];   /* I'll get to this... */
      char rpt_length[3];
      char unused[2];       /* set these to null, to be on the safe side */
  };

The "atc_length" and "rpt_length" fields are quite complex.  First of all
you should know that there is no such thing as a looping point in a TX
wave. Instead a wave is split into two parts, the attack part and the
repeat part (of course the actual wave data isn't split, this is just a
logical definition).  As you might guess, the attack part is played first
and the repeat part is looped until the key is released. Each of these
parts is limited to a maximum of 128k words in length. That is the reason
why waves can't be longer than 256k words (4096 blocks).

The length of a part is stored LSB first (Intel).  And only the least
significant _bit_ of the third byte (bit 0) is used (representing the most
significant bit of the length). Are you confused yet?  Then hold your
breath. It seems that Yamaha has chosen to squeeze in the sample rate(!)
of the wave in the unused _bits_ of these last bytes.  Although they
already have a separate byte for the sample rate, this isn't enough.  I
won't go into details on this now (or you would be even more confused).
You only need to know that the possible values are:

  0x06, 0x52 = 33 kHz
  0x10, 0x00 = 50 kHz
  0xF6, 0x52 = 16 kHz

(The first value is located in byte three of "atc_length" and the second
value is located in byte three of "rpt_length".) To wrap it up, this is
the format of the two length fields on a bit level:

              [0]        [1]        [2]
  atc_length  AAAAAAAA   BBBBBBBB   DDDDDDDC
  rpt_length  EEEEEEEE   FFFFFFFF   HHHHHHHG

  A  LSB of the attack length
  B  MSB of the attack length (except for one bit)
  C  the utterly most significant _bit_ of the attack length
  D  the first value of the magic sample rate constant (0x06, 0x10 or 0xF6)
  E  LSB of the repeat length
  F  MSB of the repeat length (except for one bit)
  G  the utterly most significant _bit_ of the repeat length
  H  the second value of the magic sample rate constant (0x52, 0x00)
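The length fields can be unpacked with a few shifts and masks. The C sketch below assumes `len` points at the three bytes of `atc_length` or `rpt_length` exactly as stored in the file; the function names are made up for illustration. Writing a length back keeps the seven sample rate bits of byte [2] intact.

```c
#include <stdint.h>

/* Length of the attack or repeat part in words: bytes [0] and [1] hold the
   low 16 bits (LSB first), bit 0 of byte [2] is bit 16. */
static uint32_t tx_part_length(const uint8_t len[3])
{
    return (uint32_t)len[0]
         | ((uint32_t)len[1] << 8)
         | ((uint32_t)(len[2] & 0x01) << 16);
}

/* The other seven bits of byte [2] carry the "magic" sample rate value
   (0x06/0x52 = 33 kHz, 0x10/0x00 = 50 kHz, 0xF6/0x52 = 16 kHz). */
static uint8_t tx_rate_bits(const uint8_t len[3])
{
    return (uint8_t)(len[2] & 0xFE);
}

/* Stores a new length (max 0x1FFFF words) without touching the rate bits. */
static void tx_set_part_length(uint8_t len[3], uint32_t words)
{
    len[0] = (uint8_t)(words & 0xFF);
    len[1] = (uint8_t)((words >> 8) & 0xFF);
    len[2] = (uint8_t)((len[2] & 0xFE) | ((words >> 16) & 0x01));
}
```

Because only one bit of byte [2] belongs to the length, each part tops out at 0x1FFFF + 1 = 128k words, which matches the limit mentioned above.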
 

Now for the most important (and probably most interesting) part. The
waveform data.  As you certainly know the TX uses 12-bit sampling
resolution, and this requires some kind of encoding if we are not willing
to waste one fourth of our disk space.  Yamaha has chosen to group the
samples two by two, making three bytes of data in the file for each pair.
I'll illustrate this on a bit level (as with the lengths above):
 

  AA CD BB

  A  MSB of the first sample
  B  MSB of the second sample
  C  least significant nybble (oh, is that the correct spelling?) of the
     first sample
  D  least significant nybble of the second sample
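A minimal C sketch of unpacking the 12-bit waveform data, reading the middle byte "CD" with C in its high nybble and D in its low nybble. It returns raw 12-bit values; the helper `tx_sign12` interprets them as two's complement, which is an assumption, since the sample encoding is not stated above. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpacks n_pairs three-byte groups "AA CD BB" into 2 * n_pairs values:
   first = AA << 4 | C, second = BB << 4 | D (each 0..4095). */
static void tx_unpack12(const uint8_t *in, size_t n_pairs, uint16_t *out)
{
    for (size_t i = 0; i < n_pairs; i++) {
        uint8_t a = in[3 * i], cd = in[3 * i + 1], b = in[3 * i + 2];
        out[2 * i]     = (uint16_t)((a << 4) | (cd >> 4));
        out[2 * i + 1] = (uint16_t)((b << 4) | (cd & 0x0F));
    }
}

/* Assumption: samples are 12-bit two's complement values. */
static int16_t tx_sign12(uint16_t v)
{
    return (int16_t)((v & 0x800) ? (int)v - 4096 : (int)v);
}
```

Packing three bytes per pair is what saves the quarter of the disk space: two 12-bit samples take 24 bits instead of 32.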
 

4 Yamaha Typhoon wave file format

This specification describes the compression algorithm for Typhoon format
waves. It does not cover the file format, which is AIFF-C. The
documentation for AIFF-C is available from the site ftp.sgi.com as the
file /sgi/aiff-c.9.26.91.ps.Z (compressed PostScript file).

4.1 DWVW v1.2 compression

DWVW was invented in 1991 by Magnus Lidstrom and is copyright 1993 by NuEdge
Development. You have the right to use the algorithm freely as long as you
make no false claims on its origin. DWVW is a lossless (or bit faithful)
compression method for digital audio data. Lossless means that the exact
original data will be preserved when compressing and decompressing.

The compression utilizes the fact that the delta between sample points
is generally smaller than the full dynamic width. Each sample point is
subtracted from the previous one and the difference is entropy-encoded in
a special format. Therefore the compression works best on low-frequency
sounds with little noise, where the difference between consecutive samples
is small.

DWVW can be applied to samples of any bit resolution and with any number
of channels. Unlike the AIFF standard, DWVW does not store sample bits
"left justified"; instead, the necessary translation should be done when
decompressing. Also, while AIFF interleaves multichannel sounds, DWVW
doesn't, as this complicates compression and decompression.

The channels follow one another with only a slight break in the bit run.
The first delta for each channel should be put at an even 16-bit word
position. The encoding stores the delta points with only as many bits as
is required (hence the name "variable word width").

Thus, the number of bits used by each delta has to be stored as well.
Since this count varies very little we apply a (simpler) delta encoding on
this information.

To wrap it up, each compressed sample point consists of two values: the
delta from the last sample and the difference in word width of this delta
from the last delta (hereby referred to as "the WWM" - the word width
modifier).

Even though the word width modifier is stored first in each delta frame, we
will describe the delta information first. The delta is always stored as
an absolute difference (i.e. unsigned) in a variable number of bits. An
extra bit follows that tells the sign (if the delta isn't zero). The
number of bits required for the delta (i.e. the word width) is decided by
the position of the most significant high bit in the absolute value. One
bit less than this is actually stored, since the first bit is always high.

For instance, the delta 11 (binary 1011) has a required word width of four
bits, but only the least significant three bits are stored. A zero delta
will have a zero word width and consequently requires neither delta bits
nor sign bit. A delta of one will require only a sign bit.

One special case requires attention. In two's complement, the lowest
negative number has a magnitude one greater than the highest positive
number. Treating zero as a positive value, this gives exactly as many
negative as positive numbers. The delta encoding, on the other hand, does
not consider zero to have any sign and therefore does not include the one
extra negative value. If this value is encountered in the delta stream, it
is encoded as one greater than it actually is (putting it within the
expressible range of values).

To distinguish it from the next lowest value one extra bit is inserted
after the sign bit. The bit is high for the lowest value and low for the
next lowest value.

For example, a 16-bit two's complement number can be -32768. It would be
encoded as negative 32767 with an extra high bit. The value -32767 would
also be encoded as negative 32767 but with the extra bit low. Of course,
only these two values require the extra bit.

The WWM precedes the delta bits. It is encoded as a series of low bits (0)
terminated by a high bit (1) (in most cases). The count of low bits tells
the modifier amount. If the modifier isn't zero, an extra bit follows that
tells the modifier sign. A high bit means a negative modifier. Word width
"wraps" at the used bit resolution (new-width = (original-width +
modifier) modulo bit-resolution).

This enables us to go from a small width to a large width by using a
negative modifier. Because of this fact a WWM will never need to be larger
than the sound bit resolution divided by two (rounded downwards). If the
modifier is the maximum the terminating high bit would be superfluous, so
in this case it isn't inserted. (However, the sign bit is always included,
even if the bit resolution is even.)

For encoding, the current word width and sample value should initially be
reset to zero for each channel (the first delta will thus be the sample
value). A compressed channel always starts on an even 16-bit word
boundary. Notice that the highest possible compression ratio is eight
times, i.e. one bit per sample. This occurs when the source is a
continuous series of zero samples.
 

4.2 DWVW sample delta bit frame

  0...   WWM is the count of low bits (can be none)
  1      terminating high bit (if not max WWM)
  ms     WWM sign, high is negative (only on non-zero WWM)
  delta  (word width - 1) sample delta bits (if |delta| > 1)
  sb     delta sign bit (only on non-zero delta)
  xb     extra bit (only on lowest and next lowest possible delta value)

Some encoding examples (the examples all represent extreme situations with
unusually poor compression):

  Bit resolution  16
  Delta           923 (bin 00000011 10011011)
  Current width   1
  New width       10
  Modifier        -7 ((1 - 7) mod 16 = 10)
  Yields          0000000 1 1 110011011 0

  Bit resolution  12
  Delta           -2048 (bin 1000 00000000)
  Current width   0
  New width       11
  Modifier        -1 ((0 - 1) mod 12 = 11)
  Yields          0 1 1 1111111111 1 1
                  (-2048 is encoded as 2047 with the delta sign bit high,
                  i.e. negative, and the extra bit high)

  Bit resolution  8
  Delta           -12 (bin 11110100, negated 00001100)
  Current width   0
  New width       4
  Modifier        +4
  Yields          0000 0 100 1 (no terminating bit for WWM, since +4 is
                  the maximum modifier at 8 bits)
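To check the rules above, here is a C sketch that encodes one delta frame as a string of '0'/'1' characters, so its output can be compared with the examples directly. It is not the NuEdge implementation: `dwvw_encode_delta` and its helpers are hypothetical names, a real encoder would pack the bits into bytes, and the delta is assumed to already lie within the range of the given bit resolution (2 to 32 bits). The modifier is wrapped by the bit resolution so that its magnitude never exceeds the resolution divided by two (rounded down).

```c
#include <stddef.h>

/* Appends one bit as a '0'/'1' character (for illustration only). */
static void put_bit(char *out, size_t *pos, int bit)
{
    out[(*pos)++] = bit ? '1' : '0';
    out[*pos] = '\0';
}

static int bit_length(unsigned long v)
{
    int n = 0;
    while (v) { n++; v >>= 1; }
    return n;
}

/* Encodes one delta frame for bit resolution res. *width is the current
   word width and is updated. Returns the number of bits written. */
static size_t dwvw_encode_delta(long delta, int res, int *width, char *out)
{
    size_t pos = 0;
    long long lowest = -(1LL << (res - 1));
    unsigned long mag;
    int extra = -1;                            /* -1: no extra bit */

    out[0] = '\0';
    if (delta == lowest || delta == lowest + 1) {
        mag = (unsigned long)((1LL << (res - 1)) - 1);  /* as next lowest */
        extra = (delta == lowest);             /* high for the lowest value */
    } else {
        mag = (unsigned long)(delta < 0 ? -delta : delta);
    }

    int new_width = bit_length(mag);
    int half = res / 2;
    int m = ((new_width - *width) % res + res) % res;   /* 0..res-1 */
    if (m > half)
        m -= res;                              /* wrap: negative modifier */
    int am = m < 0 ? -m : m;

    for (int i = 0; i < am; i++)
        put_bit(out, &pos, 0);                 /* WWM: count of low bits */
    if (am != half)
        put_bit(out, &pos, 1);                 /* terminator, not at max */
    if (m != 0)
        put_bit(out, &pos, m < 0);             /* WWM sign, high = negative */
    for (int i = new_width - 2; i >= 0; i--)   /* width-1 bits, top implied */
        put_bit(out, &pos, (int)((mag >> i) & 1));
    if (mag != 0)
        put_bit(out, &pos, delta < 0);         /* delta sign, high = negative */
    if (extra >= 0)
        put_bit(out, &pos, extra);             /* lowest / next lowest only */

    *width = new_width;
    return pos;
}
```

Calling it with delta 923, bit resolution 16 and a current width of 1 produces the first example's bits without the spaces (19 bits) and leaves the width at 10; a zero delta with an unchanged width of 0 costs a single "1" bit.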
 

5 D00

This part describes the D00 music format (used by the AdLib player v4.01
coded by JCH/Vibrants) in more detail than the docs of EdLib (the
respective tracker, also coded by JCH) do. This document assumes that you
already own EdLib and have some experience with it. Also, the availability
of the EdLib docs as well as of the docs for the player included with
EdLib is assumed. You should know some basics about AdLib programming and
data formats (byte, word etc.), and be familiar with the EdLib structures
(Instruments, SpFX etc.) and with hexadecimal notation.
 

5.1 The D00 header

A description of the D00 header can be found in the player's docs. So I
won't show it again here. But JCH gives very cryptic names to the other
file structures, so I'll call them differently:

  JCH's names         My names
  TPoin tables        Arrangement data
  SeqPointer tables   Sequence data
  Instrument data     Instrument data
  DataInfo text       Song description
  Special tables      SpFX data
 

Also, I should mention that all the pointers to these tables are relative
to the beginning of the D00 file.
 

5.2 The Instrument data

The instrument data simply consists of all instruments used in the song.
Since the number of instruments is stored nowhere inside the file, loaders
should use the start offset of the next structure to determine whether they
have read enough data. The data for each instrument consists of 16 bytes, which
occur in the same order as the corresponding bytes in the EdLib Instrument
table:

 xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx
 +------------+ +------------+ |  |  |  |  +--+--- Unused
  Carrier data  Modulator data |  |  |  +--------- Hard restart SR value
                               |  |  +------------ Hard restart timer
                               |  +--------------- Fine-tune
                               +------------------ AM/FM + Feedback

For the exact meaning of these bytes, read the EdLib manual. Note that in
the Carrier and Modulator data the ADSR parts are not stored
word-oriented, but byte-oriented. That means, they aren't stored as a word
whose High byte is the AD part and whose Low byte is the SR part (although
the display in EdLib creates that assumption).

Instead they're simply stored as two bytes, of which the first is the AD
part and the second is the SR part.

5.3 The SpFX data

The SpFX data is
stored more or less like the Instrument data, but one single table entry
consists of only 8 bytes arranged like this:

 xxxx xx xx xx xx xxxx   (note: xx's are BYTES and xxxx's are WORDS!)
  |   |  |  |  |   |
  |   |  |  |  |   +--- Pointer to next SpFX entry
  |   |  |  |  +------- Duration of SpFX entry in Frames
  |   |  |  +---------- Modulator Level add
  |   |  +------------- New Modulator level
  |   +---------------- Note add value
  +-------------------- Instrument to use

Again, to really understand the meaning of these parts, you should read
the EdLib docs.
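As an illustration, an 8-byte SpFX entry could be decoded like this in C. The sketch assumes the words are stored little-endian (Intel order, as usual for PC files); the text above does not say so explicitly. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One 8-byte SpFX table entry, following the layout above. */
typedef struct {
    uint16_t instrument;     /* instrument to use */
    uint8_t  note_add;       /* note add value */
    uint8_t  mod_level;      /* new modulator level */
    uint8_t  mod_level_add;  /* modulator level add */
    uint8_t  duration;       /* duration of the entry in frames */
    uint16_t next;           /* pointer to next SpFX entry */
} d00_spfx;

/* Assumption: words are little-endian. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static d00_spfx d00_read_spfx(const uint8_t *p)
{
    d00_spfx e;
    e.instrument    = rd16le(p);
    e.note_add      = p[2];
    e.mod_level     = p[3];
    e.mod_level_add = p[4];
    e.duration      = p[5];
    e.next          = rd16le(p + 6);
    return e;
}
```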
 

5.4 The Arrangement data

The arrangement data determines which sequence is to be played on which
channel at which moment and in which way, if you understand what I mean :)
It consists of two parts: the Pointer part and the Data part (that's
simply what I call them :). The Pointer part consists of 16 word pointers
and one endmark (all endmarks are FFFFh, by the way). Only the first nine
pointers are used at the moment: one for each of the nine AdLib
channels. Each of these nine pointers points to the part of the Data
part which belongs to its channel. The Data part consists, as you might
have guessed, of nine independent arrangement streams. Each of these
streams has the following format:

First comes a word telling the speed of that stream. Since this
information is stored at the beginning of EVERY stream, I assume that
every channel may have its own unique speed, and EdLib simply doesn't
support this.

After that, the real arrangement data is stored. This data is organized
like this: If a word below 8000h is read, it's the number of a sequence to
be played. In that case, the saved transpose data is used.

But if a word 8XYYh is read, with X and YY being any value, the transpose
data is updated to X and YY (see the EdLib docs for information on the
meaning of X and YY).

I have found out that the first arrangement entry for an arrangement
stream that contains at least one sequence is always such a command to set
the internal transpose data. So no default value needs to be loaded into
the transpose data before playing, and looping the arrangement stream
becomes easier.

If the word FFFFh is read, the arrangement stream has arrived at its
looping point. The word following the FFFFh is an offset into the
arrangement stream telling at which position the stream should be
restarted. If the word FFFEh is read, the arrangement stream has reached
its end. Unlike with the Loop command (FFFFh), the stream must not be
restarted but halted. Also, there is no word following the FFFEh command.
 

5.5 The Sequence data

The Sequence data again consists of a pointer part and a data part. But
this time these two parts aren't stored in different parts of the file,
the data part is stored directly after the pointer part. Therefore, a
reference to a specific pattern should be seen as a reference to a word
counted from the beginning of the Sequence data.

This word (e.g. the first word for Pattern 0000h) then points to the
offset of the actual sequence data inside the file. I hope you got my
point... Then, each sequence is stored as follows: Read a word. If its
high byte is below 20h, then it's a note. Note that RESTs and HOLDs are
also counted as notes. In this case, the low byte can contain the
following values:

00h = REST
The high byte tells the number of rests to insert minus one! e.g. a REST
with a high byte of 01h means "Two RESTs"

01h - 7Dh = Note
The value of this note byte tells the number of half notes to add to C-0
(e.g. 01h would mean C#0). In this case, the high byte tells the number of
HOLDs to insert after the note.

7Fh = HOLD
The high byte tells the number of HOLDs minus one again!

If the high byte is 20h or above, but below 40h, it's a note again, but
this time with Tienote switched on. The high byte is used as repetition
count again, but don't forget to subtract 20h before evaluating it!!

If the high byte is 40h or above, it's an effect. In this case, the
complete word can simply be interpreted like any EdLib effect (set
instrument, set volume etc.). See the EdLib docs for a list of them.
The note word this effect refers to follows directly after the effect
word.

If the read word is FFFFh, it indicates the end of that sequence. In that
case, the next sequence to be played should be determined and loaded, and
the first effect/note of it should be played.
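The word classification above can be sketched in C as follows. The names are hypothetical; values the text does not describe (a low byte of 7Eh, or 80h and above) are reported as unknown, and the tie flag is simply recorded for whatever the word turns out to be.

```c
#include <stdint.h>

typedef enum { SEQ_REST, SEQ_NOTE, SEQ_HOLD, SEQ_EFFECT, SEQ_END,
               SEQ_UNKNOWN } seq_kind;

typedef struct {
    seq_kind kind;
    int      tie;    /* 1 if the high byte was 20h..3Fh (Tienote on) */
    unsigned note;   /* half notes above C-0 (SEQ_NOTE only) */
    unsigned count;  /* RESTs/HOLDs to insert, or HOLDs after a note */
} seq_item;

static seq_item d00_decode_seq_word(uint16_t w)
{
    seq_item it = { SEQ_UNKNOWN, 0, 0, 0 };
    unsigned hi = w >> 8, lo = w & 0xFF;

    if (w == 0xFFFF) { it.kind = SEQ_END; return it; }
    if (hi >= 0x40)  { it.kind = SEQ_EFFECT; return it; } /* note word follows */
    if (hi >= 0x20)  { it.tie = 1; hi -= 0x20; }

    if (lo == 0x00) {
        it.kind = SEQ_REST; it.count = hi + 1;   /* stored minus one */
    } else if (lo == 0x7F) {
        it.kind = SEQ_HOLD; it.count = hi + 1;   /* stored minus one */
    } else if (lo <= 0x7D) {
        it.kind = SEQ_NOTE; it.note = lo; it.count = hi;
    }
    return it;
}
```

For example, the word 0100h decodes as two RESTs and 0201h as C#0 followed by two HOLDs.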
 

6 MIDI SAMPLE DUMP STANDARD

6.1 INTRODUCTION

The MIDI SDS was adopted in January 1986 by the MIDI
Manufacturers Association and the Japanese MIDI Standards Committee. The
SDS defines the standard method for transfer of sound sample data
between MIDI-equipped devices. Sample dumps may be accomplished with
either an 'open loop' or 'closed loop' system.

The open loop method simply involves the straight dump of all sample data
from its source to the destination, with no timeouts, packet
acknowledgements, or any other form of handshaking, much as in the manner
of a sysex bulk dump, usually initiated at the source.

The closed loop method allows the use of handshaking messages between the
dump source and destination, and usually places the dump process
under the control of the slave, to allow it time to process the incoming
data as necessary. As with any standard, it cannot be assumed that a
device adheres to it unless the accompanying documentation specifically
indicates it. Even then, it is best to check its conformity with
non-critical data.
 

6.2 SPEC: SAMPLE DUMP FORMATS

DUMP HEADER: F0 7E cc 01 ss ss ee ff ff ff gg gg gg hh hh hh ii ii ii jj F7

  cc        channel number
  ss ss     sample number (LSB first)
  ee        sample format (number of significant bits; 8->28)
  ff ff ff  sample period (1/sample rate) in nanoseconds (LSB first)
  gg gg gg  sample length, in words (LSB first)
  hh hh hh  sustain loop start point (word number) (LSB first)
  ii ii ii  sustain loop end point (word number) (LSB first)
  jj        loop type (00: forwards only; 01: alternating)
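As a sketch, a dump header could be parsed like this in C. Since MIDI data bytes carry only seven bits, each multi-byte field is assumed here to hold seven bits per byte, LSB first (so a three-byte field holds up to 21 bits). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned channel, sample_number, bits;
    uint32_t period_ns, length_words, loop_start, loop_end;
    unsigned loop_type;   /* 00 = forwards only, 01 = alternating */
} sds_header;

/* n 7-bit bytes, LSB first. */
static uint32_t sds_field(const uint8_t *p, int n)
{
    uint32_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 7) | (p[i] & 0x7F);
    return v;
}

/* Returns 0 on success, -1 if msg is not a 21-byte dump header. */
static int sds_parse_header(const uint8_t *msg, size_t len, sds_header *h)
{
    if (len != 21 || msg[0] != 0xF0 || msg[1] != 0x7E ||
        msg[3] != 0x01 || msg[20] != 0xF7)
        return -1;
    h->channel       = msg[2];
    h->sample_number = sds_field(msg + 4, 2);
    h->bits          = msg[6];
    h->period_ns     = sds_field(msg + 7, 3);
    h->length_words  = sds_field(msg + 10, 3);
    h->loop_start    = sds_field(msg + 13, 3);
    h->loop_end      = sds_field(msg + 16, 3);
    h->loop_type     = msg[19];
    return 0;
}
```

For instance, a period of 22675 ns (roughly 44.1 kHz) is sent as the bytes 13 31 01.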
 
 

DATA PACKET: F0 7E cc 02 kk <120 bytes> mm F7

  cc  channel number
  kk  running packet count (00->7F)
  mm  checksum (XOR of 7E, cc, 02, kk, <120 bytes>)
 

The total size of a data packet is 127 bytes. This is to avoid overflow
of the MIDI input buffer of a device that may want to receive an entire
packet before processing it. A data packet consists of its own header, a
packet number, 120 bytes of data, a checksum, and an EOX. The packet
number begins at 00 and increments with each new packet. It resets to 00
after it reaches 7F, and continues counting.

The packet number is used by the receiver to distinguish between a new
data packet and a resend of a previous packet. The packet number is
followed by 120 bytes of data, which form 60, 40, or 30 words (MSB first
for multiword samples), depending on the length of a single data sample.
Each data byte holds seven bits, with the msb in each byte set to 0, in
order to conform to the requirements of MIDI data transmission.
Information is left justified within the 7-bit bytes, and unused bits are
filled with 0. Example: Assume a data point in the memory of a 16-bit
sampler, with the value 87E5. In binary, that would be:

1000 0111 1110 0101

and would be encoded as the following MIDI data stream:

01000011 01111001 00100000

The checksum is the running XOR of all the data after the SYSEX byte, up
to but not including the checksum itself.
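The packing rule and the checksum can be expressed in a few lines of C. `sds_encode_word` left-justifies a sample into 7-bit bytes, MSB first, and `sds_checksum` XORs the packet bytes from 7E up to the byte before the checksum. Both names are hypothetical, and the packet is assumed to be held as the complete 127-byte message.

```c
#include <stdint.h>

/* 7-bit bytes per sample word: 2 for 8..14 bits, 3 for 15..21 bits,
   4 for 22..28 bits (hence 60, 40 or 30 words per packet). */
static int sds_bytes_per_word(int bits)
{
    return (bits + 6) / 7;
}

/* Left-justifies a sample of `bits` significant bits into 7-bit bytes,
   MSB first; unused low bits become 0. Returns the bytes written. */
static int sds_encode_word(uint32_t sample, int bits, uint8_t *out)
{
    int n = sds_bytes_per_word(bits);
    uint32_t v = sample << (7 * n - bits);       /* left justify */
    for (int i = 0; i < n; i++)
        out[i] = (uint8_t)((v >> (7 * (n - 1 - i))) & 0x7F);
    return n;
}

/* packet = F0 7E cc 02 kk <120 bytes> mm F7; XOR of bytes 1..124. */
static uint8_t sds_checksum(const uint8_t packet[127])
{
    uint8_t x = 0;
    for (int i = 1; i < 125; i++)
        x ^= packet[i];
    return (uint8_t)(x & 0x7F);
}
```

For the 16-bit value 87E5 from the example, `sds_encode_word` writes 43h, 79h and 20h, i.e. 01000011 01111001 00100000.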
 

6.3 SPEC: SAMPLE DUMP MESSAGES

DUMP REQUEST: F0 7E cc 03 ss ss F7

  cc     channel number
  ss ss  sample number requested (LSB first)

Upon receiving the request, the sampler checks the sample number to see
if it is within legal range. If it is not, the request is ignored.
If it is, the sample dump is started. One packet at a time is sent, under
control of the handshaking messages outlined below.
 

6.4 HANDSHAKING MESSAGES:

For all below:

  cc  channel number
  pp  packet number

Packet numbers are included in the handshaking messages to
accommodate machines that have the intelligence to re-transmit specific
packets after an entire dump is finished, or if synchronization is
lost.

ACK     F0 7E cc 7F pp F7
Means last packet was received correctly (checksum OK, etc.), please
send next one. Packet number is packet being acknowledged as correct.

NAK     F0 7E cc 7E pp F7
Means last packet not received correctly, please send again. Packet
number is packet being rejected.

CANCEL  F0 7E cc 7D pp F7
Means abort dump immediately. Packet number is packet on which abort
occurs.

WAIT    F0 7E cc 7C pp F7
Means pause dump indefinitely, until next message is sent. Allows the
unit receiving the dump to perform other functions (disk access, etc.)
before receiving the remainder of the dump. The next message it sends (e.g.
ACK, CANCEL) will determine if the dump continues or aborts.
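All four handshaking messages share one six-byte layout, so building and recognizing them is straightforward. A C sketch with hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

enum { SDS_WAIT = 0x7C, SDS_CANCEL = 0x7D, SDS_NAK = 0x7E, SDS_ACK = 0x7F };

/* Builds F0 7E cc <type> pp F7 into out[6]. */
static void sds_make_handshake(uint8_t out[6], uint8_t channel,
                               uint8_t type, uint8_t packet)
{
    out[0] = 0xF0;
    out[1] = 0x7E;
    out[2] = (uint8_t)(channel & 0x7F);
    out[3] = type;
    out[4] = (uint8_t)(packet & 0x7F);
    out[5] = 0xF7;
}

/* Returns the handshake type (7Ch..7Fh) and stores the packet number,
   or returns -1 if msg is not a handshaking message. */
static int sds_parse_handshake(const uint8_t *msg, size_t len, uint8_t *packet)
{
    if (len != 6 || msg[0] != 0xF0 || msg[1] != 0x7E || msg[5] != 0xF7)
        return -1;
    if (msg[3] < SDS_WAIT || msg[3] > SDS_ACK)
        return -1;
    *packet = msg[4];
    return msg[3];
}
```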
 

6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)

Once a dump has been requested, either via MIDI or through the front
panel, the DUMP HEADER is sent.

After sending the header, the master must time out for at least two
seconds, to allow the receiver to decide if it will accept this sample
(has enough memory, etc.). If it receives a CANCEL within this time,
it should abort immediately.

If it receives an ACK, it will start sending packets immediately. If it
receives a WAIT, it pauses until another message is received, and then
processes that message normally. If nothing is received within the
timeout, an open loop is assumed, and the dump starts with the first
packet.

After sending each packet, the master should time out for at least 20
milliseconds and watch its MIDI In.

If an ACK is received, it sends the next packet immediately. If it
receives a NAK, and the packet number matches the number of the last
packet sent, it resends that packet. If the packet numbers don't match,
and the device is incapable of sending packets out of order, the NAK will
be ignored.

If a WAIT is received, the master should watch its MIDI In port
indefinitely for another ACK, NAK, or CANCEL message, which it should
then process normally.

If no messages are received within 20 milliseconds of the
transmission of a packet, the master may assume an open loop
configuration, and send the next packet.

This process continues until there are fewer than 121 data bytes to send.
The final packet will still consist of 120 bytes, regardless of how
many significant bytes actually remain, and the unused bytes will be
filled with zeroes. The receiver should handshake after receiving
the last packet.
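The packet arithmetic of this procedure, as a C sketch with hypothetical names: each packet carries 120 / bytes-per-word words (60, 40 or 30), packet numbers wrap from 7F back to 00, and the final packet is zero-filled to 120 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Packets needed for `words` sample words of `bytes_per_word` bytes each. */
static uint32_t sds_packet_count(uint32_t words, int bytes_per_word)
{
    uint32_t per_packet = 120u / (uint32_t)bytes_per_word;
    return (words + per_packet - 1) / per_packet;
}

/* Packet numbers run 00..7F and then start again at 00. */
static uint8_t sds_packet_number(uint32_t index)
{
    return (uint8_t)(index & 0x7F);
}

/* Copies up to 120 data bytes into a packet body, zero-filling the rest. */
static void sds_fill_body(uint8_t body[120], const uint8_t *data, size_t n)
{
    if (n > 120)
        n = 120;
    memcpy(body, data, n);
    memset(body + n, 0, 120 - n);
}
```

A 1000-word sample at three bytes per word thus needs 25 packets; one more word would make it 26, with the last packet mostly zeroes.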
 

6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)

When receiving a sample dump, a device should keep a running checksum
during reception. If its checksum matches the checksum in the data
packet, it will send an ACK and wait for the next packet.

If it does not match, it will send a NAK containing the number of
the packet that caused the error, and wait for the next packet. If, after
sending a NAK, the packet number of the next packet doesn't match the
previous packet number (the one that was NAK'd), and the unit is not
capable of accepting packets out of order, the error is ignored and the
dump continues as if the checksums had matched.

If a receiver runs out of memory before the dump is completed, it should
send a CANCEL to stop the dump.
 

6.7 SDS OVERVIEW

DUMP DATA FORMAT: DUMP HEADER

  Sysex
  ID: Universal Non-Real Time
  Channel Number
  Sub ID: Header
  Sample Number (2 bytes, LSB first)
  Sample Format
  Sample Period (3 bytes, LSB first)
  Sample Length (3 bytes, LSB first)
  Sustain Loop Start Point (3 bytes, LSB first)
  Sustain Loop End Point (3 bytes, LSB first)
  Loop Type
  Eox

SAMPLE DUMP DATA FORMAT: DATA PACKET

  Sysex
  ID: Universal Non-Real Time
  Channel Number
  Sub ID: Data Packet
  Packet Number
  Sample Data (120 bytes)
  Checksum
  Eox

SAMPLE DUMP MESSAGES: DUMP REQUEST

  Sysex
  ID: Universal Non-Real Time
  Channel Number
  Sub ID: Dump Request
  Sample Number (2 bytes, LSB first)
  Eox

SAMPLE DUMP MESSAGES: HANDSHAKING FLAGS

  Sysex
  ID: Universal Non-Real Time
  Channel Number
  Sub ID: ACK or NAK or CANCEL or WAIT
  Packet Number
  Eox
 

7 ROL

This part contains details of .ROL files used by AdLib and compatible
cards on the PC. It is also used by Visual Composer (TM).

7.1 Structure of .ROL files

fld #  size (bytes)  type   description
  1       2          int    file version, major
  2       2          int    file version, minor
  3      40          char   unused
  4       2          int    ticks per beat
  5       2          int    beats per measure
  6       2          int    editing scale (Y axis)
  7       2          int    editing scale (X axis)
  8       1          char   unused
  9       1          char   0 = percussive mode, 1 = melodic mode
 10      90          char   unused
 11      38          char   filler
 12      15          char   filler
 13       4          float  basic tempo
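A C sketch of reading fields 1 to 13 from a buffer holding the start of the file. The byte offsets follow from the field sizes above (for example, ticks per beat starts at byte 44 and the basic tempo at byte 197, so field 14 begins at byte 201). Little-endian 16-bit ints and a little-endian IEEE 754 single-precision float are assumptions based on the PC origin of the format, and the code also assumes the host `float` is IEEE 754. The names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    int16_t version_major, version_minor;
    int16_t ticks_per_beat, beats_per_measure;
    int16_t scale_y, scale_x;
    uint8_t melodic;          /* 0 = percussive, 1 = melodic */
    float   basic_tempo;
} rol_header;

static int16_t rol_i16(const uint8_t *p)
{
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

static float rol_f32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &u, sizeof f);     /* reinterpret the IEEE 754 bit pattern */
    return f;
}

/* buf must hold at least the 201 bytes of fields 1 to 13. */
static rol_header rol_read_header(const uint8_t *buf)
{
    rol_header h;
    h.version_major     = rol_i16(buf + 0);    /* field 1 */
    h.version_minor     = rol_i16(buf + 2);    /* field 2 */
    h.ticks_per_beat    = rol_i16(buf + 44);   /* field 4 */
    h.beats_per_measure = rol_i16(buf + 46);   /* field 5 */
    h.scale_y           = rol_i16(buf + 48);   /* field 6 */
    h.scale_x           = rol_i16(buf + 50);   /* field 7 */
    h.melodic           = buf[53];             /* field 9 */
    h.basic_tempo       = rol_f32(buf + 197);  /* field 13 */
    return h;
}
```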
 

Field 14 indicates the number of times to repeat fields 15 and 16:

fld #  size (bytes)  type   description
 14       2          int    number of tempo events
 15       2          int    time of events, in ticks
 16       4          float  tempo multiplier (0.01 - 10.0)
 

The remaining fields (17 to 34) are to be repeated for each of 11 voices:

fld #  size (bytes)  type   description
 17      15          char   filler
 18       2          int    time (in ticks) of last note +1
 

Repeat the next two fields (19 and 20) while the summation of field 20 is
less than the value of field 18:

fld #  size (bytes)  type   description
 19       2          int    note number: 0 = silence, 12 to 107 = normal
                            note (you must subtract 60 to obtain the
                            correct value for the sound driver)
 20       2          int    note duration, in ticks
 21      15          char   filler
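The note loop for one voice (fields 19 and 20) could be read like this C sketch, which stops once the summed durations reach the value of field 18; field 21 then follows the last note. Little-endian 16-bit values are assumed, and the function and type names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t note; int16_t duration; } rol_note;

/* p points just after field 18, avail is the number of bytes available.
   Returns the number of notes stored (at most max); *used receives the
   number of bytes consumed. */
static size_t rol_read_notes(const uint8_t *p, size_t avail, int end_ticks,
                             rol_note *notes, size_t max, size_t *used)
{
    size_t n = 0, off = 0;
    long total = 0;

    while (total < end_ticks && off + 4 <= avail && n < max) {
        notes[n].note     = (int16_t)(uint16_t)(p[off] | (p[off + 1] << 8));
        notes[n].duration = (int16_t)(uint16_t)(p[off + 2] | (p[off + 3] << 8));
        total += notes[n].duration;   /* running sum of field 20 */
        off += 4;
        n++;
    }
    *used = off;
    return n;
}
```

A non-zero note number would then be passed to the sound driver as note - 60, as described for field 19.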
 

Field 22 indicates the number of times to repeat fields 23 to 26:

fld #  size (bytes)  type   description
 22       2          int    number of instrument events
 23       2          int    time of events, in ticks
 24       9          char   instrument name
 25       1          char   filler
 26       2          int    unused
 27      15          char   filler
 

Field 28 indicates the number of times to repeat fields 29 and 30:

fld #  size (bytes)  type   description
 28       2          int    number of volume events
 29       2          int    time of events, in ticks
 30       4          float  volume multiplier (0.0 - 1.0)
 31      15          char   filler
 

Field 32 indicates the number of times to repeat fields 33 and 34:

fld #  size (bytes)  type   description
 32       2          int    number of pitch events
 33       2          int    time of events, in ticks
 34       4          float  pitch variation (0.0 - 2.0, nominal is 1.0)
 

7.2 Notes
Fields #1 and #2 should be set to 0 and 4 respectively. Field #10 should
be filled with zeros.
 

8 8SVX

The 8SVX files are IFF files used for digital audio data. The format of
the VHDR block is complete guesswork. These files use Motorola byte order.
The 8SVX file format is fixed to 8-bit mono sample data - at least
GoldWave does not support saving files in any other format than 8-bit
mono.
 

8.1 FORMblock [VHDR]

This is the sample information block. The normal size is 20 bytes.

OFFSET  Count  TYPE   Description
0000h   1      dword  Sampling rate of digital data in Hz. This count seems
                      not to be too accurate; at least GoldWave v2.0
                      creates different rates for Wave and 8SVX files.
0004h   4      dword  Other data, unknown
 

8.2 FORMblock [BODY]

This block contains the raw sample data; maybe the usual IFF compression
was used. The details of both the compression and the IFF format are
unknown.
 

9 AIFF

The Audio Interchange File Format files are digital audio files stored
in the IFF format; the samples are stored in signed PCM. The header block
is [AIFF]; the different subblocks are:

[AUTH]
The author's information (optional)

[COMM]
This record stores information about the sampled data:

OFFSET  Count  TYPE   Description
0000h   1      word   number of channels or number of instrument samples ???
0002h   1      dword  Sample length
0006h   1      dword  lower frequency
000Ah   1      dword  maximum frequency
000Eh   1      dword  ???

[MARK]

[NAME]
The name of the instrument / sample

[SSND]
The stored sample data.
 

10 AU

The AU files are digital audio files used by the Sun and NeXT
workstations. Further information wanted.

OFFSET  Count  TYPE   Description
0000h   4      char   ID='.snd'
0004h   1      dword  Offset of start of sample
0008h   1      dword  Length of stored sample
000Ch   1      dword  Sound encoding:
                        1 -  8-bit ISDN u-law,
                        2 -  8-bit linear PCM (REF-PCM),
                        3 - 16-bit linear PCM,
                        4 - 24-bit linear PCM,
                        5 - 32-bit linear PCM,
                        6 - 32-bit IEEE floating point,
                        7 - 64-bit IEEE floating point,
                       23 -  8-bit ISDN u-law compressed (G.721 ADPCM)
0010h   1      dword  Sampling rate
0014h   1      dword  Number of sample channels
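A C sketch of reading this header. The table above does not give the byte order; the sketch assumes big-endian (Motorola order), the native order of the Sun and NeXT machines these files come from. Struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t data_offset, data_size, encoding, sample_rate, channels;
} au_header;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or lacks ".snd". */
static int au_read_header(const uint8_t *buf, size_t len, au_header *h)
{
    if (len < 24 || memcmp(buf, ".snd", 4) != 0)
        return -1;
    h->data_offset = be32(buf + 4);    /* offset of start of sample */
    h->data_size   = be32(buf + 8);    /* length of stored sample */
    h->encoding    = be32(buf + 12);   /* 1 = u-law, 3 = 16-bit PCM, ... */
    h->sample_rate = be32(buf + 16);
    h->channels    = be32(buf + 20);
    return 0;
}
```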
 

11 FSM

The .FSM files are samples to be used for module style music with the
Fandarole Composer. Currently only samples of up to 64K length are
supported, although the header reserves a dword for the sample size.

OFFSET  Count  TYPE   Description
0000h   4      char   ID='FSM',254
0004h   32     char   ASCII name of sample
0024h   3      char   ID=10,13,26
0027h   1      dword  Length of sample (<=64K)
0028h   1      byte   Fine tune value for sample (currently unsupported)
0029h   1      byte   Sample volume (currently unsupported)
002Ah   1      dword  Start of sample loop
002Dh   1      dword  End of sample loop. If the sample is not set to loop
                      (see below), this should be set to the end of the
                      sample.
0032h   1      byte   Sample type, bitmapped:
                        0   - 8-bit/16-bit sample
                        1-7 - reserved
0033h   1      byte   Loop mode (bitmapped?):
                        0-2 - reserved
                        3   - loop off/loop on
                        4-7 - reserved
0034h   ?      byte   Sample data in signed format
 

12 GF1 PATCH

The GF1 Patch files are multipart sound files for the Gravis Ultrasound
sound card to emulate MIDI sounds in high quality. Each Patch can consist
of many samples (for example, a string ensemble consists of Violin, Viola,
Cello, Bass) which are played depending on the note to play. A patch can
also
contain a part to be played before the loop and a part to be played after
the tone has been released.

OFFSET  Count  TYPE   Description
0000h   12     char   ID='GF1PATCH110',0
000Ch   10     char   Manufacturer ID (usually 'ID#000002',0)
0016h   60     char   Description of the contained instruments or copyright
                      of manufacturer
0052h   1      byte   Number of instruments in this patch
0053h   1      byte   Number of voices for sample
0054h   1      byte   Number of output channels (1=mono, 2=stereo)
0055h   1      word   Number of waveforms
0057h   1      word   Master volume for all samples
0059h   1      dword  Size of the following data
005Dh   36     byte   reserved
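The field sizes of the main patch header add up to 129 (0081h) bytes. A minimal Python sketch of reading it with the struct module follows, assuming little-endian (Intel) byte order for the word and dword fields; parse_gf1_header and _cstr are hypothetical helper names.

```python
import struct

# ID, manufacturer, description, 3 bytes, 2 words, 1 dword, 36 reserved bytes
GF1_HEADER = struct.Struct("<12s10s60sBBBHHI36s")

def _cstr(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")

def parse_gf1_header(data: bytes) -> dict:
    if len(data) < GF1_HEADER.size:
        raise ValueError("file too short for a GF1 patch header")
    (ident, manufacturer, description, instruments, voices, channels,
     waveforms, master_volume, data_size, _reserved) = GF1_HEADER.unpack_from(data)
    if not ident.startswith(b"GF1PATCH110"):
        raise ValueError("missing 'GF1PATCH110' signature")
    return {
        "manufacturer": _cstr(manufacturer),
        "description": _cstr(description),
        "instruments": instruments,
        "voices": voices,
        "channels": channels,
        "waveforms": waveforms,
        "master_volume": master_volume,
        "data_size": data_size,
    }
```

The first instrument header then starts at offset GF1_HEADER.size.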
 

Following this header come the instruments, each with its own header. An
instrument header contains the name and other data about one instrument
contained within the patch.

OFFSET  Count  TYPE   Description
0000h   1      word   Instrument number (maybe the MIDI instrument number?).
                      In the Gravis patches this is 0; in other patches I
                      found random values.
0002h   16     char   ASCII name of the instrument
0012h   1      dword  Size of the whole instrument in bytes
0016h   1      byte   Number of layers (purpose unknown)
0017h   40     byte   reserved
 

I don't know anything more about the patch; maybe somebody could enlighten
me. Each patch record has the following format:

OFFSET  Count  TYPE   Description
0000h   7      char   Wave file name
0007h   1      byte   Fractions
0008h   1      dword  Wave size. Size of the wave digital data
000Ch   1      dword  Start of wave loop
0010h   1      dword  End of wave loop
0014h   1      word   Sample rate of the wave
0016h   1      dword  Minimum frequency to play the wave
001Ah   1      dword  Maximum frequency to play the wave
001Eh   1      dword  Original (root) frequency of the wave data
0022h   1      int    Fine tune value for the wave
0024h   1      byte   Stereo balance, values unknown
0025h   6      byte   Filter envelope rate
002Bh   6      byte   Filter envelope offset
0031h   1      byte   Tremolo sweep
0032h   1      byte   Tremolo rate
0033h   1      byte   Tremolo depth
0034h   1      byte   Vibrato sweep
0035h   1      byte   Vibrato rate
0036h   1      byte   Vibrato depth
0037h   1      byte   Wave data, bit-mapped
                        0 - 8/16-bit wave data
                        1 - signed/unsigned data
                        2 - disable/enable looping
                        3 - no/has bidirectional looping
                        4 - loop forward/backward
                        5 - turn envelope sustaining off/on
                        6 - disable/enable filter envelope
                        7 - reserved
0038h   1      int    Frequency scale, whatever that means
003Ah   1      word   Frequency scale factor
003Ch   36     byte   Reserved
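The Python sketch below decodes one per-wave record and its bit-mapped mode byte. It assumes a 96-byte record in little-endian byte order, with the minimum, maximum and root frequencies stored as dwords and the mode byte at 0037h; the names WAVE_HEADER, MODE_BITS and parse_gf1_wave are hypothetical.

```python
import struct

# name, fractions, size, loop start/end, rate, min/max/root freq, tune, balance,
# envelope rates, envelope offsets, 6 tremolo/vibrato bytes, modes, scale freq,
# scale factor, reserved: 96 bytes in total
WAVE_HEADER = struct.Struct("<7sBIIIHIIIhB6s6s6BBhH36s")

# Labels for mode bits 0-6 when set (bit 7 is reserved).
MODE_BITS = ("16-bit", "unsigned", "looping", "bidirectional",
             "backward", "sustain", "envelope")

def parse_gf1_wave(data: bytes, offset: int = 0) -> dict:
    fields = WAVE_HEADER.unpack_from(data, offset)
    (name, _fractions, size, loop_start, loop_end, rate,
     low, high, root, tune, balance, _env_rate, _env_offset) = fields[:13]
    modes = fields[19]
    flags = {label for bit, label in enumerate(MODE_BITS) if modes & (1 << bit)}
    return {
        "name": name.split(b"\x00", 1)[0].decode("ascii", errors="replace"),
        "size": size,
        "loop": (loop_start, loop_end),
        "sample_rate": rate,
        "freq_range": (low, high),
        "root_freq": root,
        "tune": tune,
        "balance": balance,
        "flags": flags,
    }
```

The wave data itself (size bytes) follows directly after each 96-byte record.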
 

13 S3I

This is the Digiplayer/ST3.0 digital sample file format. The sample files
include information about the loop of the instrument. The AdLib
instruments have another format listed below.

OFFSET  Count  TYPE   Description
0000h   1      byte   ID=01h
0001h   12     char   DOS filename
000Dh   1      byte   reserved (0)
000Eh   1      word   Paragraph offset of the raw sample data from the
                      beginning of the file
0010h   1      dword  Sample length in bytes
0014h   1      dword  Start of sample loop
0018h   1      dword  End of sample loop
001Ch   1      byte   Playback volume of sample
001Dh   1      byte   ??? "DSK", whatever that means
001Eh   1      byte   Pack type
                        0 - unpacked
                        1 - DP30ADPCM 1
001Fh   1      byte   Flags (bit-mapped)
                        0 - loop on/off
                        1 - stereo sample (length bytes for the left channel,
                            then another length bytes for the right channel!)
                        2 - 16-bit samples (in Intel byte order)
0020h   1      dword  C2 frequency
0024h   1      dword  reserved
0028h   1      word   reserved
002Ah   1      word   ID=512
002Ch   1      dword  ?? Date of last modification ?? (see table 0009)
0030h   28     char   ASCIIZ sample name
004Ch   4      char   ID='SCRS'
0050h   ?      byte   Raw sample data
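A minimal Python sketch of reading an S3I header follows. It assumes an 80-byte (0050h) header in Intel byte order, with the 28-byte sample name at 0030h and the 'SCRS' ID right after it at 004Ch, and it treats a set flag bit 0 as "loop on". The names S3I_FIELDS and parse_s3i_header are hypothetical.

```python
import struct

S3I_FIELDS = struct.Struct("<B12sBHIIIBBBBI")  # bytes 0000h-0023h

def _cstr(raw: bytes) -> str:
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")

def parse_s3i_header(data: bytes) -> dict:
    if len(data) < 0x50 or data[0x4C:0x50] != b"SCRS":
        raise ValueError("missing 'SCRS' signature")
    (kind, dos_name, _res, para, length, loop_start, loop_end,
     volume, _dsk, pack, flags, c2_freq) = S3I_FIELDS.unpack_from(data)
    if kind != 1:
        raise ValueError("not a digital sample (ID != 01h)")
    return {
        "dos_name": _cstr(dos_name),
        "name": _cstr(data[0x30:0x4C]),
        "data_offset": para * 16,          # one paragraph = 16 bytes
        "length": length,
        "loop": (loop_start, loop_end) if flags & 0x01 else None,
        "stereo": bool(flags & 0x02),
        "sixteen_bit": bool(flags & 0x04),
        "volume": volume,
        "packed": pack != 0,
        "c2_freq": c2_freq,
    }
```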
 

Here follows the AdLib instrument format, whose file extension I don't
know:

OFFSET  Count  TYPE   Description
0000h   1      byte   Instrument type
                        2 - melodic instrument
                        3 - bass drum
                        4 - snare drum
                        5 - tom tom
                        6 - cymbal
                        7 - hihat
0001h   12     char   DOS file name
000Dh   3      byte   reserved
0010h   1      byte   Modulator description (bit-mapped)
                        0-3 - frequency multiplier
                        4   - scale envelope
                        5   - sustain
                        6   - pitch vibrato
                        7   - volume vibrato
0011h   1      byte   Carrier description (same as modulator)
0012h   1      byte   Modulator miscellaneous (bit-mapped)
                        0-5 - 63-volume
                        6   - MSB of levelscale
                        7   - LSB of levelscale
0013h   1      byte   Carrier miscellaneous (same as modulator)
0014h   1      byte   Modulator attack/decay byte (bit-mapped)
                        0-3 - decay
                        4-7 - attack
0015h   1      byte   Carrier attack/decay (same as modulator)
0016h   1      byte   Modulator sustain/release byte (bit-mapped)
                        0-3 - release count
                        4-7 - 15-sustain
0017h   1      byte   Carrier sustain/release (same as modulator)
0018h   1      byte   Modulator wave select
0019h   1      byte   Carrier wave select
001Ah   1      byte   Modulator feedback byte (bit-mapped)
                        0   - additive synthesis on/off
                        1-7 - modulation feedback
001Bh   1      byte   reserved
001Ch   1      byte   Instrument playback volume
001Dh   1      byte   ??? "DSK"
001Eh   1      word   reserved
0020h   1      dword  C2 frequency
0024h   12     byte   reserved
0030h   28     char   ASCIIZ instrument name
004Ch   4      char   ID='SCRI'
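Several of these bytes store inverted values ("63-volume", "15-sustain"). The Python sketch below undoes the inversion when decoding an operator; it assumes the 80-byte layout above, and the names ADLIB_TYPES, decode_adlib_operator and parse_adlib_instrument are hypothetical.

```python
ADLIB_TYPES = {2: "melodic instrument", 3: "bass drum", 4: "snare drum",
               5: "tom tom", 6: "cymbal", 7: "hihat"}

def decode_adlib_operator(misc: int, attack_decay: int, sustain_release: int) -> dict:
    """Decode one operator (modulator or carrier) from its three bit-mapped bytes."""
    return {
        "volume": 63 - (misc & 0x3F),             # bits 0-5 hold 63-volume
        "attack": attack_decay >> 4,              # bits 4-7
        "decay": attack_decay & 0x0F,             # bits 0-3
        "sustain": 15 - (sustain_release >> 4),   # bits 4-7 hold 15-sustain
        "release": sustain_release & 0x0F,        # bits 0-3
    }

def parse_adlib_instrument(data: bytes) -> dict:
    if len(data) < 0x50 or data[0x4C:0x50] != b"SCRI":
        raise ValueError("missing 'SCRI' signature")
    return {
        "type": ADLIB_TYPES.get(data[0], "unknown"),
        "name": data[0x30:0x4C].split(b"\x00", 1)[0].decode("ascii", errors="replace"),
        "modulator": decode_adlib_operator(data[0x12], data[0x14], data[0x16]),
        "carrier": decode_adlib_operator(data[0x13], data[0x15], data[0x17]),
    }
```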
 

14 UWF

The UWF files are sample files used by the UltraTracker. Further
information wanted.

OFFSET  Count  TYPE   Description
0000h   32     char   ASCIIZ sample name
0020h   1      char   ID=1Ah
0021h   1      char   ID=10h
0022h   5      char   ID='MUWFB'
0027h   1      char   ID=0
0028h   6      char   Length of sample as ASCII long integer
002Eh   1      word   Length of sample
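A minimal Python sketch that checks the fixed ID bytes at 0020h-0027h and reads both copies of the sample length. The little-endian word and the NUL/blank padding of the ASCII length are assumptions; parse_uwf_header is a hypothetical name.

```python
import struct

UWF_SIGNATURE = b"\x1a\x10MUWFB\x00"   # ID bytes at 0020h-0027h

def parse_uwf_header(data: bytes) -> dict:
    if len(data) < 0x30 or data[0x20:0x28] != UWF_SIGNATURE:
        raise ValueError("not a UWF sample")
    (length,) = struct.unpack_from("<H", data, 0x2E)   # assumed little-endian
    return {
        "name": data[:0x20].split(b"\x00", 1)[0].decode("ascii", errors="replace"),
        "length_text": data[0x28:0x2E].decode("ascii", errors="replace").strip("\x00 "),
        "length": length,
    }
```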
 

15 WAVE

The Windows .WAV files are RIFF format files. Some programs expect the fmt
block right behind the RIFF header itself, so your programs should write
out this block as the first block in the RIFF file. The subblocks for the
wave files are:
 

15.1 RiffBLOCK [data]

This block contains the raw sample data. The information necessary for
playback is contained in the [fmt ] block.
 

15.2 RiffBLOCK [fmt ]
This block contains the data necessary for playback of the sound files.
Note the blank after fmt.

OFFSET  Count  TYPE   Description
0000h   1      word   Format tag
                        1 - PCM (raw sample data)
                        other values for ADPCM (2), A-law (6), u-law (7), ...
0002h   1      word   Channels (1=mono, 2=stereo, ...)
0004h   1      dword  Sampling rate
0008h   1      dword  Average bytes per second
                      (= sampling rate * block alignment for PCM)
000Ch   1      word   Block alignment: bytes per sample frame
                      (= channels * bytes per sample for PCM)
000Eh   1      word   Bits per sample (8/12/16-bit samples)
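For PCM data the two derived fields follow from the others: block alignment is channels times the bytes needed per sample, and average bytes per second is sampling rate times block alignment. A minimal Python sketch that builds a complete [fmt ] block this way (pcm_fmt_block is a hypothetical name; byte order is little-endian as in all RIFF files):

```python
import struct

def pcm_fmt_block(channels: int, sample_rate: int, bits: int) -> bytes:
    """Build a [fmt ] block (ID, dword size, 16-byte body) for PCM data."""
    block_align = channels * ((bits + 7) // 8)   # bytes per sample frame
    avg_bytes_per_sec = sample_rate * block_align
    body = struct.pack("<HHIIHH", 1, channels, sample_rate,
                       avg_bytes_per_sec, block_align, bits)
    return b"fmt " + struct.pack("<I", len(body)) + body
```

For 16-bit stereo at 44100 Hz this gives a block alignment of 4 and 176400 bytes per second.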
 

15.3 RiffBLOCK [loop]

This block is for looped samples. Very few programs support this block,
but if your program changes the wave file, it should preserve any unknown
blocks.

OFFSET  Count  TYPE   Description
0000h   1      dword  Start of sample loop
0004h   1      dword  End of sample loop
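Preserving unknown blocks is easy if a program walks the file block by block instead of assuming fixed positions. The Python sketch below yields every subblock of a RIFF WAVE file and uses it to look up the [loop] block; it follows the RIFF rule that an odd-sized block is followed by one pad byte. The names iter_riff_blocks and read_loop are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

def iter_riff_blocks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (block ID, block body) for every subblock of a RIFF WAVE file."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    (riff_size,) = struct.unpack_from("<I", data, 4)
    end = min(len(data), 8 + riff_size)
    pos = 12
    while pos + 8 <= end:
        block_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield block_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)   # skip the pad byte after odd-sized blocks

def read_loop(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, end) from the [loop] block, or None if there is none."""
    for block_id, body in iter_riff_blocks(data):
        if block_id == b"loop" and len(body) >= 8:
            return struct.unpack_from("<II", body)
    return None
```

A program that rewrites the file can copy every (ID, body) pair it does not understand unchanged.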
 

16 ZyXEL

The ZyXEL modems are capable of digitizing speech; the ZFAX software and
answering machine software such as VoiceConnect store the sampled data in
these files. The modems can compress the data down to 19.2k CPS (ADPCM)
and 9.6k CPS (CELP). The compression algorithms can be found in the
ZyxelVoc package by N. Igl, but as the modem firmware changes, so might
the compression algorithm. Playback on the modem is always possible.
Files are identified by the .ZVD and .ZYX extensions.

OFFSET  Count  TYPE   Description
0000h   5      char   ID='ZyXEL'
0005h   1      byte   02h, ??? format tag
0006h   4      byte   reserved
000Ah   1      word   Compression scheme
                        0 - CELP
                        1 - 2-bit ADPCM
                        2 - 3-bit ADPCM
000Ch   4      byte   reserved
0010h   ?      ????   Raw data. The voice data is just the data received
                      from the U1496 modem/fax.
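A minimal Python sketch that reads the 16-byte ZyXEL header and returns the raw voice data untouched (decoding it needs the modem or the ZyxelVoc algorithms). The little-endian compression word is an assumption, and parse_zyxel_header is a hypothetical name.

```python
import struct

ZYXEL_SCHEMES = {0: "CELP", 1: "2-bit ADPCM", 2: "3-bit ADPCM"}

def parse_zyxel_header(data: bytes) -> tuple:
    """Return (format tag, compression scheme name, raw voice data)."""
    if len(data) < 0x10 or data[:5] != b"ZyXEL":
        raise ValueError("not a ZyXEL voice file")
    format_tag = data[5]                                # 02h in known files
    (scheme,) = struct.unpack_from("<H", data, 0x0A)    # assumed little-endian
    return format_tag, ZYXEL_SCHEMES.get(scheme, f"unknown ({scheme})"), data[0x10:]
```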
 
 

17 Creative Labs File Formats
 

17.1 Sound Blaster Instrument File Format (SBI)

The SBI format contains the register values for the FM chip to synthesize
an instrument.

Offset   Description
00h-03h  Contains id characters "SBI" followed by byte 1Ah
04h-23h  Instrument name, NULL terminated string
24h      Modulator Sound Characteristic (Mult, KSR, EG, VIB, AM)
25h      Carrier Sound Characteristic
26h      Modulator Scaling/Output Level
27h      Carrier Scaling/Output Level
28h      Modulator Attack/Decay
29h      Carrier Attack/Decay
2Ah      Modulator Sustain/Release
2Bh      Carrier Sustain/Release
2Ch      Modulator Wave Select
2Dh      Carrier Wave Select
2Eh      Feedback/Connection
2Fh-33h  Reserved
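A minimal Python sketch of reading an SBI file. It returns the 16-byte record at 24h-33h (the same record CMF and IBK files reuse) and splits the Sound Characteristic bytes into their fields, assuming the same bit layout as the modulator description byte in the AdLib instrument table (multiplier in bits 0-3, then KSR, EG, VIB, AM). The names parse_sbi and decode_characteristic are hypothetical.

```python
SBI_SIZE = 0x34   # 52 bytes: ID, name, 11 register bytes, 5 reserved bytes

def decode_characteristic(value: int) -> dict:
    """Split a Sound Characteristic byte (offset 24h or 25h) into its fields."""
    return {
        "mult": value & 0x0F,        # bits 0-3: frequency multiplier
        "ksr": bool(value & 0x10),   # bit 4: scale envelope
        "eg": bool(value & 0x20),    # bit 5: sustain
        "vib": bool(value & 0x40),   # bit 6: pitch vibrato
        "am": bool(value & 0x80),    # bit 7: volume vibrato
    }

def parse_sbi(data: bytes) -> dict:
    if len(data) < SBI_SIZE or data[:4] != b"SBI\x1a":
        raise ValueError("not an SBI file")
    record = data[0x24:0x34]   # 16-byte instrument record
    return {
        "name": data[4:0x24].split(b"\x00", 1)[0].decode("ascii", errors="replace"),
        "record": record,
        "modulator": decode_characteristic(record[0]),
        "carrier": decode_characteristic(record[1]),
        "feedback_connection": record[10],   # offset 2Eh
    }
```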
 

17.2 Creative Music File Format (CMF)

The CMF file format consists of 3 blocks: the header block, the instrument
block and the music block.

The CMF Header Block

Offset   Description
00h-03h  Contains id characters "CTMF"
04h-05h  CMF format version (MSB = major version, LSB = minor version)
06h-07h  File offset of the instrument block
08h-09h  File offset of the music block
0Ah-0Bh  Clock ticks per quarter note (one beat); default = 120
0Ch-0Dh  Clock ticks per second
0Eh-0Fh  File offset of the music title (0 = none)
10h-11h  File offset of the composer name (0 = none)
12h-13h  File offset of the remarks (0 = none)
14h-23h  Channel-In-Use Table
24h-25h  Number of instruments used
26h-27h  Basic Tempo
28h-?    Title, composer and remarks stored here
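A minimal Python sketch of reading the 40-byte (28h) CMF header and the instrument block it points to. It assumes little-endian words, one byte per MIDI channel in the Channel-In-Use Table (non-zero = in use), and NUL-terminated title/composer strings; parse_cmf and _string_at are hypothetical names.

```python
import struct

CMF_HEADER = struct.Struct("<4s8H16s2H")   # bytes 00h-27h

def _string_at(data: bytes, offset: int):
    """Read a NUL-terminated string at offset, or None for offset 0."""
    if offset == 0:
        return None
    return data[offset:].split(b"\x00", 1)[0].decode("ascii", errors="replace")

def parse_cmf(data: bytes) -> dict:
    if len(data) < CMF_HEADER.size or data[:4] != b"CTMF":
        raise ValueError("not a CMF file")
    (_, version, instr_off, music_off, ticks_per_beat, ticks_per_sec,
     title_off, composer_off, _remarks_off, channels_in_use,
     num_instruments, tempo) = CMF_HEADER.unpack_from(data)
    instruments = [data[instr_off + 16 * i: instr_off + 16 * (i + 1)]
                   for i in range(num_instruments)]   # SBI 24h-33h layout
    return {
        "version": (version >> 8, version & 0xFF),
        "ticks_per_beat": ticks_per_beat,
        "ticks_per_second": ticks_per_sec,
        "title": _string_at(data, title_off),
        "composer": _string_at(data, composer_off),
        "channels_in_use": [ch for ch, used in enumerate(channels_in_use) if used],
        "instruments": instruments,
        "music_offset": music_off,
        "tempo": tempo,
    }
```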
 

17.3 The CMF Instrument Block

The instrument block contains one 16 byte data structure for each
instrument in the piece. Each record is of the same format as bytes
24h-33h in the SBI file format.
 

17.4 The CMF Music Block

The music block adheres to the standard MIDI file format, and can have
from 1 to 16 instruments. The PC-GPE file MIDI.TXT contains more
information on this file format.

The music block consists of an alternating sequence of time and MIDI event
records:

dTime  MIDI Event
dTime  MIDI Event
dTime  MIDI Event
 ........

dTime (delta Time) is the amount of time before the following MIDI event.
MIDI Event is any MIDI channel message.
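The text does not say how dTime is stored. Since the music block follows the standard MIDI file format, a reasonable assumption is the MIDI variable-length quantity: 7 bits per byte, most significant group first, with the top bit set on every byte except the last. A Python sketch under that assumption (read_delta_time is a hypothetical name):

```python
from typing import Tuple

def read_delta_time(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one dTime value starting at data[pos]; return (ticks, next_pos)."""
    value = 0
    for _ in range(4):                 # a MIDI variable-length quantity has at most 4 bytes
        if pos >= len(data):
            raise ValueError("truncated dTime")
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:            # high bit clear marks the last byte
            return value, pos
    raise ValueError("dTime longer than 4 bytes")
```

After each dTime, the reader parses one MIDI channel message starting at the returned position.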

The CMF file format defines the following MIDI Control Change events:

Control No  Control Data
66h         1-127, used as markers in the music
67h         0 = melody mode, 1 = rhythm mode
68h         0-127, changes the pitch of all following notes upward by the
            given number of 1/128 semitones
69h         0-127, changes the pitch of all following notes downward by the
            given number of 1/128 semitones
 

In rhythm mode, the last five channels are allocated for the percussion
instruments:

Channel  Instrument
12       Bass Drum
13       Snare Drum
14       Tom-Tom
15       Top Cymbal
16       Hi-hat Cymbal
 

17.5 Sound Blaster Instrument Bank File Format (IBK)

A bank file is a group of up to 128 instruments.

Offset     Description
00h-03h    Contains id characters "IBK" followed by byte 1Ah
04h-803h   Parameters for 128 instruments, 16 bytes for each instrument in
           the same format as bytes 24h-33h in the SBI format
804h-C83h  Instrument names for 128 instruments, 9 bytes for each
           instrument; each name must be null terminated
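The IBK layout is fixed: 4 + 128*16 + 128*9 = 3204 (C84h) bytes. A minimal Python sketch that returns the bank as (name, 16-byte record) pairs; parse_ibk is a hypothetical name.

```python
IBK_SIZE = 0xC84   # 4 + 128*16 + 128*9 bytes

def parse_ibk(data: bytes) -> list:
    """Return 128 (name, record) pairs; each record uses the SBI 24h-33h layout."""
    if len(data) < IBK_SIZE or data[:4] != b"IBK\x1a":
        raise ValueError("not an IBK bank")
    bank = []
    for i in range(128):
        record = data[0x004 + 16 * i: 0x004 + 16 * (i + 1)]
        raw_name = data[0x804 + 9 * i: 0x804 + 9 * (i + 1)]
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
        bank.append((name, record))
    return bank
```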
 

18 Creative Voice (VOC) file format

HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

byte #  Description
00-12   "Creative Voice File"
13      1A (eof to abort printing of file)
14-15   Offset of first datablock in .voc file (std 1A 00 in Intel notation)
16-17   Version number (minor,major) (VOC-HDR puts 0A 01)
18-19   1's complement (bitwise NOT) of version number + 1234h
        (VOC-HDR puts 29 11)
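The check word can be verified with the VOC-HDR values: the version word is 010Ah, its bitwise NOT is FEF5h, and FEF5h + 1234h = 1129h when kept to 16 bits, stored as 29 11. A minimal Python sketch of validating the header (parse_voc_header is a hypothetical name):

```python
import struct

VOC_MAGIC = b"Creative Voice File\x1a"   # bytes 00-13

def parse_voc_header(data: bytes) -> tuple:
    """Return (offset of first data block, major version, minor version)."""
    if len(data) < 0x1A or data[:0x14] != VOC_MAGIC:
        raise ValueError("not a Creative Voice File")
    first_block, version, check = struct.unpack_from("<HHH", data, 0x14)
    if check != (~version + 0x1234) & 0xFFFF:
        raise ValueError("header check word does not match version")
    return first_block, version >> 8, version & 0xFF
```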
 

Data Block:  TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
NOTE: Terminator Block is an exception -- it has only the TYPE byte.

TYPE  Description     Size (3-byte int)  Info
00    Terminator      (NONE)             (NONE)
01    Sound data      2+length of data   *
02    Sound continue  length of data     Voice Data
03    Silence         3                  **
04    Marker          2                  Marker# (2 bytes)
05    ASCII           length of string   null terminated string
06    Repeat          2                  Count# (2 bytes)
07    End repeat      0                  (NONE)
08    Extended        4                  ***
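Because every block except the Terminator carries a 3-byte SIZE, a reader can skip block types it does not handle. A minimal Python sketch that walks the blocks starting at the offset from bytes 14-15 of the header, assuming Intel (little-endian) order for SIZE; iter_voc_blocks is a hypothetical name.

```python
def iter_voc_blocks(data: bytes, pos: int):
    """Yield (type, info bytes) for each data block until the Terminator."""
    while pos < len(data):
        block_type = data[pos]
        if block_type == 0x00:           # Terminator: TYPE byte only, no SIZE
            return
        if pos + 4 > len(data):
            raise ValueError("truncated block header")
        size = int.from_bytes(data[pos + 1:pos + 4], "little")   # 3-byte SIZE
        info = data[pos + 4:pos + 4 + size]
        if len(info) != size:
            raise ValueError("truncated block data")
        yield block_type, info
        pos += 4 + size
    raise ValueError("missing Terminator block")
```

For a type 01 block, info[0] is the SR byte, info[1] the compression type and info[2:] the voice data.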
 

*Sound Info Format:

00     Sample Rate
01     Compression Type
02+    Voice Data

**Silence Info Format:

00-01  Length of silence - 1
02     Sample Rate
 

***Extended Info Format:

00-01  Time Constant:
         Mono:   65536 - (256000000/sample_rate)
         Stereo: 65536 - (256000000/(2*sample_rate))
02     Pack
03     Mode:
         0 = mono
         1 = stereo

Marker#            Driver keeps the most recent marker in a status byte
Count#             Number of repetitions + 1. Count# may be 1 to FFFE for
                   0 - FFFD repetitions, or FFFF for endless repetitions
Sample Rate        SR byte = 256-(1000000/sample_rate)
Length of silence  in units of sampling cycle
Compression Type   of voice data
                     8-bits    = 0
                     4-bits    = 1
                     2.6-bits  = 2
                     2-bits    = 3
                     Multi DAC = 3+(# of channels)
                     [interesting this isn't in the developer's manual]
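The two rate encodings can be computed directly from the formulas above. In the Python sketch below, rounding by integer division is an assumption (the text does not say how to round), and the function names are hypothetical.

```python
def sr_byte(sample_rate: int) -> int:
    """SR byte for block types 01 and 03: 256 - 1000000/sample_rate."""
    return 256 - 1000000 // sample_rate

def rate_from_sr_byte(sr: int) -> int:
    """Invert the SR byte back to an (approximate) sample rate in Hz."""
    return 1000000 // (256 - sr)

def time_constant(sample_rate: int, channels: int) -> int:
    """Time constant of the Extended block (type 08); channels is 1 or 2."""
    return 65536 - 256000000 // (channels * sample_rate)
```

For example, 11025 Hz gives an SR byte of 166, which reads back as 11111 Hz, and 22050 Hz mono shares its time constant (53927) with 11025 Hz stereo.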
 

19 Revision History

Version 1.0 - First document containing 15 formats
Version 1.1 - Two more formats added
