Audio File Formats Guide

Csa2 FAQs-on-Ground resource file: R011SNDFMTS.TXT

The Csa2 (comp.sys.apple2) usenet newsgroup Frequently Asked Questions files are compiled by the Ground Apple II site, 1997, 1998.

ftp://ground.ecn.uiowa.edu/2/apple2/Faqs
http://ground.ecn.uiowa.edu/2/apple2/Faqs

The Csa2 FAQs may be freely distributed.

Notes: This is a pure Text file which has no Font, Color, etc. formatting. For best viewing on-line, set browser Word Wrap to ON or copy to your favorite Text viewer and set Word Wrap. Ex: On PC use WordPad with Options set to "Wrap to Window". To correctly view tables and diagrams on a super-res display, use a mono-spaced Font such as CoPilot or PCMonospaced.

____________________________

AUDIO FILE FORMAT RESOURCE GUIDE (Version 1.1)
by Dave Huizing

1 TABLE OF CONTENTS

2 GENERAL INFORMATION
  2.1 Foreword
  2.2 Printed Version
  2.3 Copyrights
  2.4 Disclaimer
  2.5 Contributors
3 TX WAVE FORMAT
4 YAMAHA TYPHOON WAVE FILE FORMAT
  4.1 DWVW v1.2 compression
  4.2 DWVW sample delta bit frame
5 D00
  5.1 The D00 header
  5.2 The Instrument data
  5.3 The SpFX data
  5.4 The Arrangement data
  5.5 The Sequence data
6 MIDI SAMPLE DUMP STANDARD
  6.1 INTRODUCTION
  6.2 SPEC: SAMPLE DUMP FORMATS
  6.3 SPEC: SAMPLE DUMP MESSAGES
  6.4 HANDSHAKING MESSAGES:
  6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)
  6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)
  6.7 SDS OVERVIEW
7 ROL
  7.1 Structure of .ROL files
  7.2 Notes
8 8SVX
  8.1 FORMblock [VHDR]
  8.2 FORMblock [BODY]
9 AIFF
10 AU
11 FSM
12 GF1 PATCH
13 S3I
14 UWF
15 WAVE
  15.1 RiffBLOCK [data]
  15.2 RiffBLOCK [fmt ]
  15.3 RiffBLOCK [loop]
16 ZYXEL
17 CREATIVE LABS FILE FORMATS
  17.1 Sound Blaster Instrument File Format (SBI)
  17.2 Creative Music File Format (CMF)
  17.3 The CMF Instrument Block
  17.4 The CMF Music Block
  17.5 Sound Blaster Instrument Bank File Format (IBK)
18 CREATIVE VOICE (VOC) FILE FORMAT
19 REVISION HISTORY

2 General information

2.1 Foreword

I started to compile this document after I thought there was a need for it.
By surfing all around the web I collected these descriptions and brought them together in this document. I plan to keep this document updated, so if there's any file format description that's not in this document, or you have any comments on this document, please send me an email message at: stallion@worldonline.nl.

Happy developing,
Dave Huizing

2.2 Printed Version

If you need a printed version send an email.

2.3 Copyrights

Only the title and the compilation are copyrighted by Dave Huizing. As far as I know all this information is free for use. See the disclaimer part for more details. All trademarks, technical information and file extensions belong to their respective owners.

2.4 Disclaimer

This document is provided on an "as is" basis. The information has been verified as far as possible, but I cannot be held responsible for any problems caused by use or misuse of the information. Although I think it won't happen, I am also not responsible for any damage to any kind of computer system after or while using parts from this documentation. Use this document at your own risk.

2.5 Contributors

Dave Huizing, stallion@worldonline.nl
    DJ, Producer, DTP designer, etc
muki pakesch, mpakesch@t0.or.at
    Maintainer of the TX16W mailinglist
Markus Jönsson, f93-maj@nada.kth.se
    Author of the Awave sample converter

3 TX Wave Format

The file consists of a 32 byte header followed by the actual waveform (the first 16 bytes only identify the file type). In C syntax the header would look like this:

char filetype[6];    /* = "LM8953" */
char nulls[10];
char dummy_aeg[6];   /* space for the AEG (never mind this) */
char format;         /* 0x49 = looped, 0xC9 = non-looped */
char sample_rate;    /* 1 = 33 kHz, 2 = 50 kHz, 3 = 16 kHz */
char atc_length[3];  /* I'll get to this... */
char rpt_length[3];
char unused[2];      /* set these to null, to be on the safe side */

The "atc_length" and "rpt_length" fields are quite complex. First of all you should know that there is no such thing as a looping point in a TX wave.
Instead a wave is split into two parts, the attack part and the repeat part (of course the actual wave data isn't split, this is just a logical definition). As you might guess, the attack part is played first and the repeat part is looped until the key is released. Each of these parts is limited to a maximum of 128k words in length. That is the reason why waves can't be longer than 256k words (4096 blocks).

The length of a part is stored LSB first (Intel). Of the third byte, only the least significant _bit_ (bit 0) is used, and it represents the most significant bit of the length. Are you confused yet? Then hold your breath. It seems that Yamaha has chosen to squeeze in the sample rate(!) of the wave in the unused _bits_ of these last bytes. Although they already have a separate byte for the sample rate, this isn't enough. I won't go into details on this now (or you would be even more confused). You only need to know that the possible values are:

0x06, 0x52 = 33 kHz
0x10, 0x00 = 50 kHz
0xF6, 0x52 = 16 kHz

(The first value is located in byte three of "atc_length" and the second value is located in byte three of "rpt_length".)

To wrap it up, this is the format of the two length fields on a bit level:

              [0]       [1]       [2]
atc_length  AAAAAAAA  BBBBBBBB  DDDDDDDC
rpt_length  EEEEEEEE  FFFFFFFF  HHHHHHHG

A  LSB of the attack length
B  MSB of the attack length (except for one bit)
C  the utterly most significant _bit_ of the attack length
D  the first value of the magic sample rate constant (0x06, 0x10 or 0xF6)
E  LSB of the repeat length
F  MSB of the repeat length (except for one bit)
G  the utterly most significant _bit_ of the repeat length
H  the second value of the magic sample rate constant (0x52, 0x00)

Now for the most important (and probably most interesting) part: the waveform data. As you certainly know the TX uses 12-bit sampling resolution, and this requires some kind of encoding if we are not willing to waste one fourth of our disk space.
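
Stepping back to the two length fields for a moment: the following Python sketch shows one way to pull the 17-bit length and the magic sample-rate bits out of them. The function names and the lookup table are made up for this illustration (they are not from Yamaha), and masking off bit 0 before comparing against the magic constants is an assumption based on the bit layout above.

```python
# Magic sample-rate constants from the text, keyed by byte three of
# atc_length and byte three of rpt_length with bit 0 (the length bit)
# masked off. The masking is an assumption drawn from the bit layout.
MAGIC_RATES_KHZ = {(0x06, 0x52): 33, (0x10, 0x00): 50, (0xF6, 0x52): 16}


def decode_length(field):
    """Split a 3-byte length field into (length_in_words, magic_bits)."""
    b0, b1, b2 = field
    # Bytes 0 and 1 are the low 16 bits (LSB first); bit 0 of byte 2
    # is the most significant bit of the length.
    length = b0 | (b1 << 8) | ((b2 & 0x01) << 16)
    return length, b2 & 0xFE


def decode_lengths(atc_field, rpt_field):
    """Return (attack_length, repeat_length, rate_in_khz_or_None)."""
    atc_len, atc_magic = decode_length(atc_field)
    rpt_len, rpt_magic = decode_length(rpt_field)
    # Unknown magic pairs give None instead of a guessed rate.
    rate = MAGIC_RATES_KHZ.get((atc_magic, rpt_magic))
    return atc_len, rpt_len, rate
```

For instance, decode_lengths(bytes([0x34, 0x12, 0x07]), bytes([0x00, 0x80, 0x52])) treats 0x07 as the magic value 0x06 plus a set length bit. Whether a real TX file can carry other magic values is not covered by the text, which is why unknown pairs are reported as None.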
Yamaha has chosen to group the samples two by two, making three bytes of data in the file for each pair. I'll illustrate this on a bit level (as with the lengths above):

AA CD BB

A  MSB of the first sample
B  MSB of the second sample
C  least significant nybble (oh, is that the correct spelling?) of the first sample
D  least significant nybble of the second sample

4 Yamaha Typhoon wave file format

This specification describes the compression algorithm for Typhoon format waves. It does not cover the file format, which is AIFF-C. The documentation for AIFF-C is available at the site ftp.sgi.com in the directory /sgi/aiff-c.9.26.91.ps.Z (compressed Postscript file).

4.1 DWVW v1.2 compression

DWVW was invented in 1991 by Magnus Lidstrom and is copyright 1993 by NuEdge Development. You have the right to use the algorithm freely as long as you make no false claims on its origin.

DWVW is a lossless (or bit faithful) compression method for digital audio data. Lossless means that the exact original data will be preserved when compressing and decompressing. The compression utilizes the fact that the delta between the sample points is generally less than the full dynamic width. Each sample point is subtracted from the previous one and the difference is entropy encoded in a special format. Therefore the compression works best on low frequency sounds with a low noise ratio, where the difference between each sample is small.

DWVW can be applied on samples of any bit resolution and with any number of channels. As opposed to the AIFF standard, sample bits are not "left justified". Instead the necessary translation should be done when decompressing. Also, while AIFF interleaves multichannel sounds, DWVW doesn't, as this complicates compression and decompression. Each channel follows one another with only a slight break in the bit run. The first delta for each channel should be put at an even 16-bit word position.
The encoding stores the delta points with only as many bits as are required (hence the name "variable word width"). Thus, the number of bits used by each delta has to be stored as well. Since this count varies very little we apply a (simpler) delta encoding on this information. To wrap it up, each compressed sample point consists of two values: the delta from the last sample and the difference in word width of this delta from the last delta (hereby referred to as "the WWM" - the word width modifier).

Even though the word width modifier is stored first in each delta frame we will describe the delta information first. The delta is always stored as an absolute difference (i.e. unsigned) in a variable number of bits. An extra bit follows that tells the sign (if the delta isn't zero). The number of bits required for the delta (i.e. the word width) is decided by the position of the most significant high bit in the absolute value. One bit less than this is actually stored since the first bit is always high. For instance, the delta 11 (binary 1011) has a required word width of four bits, but only the least significant three bits are stored. A zero delta will have a zero word width and consequently requires neither delta bits nor sign bit. A delta of one will require only a sign bit.

One special case requires attention. In normal two's complement, the lowest negative number has a magnitude one greater than the highest positive number. Treating zero as a positive value this gives exactly as many negative as positive numbers. The delta encoding on the other hand does not consider zero to be of any sign and does therefore not include the one extra negative value. If this value is encountered in the delta stream it is encoded as one greater than it actually is (putting it within the expressible range of values). To distinguish it from the next lowest value one extra bit is inserted after the sign bit. The bit is high for the lowest value and low for the next lowest value.
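
As a rough sketch of the delta part only (the WWM is left out), the rules above could be written like this in Python. The function name encode_delta and the bit-string return format are invented for this illustration; the sign polarity (1 = negative) is not spelled out in the text above and is taken from the encoding examples in 4.2.

```python
def encode_delta(delta, bit_resolution):
    """Return (word_width, bits) for one delta, without the WWM prefix.

    bits is a string of '0'/'1': stored delta bits, then the sign bit,
    then the extra bit where one is required. The delta is assumed to
    already lie in the two's complement range of bit_resolution bits.
    """
    lowest = -(1 << (bit_resolution - 1))
    extra = ""
    if delta == lowest:
        # Encoded as one greater than it is, flagged by a high extra bit.
        delta, extra = lowest + 1, "1"
    elif delta == lowest + 1:
        # Next lowest value: same magnitude, extra bit low.
        extra = "0"

    magnitude = abs(delta)
    width = magnitude.bit_length()      # position of the top set bit
    if width == 0:
        return 0, ""                    # zero delta: no bits at all

    stored = format(magnitude, "b")[1:]  # drop the always-high top bit
    sign = "1" if delta < 0 else "0"     # assumed polarity, see lead-in
    return width, stored + sign + extra
```

With these rules encode_delta(11, 16) gives a word width of 4 and the bits 011 followed by a low sign bit, and a delta of one yields nothing but its sign bit.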
For example, a 16-bit two's complement number can be -32768. It would be encoded as negative 32767 with an extra high bit. The value -32767 would also be encoded as negative 32767 but with the extra bit low. Of course, only these two values require the extra bit.

The WWM precedes the delta bits. It is encoded as a series of low bits (0) terminated by a high bit (1) (in most cases). The count of low bits tells the modifier amount. If the modifier isn't zero an extra bit follows that tells the modifier sign. A high bit means negative modifier. Word width "wraps" at the used bit resolution (new-width = (original-width + modifier) modulo bit-resolution). This enables us to go from a small width to a large width by using a negative modifier. Because of this fact a WWM will never need to be larger than the sound bit resolution divided by two (rounded downwards). If the modifier is the maximum the terminating high bit would be superfluous, so in this case it isn't inserted. (However, the sign bit is always included, even if the bit resolution is even.)

For encoding, the current word width and sample value should be initially reset to zero for each channel (the first delta will thus be the sample value). A compressed channel always starts on an even 16-bit word boundary.

Notice that the highest possible compression ratio is eight times, i.e. one bit per sample. This occurs when the source is a continuous series of zero samples.

4.2 DWVW sample delta bit frame:

0...     WWM is the count of low bits (can be none)
1        terminating high bit (if not max WWM)
ms       WWM sign, high is negative (only on non-zero WWM)
delta    (word width - 1) sample delta bits (if absolute delta > 1)
sb       delta sign bit (only on non-zero delta)
xb       extra bit (only on lowest and next lowest possible delta value)

Some encoding examples (the examples all represent extreme situations with unusually poor compression):

Bit resolution  16
Delta           923 (bin 00000011 10011011)
Current width   1
New width       10
Modifier        -7 (mod 16 = 10)
Yields          0000000 1 1 110011011 0

Bit resolution  12
Delta           -2048 (bin 1000 00000000)
Current width   0
New width       11
Modifier        -1 (mod 12 = 11)
Yields          0 1 1 1111111111 1 1
                (-2048 is encoded as 2047 with extra bit and negative high)

Bit resolution  8
Delta           -12 (bin 11110100, negated 00001100)
Current width   0
New width       4
Modifier        +4
Yields          0000 0 100 1
                (no terminating bit for WWM)

5 D00

This part describes the D00 music format (used by the AdLib player v4.01 coded by JCH/Vibrants) in more detail than the docs of EdLib (the respective tracker, also coded by JCH) do. This document assumes that you already own EdLib and have some experience with it. Also, the availability of the EdLib docs as well as of the docs for the player included with EdLib is assumed. You should know some basics about AdLib programming and data formats (byte, word etc.), and be familiar with the EdLib structures (Instruments, SpFX etc.) and with hexadecimal notation.

5.1 The D00 header

A description of the D00 header can be found in the player's docs, so I won't show it again here. But JCH gives very cryptic names to the other file structures, so I'll call them differently:

JCH's names          My names
TPoin tables         Arrangement data
SeqPointer tables    Sequence data
Instrument data      Instrument data
DataInfo text        Song description
Special tables       SpFX data

Also, I should mention that all the pointers to these tables are meant relative to the beginning of the D00 file.

5.2 The Instrument data

The instrument data simply consists of all instruments used in the song. Since the number of instruments is stored nowhere inside the file, loaders should use the start offset of the next structure to determine whether they have read enough data. The data for each instrument consists of 16 bytes, which occur in the same order as the corresponding bytes in the EdLib Instrument table:

xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx
+------------+ +------------+ |  |  |  |  |  |
 Carrier data  Modulator data |  |  |  |  +--+---Unused
                              |  |  |  +Hard restart SR value
                              |  |  +Hard restart timer
                              |  +Fine-tune
                              +AM/FM + Feedback

For the exact meaning of these bytes, read the EdLib manual. Note that in the Carrier and Modulator data the ADSR parts are not stored word-oriented, but byte-oriented. That means they aren't stored as a word whose High byte is the AD part and whose Low byte is the SR part (although the display in EdLib creates that impression). Instead they're simply stored as two bytes of which the first one is the AD part and the second one is the SR part.

5.3 The SpFX data

The SpFX data is stored more or less like the Instrument data, but one single table entry consists of only 8 bytes arranged like this:

xxxx xx xx xx xx xxxx   (note xx's are BYTES and xxxx's are WORDS!)
|    |  |  |  |  |
|    |  |  |  |  +Pointer to next SpFX entry
|    |  |  |  +Duration of SpFX entry in Frames
|    |  |  +Modulator Level add
|    |  +New Modulator level
|    +Note add value
+Instrument to use

Again, to really understand the meaning of these parts, you should read the EdLib docs.

5.4 The Arrangement data

The arrangement data determines which sequence is to be played on which channel at which moment and in which way, if you understand what I mean :) It consists of two parts: the Pointer part and the Data part (I simply call them that way now :). The Pointer part consists of 16 word pointers and one endmark (all endmarks are FFFFh, by the way).
Only the first nine pointers are used at the moment: one for each of the nine AdLib channels. Each of these nine pointers points to the section of the Data part which belongs to its channel.

The Data part consists, as you'd have guessed before, of nine independent arrangement streams. Each one of these streams has the following format: First comes a word telling the speed of that stream. Since this information is stored at the beginning of EVERY stream, I assume that every channel may have its own unique speed, and EdLib simply doesn't support this. After that, the real arrangement data is stored. This data is organized like this:

If a word below 8000h is read, it's the number of a sequence to be played. In that case, the saved transpose data is used. But if a word 8XYYh is read, with X and YY being any value, the transpose data is updated to X and YY (see the EdLib docs for information on the meaning of X and YY). I have found out that the first arrangement entry for an arrangement stream that contains at least one sequence is always such a command to set the internal transpose data. So no default value is required to be loaded into the transpose data before playing, and looping the arrangement stream becomes easier.

If the word FFFFh is read, the arrangement stream has arrived at its looping point. The word following the FFFFh is an offset into the arrangement stream telling at which position the stream should be restarted.

If the word FFFEh is read, the arrangement stream has reached its end. Unlike with the Loop command (FFFFh), the stream must not be restarted but halted. Also, there is no word following the FFFEh command.

5.5 The Sequence data

The Sequence data again consists of a pointer part and a data part. But this time these two parts aren't stored in different parts of the file; the data part is stored directly after the pointer part.
Therefore, a reference to a specific pattern should be seen as a reference to a word counted from the beginning of the Sequence data. This word (e.g. the first word for Pattern 0000h) then points to the offset of the actual sequence data inside the file. I hope you got my point...

Then, each sequence is stored as follows: Read a word. If its high byte is below 20h, then it's a note. Note that RESTs and HOLDs are also counted as notes. In this case, the low byte can contain the following values:

00h        = REST  The high byte tells the number of rests to insert minus one! E.g. a REST with a high byte of 01h means "Two RESTs".
01h - 7Dh  = Note  The value of this note byte tells the number of semitones to add to C-0 (e.g. 01h would mean C#0). In this case, the high byte tells the number of HOLDs to insert after the note.
7Fh        = HOLD  The high byte tells the number of HOLDs minus one again!

If the high byte is 20h or above, but below 40h, it's a note again, but this time with Tienote switched on. The high byte is used as repetition count again, but don't forget to subtract 20h before evaluating it!!

If the high byte is 40h or above, it's an effect. In this case, the complete word can simply be interpreted like any EdLib effect (set instrument, set volume etc.). See the EdLib docs for a list of them. The note word this effect refers to follows directly after the effect word.

If the read word is FFFFh, it indicates the end of that sequence. In that case, the next sequence to be played should be determined and loaded and the first effect/note of it should be played.

6 MIDI SAMPLE DUMP STANDARD

6.1 INTRODUCTION

The MIDI SDS was adopted in January 1986 by the MIDI Manufacturers Association and the Japanese MIDI Standards Committee. The SDS defines the standard method for transfer of sound sample data between MIDI-equipped devices. Sample dumps may be accomplished with either an 'open loop' or 'closed loop' system.
The open loop method simply involves the straight dump of all sample data from its source to the destination, with no timeouts, packet acknowledgements, or any other form of handshaking, much in the manner of a sysex bulk dump, usually initiated at the source. The closed loop method allows the use of handshaking messages between the dump source and destination, and usually places the dump process under the control of the slave, to allow it time to process the incoming data as necessary.

As with any standard, it cannot be assumed that a device adheres to it unless the accompanying documentation specifically indicates it. Even then, it is best to check its conformity with non-critical data.

6.2 SPEC: SAMPLE DUMP FORMATS

DUMP HEADER:

F0 7E cc 01 ss ss ee ff ff ff gg gg gg hh hh hh ii ii ii jj F7

cc        channel number
ss ss     sample number (LSB first)
ee        sample format (number of significant bits; 8->28)
ff ff ff  sample period (1/sample rate) in nanoseconds (LSB first)
gg gg gg  sample length, in words (LSB first)
hh hh hh  sustain loop start point (word number) (LSB first)
ii ii ii  sustain loop end point (word number) (LSB first)
jj        loop type (00:forwards only; 01:alternating)

DATA PACKET:

F0 7E cc 02 kk <120 bytes> mm F7

cc        channel number
kk        running packet count (00->7F)
mm        checksum (XOR of 7E, cc, 02, kk, <120 bytes>)

The total size of a data packet is 127 bytes. This is to avoid overflow of the MIDI input buffer of a device that may want to receive an entire packet before processing it.

A data packet consists of its own header, a packet number, 120 bytes of data, a checksum, and an EOX. The packet number begins at 00 and increments with each new packet. It resets to 00 after it reaches 7F, and continues counting. The packet number is used by the receiver to distinguish between a new data packet and a resend of a previous packet.
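
To make the packet layout concrete, here is a small Python sketch that assembles one DATA PACKET and its checksum from the byte layout above. The function names are hypothetical, and the caller is assumed to hand over exactly 120 data bytes that are already 7-bit clean.

```python
def sds_checksum(channel, packet_number, data):
    """XOR of 7E, cc, 02, kk and the data bytes, as listed for 'mm'."""
    checksum = 0x7E ^ channel ^ 0x02 ^ packet_number
    for byte in data:
        checksum ^= byte
    return checksum & 0x7F  # XOR of 7-bit values stays within 7 bits


def build_data_packet(channel, packet_number, data):
    """Return the 127 bytes F0 7E cc 02 kk <120 bytes> mm F7."""
    if len(data) != 120:
        raise ValueError("a data packet carries exactly 120 data bytes")
    if channel > 0x7F or any(byte > 0x7F for byte in data):
        raise ValueError("MIDI data bytes must have their top bit clear")
    kk = packet_number & 0x7F  # the count resets to 00 after 7F
    mm = sds_checksum(channel, kk, data)
    return bytes([0xF0, 0x7E, channel, 0x02, kk]) + bytes(data) + bytes([mm, 0xF7])
```

Taking the packet number modulo 128 inside the builder is just one way to handle the wrap; a sender could equally keep its own 7-bit counter.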
The packet number is followed by 120 bytes of data, which form 60, 40, or 30 words (MSB first for multiword samples), depending on the length of a single data sample.

Each data byte holds seven bits, with the msb in each byte set to 0, in order to conform to the requirements of MIDI data transmission. Information is left justified within the 7-bit bytes, and unused bits are filled with 0.

Example: Assume a data point in the memory of a 16-bit sampler, with the value 87E5. In binary, that would be:

1000 0111 1110 0101

and would be encoded as the following MIDI data stream:

01000011 01111001 00100000

The checksum is the running XOR of all the data after the SYSEX byte, up to but not including the checksum itself.

6.3 SPEC: SAMPLE DUMP MESSAGES

DUMP REQUEST:

F0 7E cc 03 ss ss F7

cc        channel number
ss ss     sample number requested (LSB first)

Upon receiving the request, the sampler checks the sample number to see if it is within legal range. If it is not, the request is ignored. If it is, the sample dump is started. One packet at a time is sent, under control of the handshaking messages outlined below.

6.4 HANDSHAKING MESSAGES:

For all below:

cc        channel number
pp        packet number

Packet numbers are included in the handshaking messages to accommodate machines that have the intelligence to re-transmit specific packets after an entire dump is finished, or if synchronization is lost.

ACK     F0 7E cc 7F pp F7
Means last packet was received correctly (checksum OK, etc), please send next one. Packet number is packet being acknowledged as correct.

NAK     F0 7E cc 7E pp F7
Means last packet not received correctly, please send again. Packet number is packet being rejected.

CANCEL  F0 7E cc 7D pp F7
Means abort dump immediately. Packet number is packet on which abort occurs.

WAIT    F0 7E cc 7C pp F7
Means pause dump indefinitely, until next message is sent. Allows the unit receiving the dump to perform other functions (disk access, etc), before receiving the remainder of the dump.
The next message it sends (e.g. ACK, CANCEL) will determine if the dump continues or aborts.

6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)

Once a dump has been requested, either via MIDI or through the front panel, the DUMP HEADER is sent. After sending the header, the master must time out for at least two seconds, to allow the receiver to decide if it will accept this sample (has enough memory, etc). If it receives a CANCEL within this time, it should abort immediately. If it receives an ACK, it will start sending packets immediately. If it receives a WAIT, it pauses until another message is received, and then processes that message normally. If nothing is received within the timeout, an open loop is assumed, and the dump starts with the first packet.

After sending each packet, the master should time out for at least 20 milliseconds and watch its MIDI In. If an ACK is received, it sends the next packet immediately. If it receives an NAK, and the packet number matches the number of the last packet sent, it resends that packet. If the packet numbers don't match, and the device is incapable of sending packets out of order, the NAK will be ignored. If a WAIT is received, the master should watch its MIDI In port indefinitely for another ACK, NAK, or CANCEL message, which it should then process normally. If no messages are received within 20 milliseconds of the transmission of a packet, the master may assume an open loop configuration, and send the next packet.

This process continues until there are fewer than 121 data bytes to send. The final packet will still consist of 120 bytes, regardless of how many significant bytes actually remain, and the unused bytes will be filled with zeroes. The receiver should handshake after receiving the last packet.

6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)

When receiving a sample dump, a device should keep a running checksum during reception.
If its checksum matches the checksum in the data packet, it will send an ACK and wait for the next packet. If it does not match, it will send an NAK containing the number of the packet that caused the error, and wait for the next packet. If, after sending an NAK, the packet number of the next packet doesn't match the previous packet number (the one that was NAK'd), and the unit is not capable of accepting packets out of order, the error is ignored and the dump continues as if the checksums had matched. If a receiver runs out of memory before the dump is completed, it should send a CANCEL to stop the dump.

6.7 SDS OVERVIEW

DUMP DATA FORMAT: DUMP HEADER
    Sysex ID: Universal Non-Real Time
    Channel Number
    Sub ID: Header
    Sample Number (2 bytes, LSB first)
    Sample Format
    Sample Period (3 bytes, LSB first)
    Sample Length (3 bytes, LSB first)
    Sustain Loop Start Point (3 bytes, LSB first)
    Sustain Loop End Point (3 bytes, LSB first)
    Loop Type
    Eox

SAMPLE DUMP DATA FORMAT: DATA PACKET
    Sysex ID: Universal Non-Real Time
    Channel Number
    Sub ID: Data Packet
    Packet Number
    Sample Data (120 bytes)
    Checksum
    Eox

SAMPLE DUMP MESSAGES: DUMP REQUEST
    Sysex ID: Universal Non-Real Time
    Channel Number
    Sub ID: Dump Request
    Sample Number (2 bytes, LSB first)
    Eox

SAMPLE DUMP MESSAGES: HANDSHAKING FLAGS:
    Sysex ID: Universal Non-Real Time
    Channel Number
    Sub ID: ACK or NAK or CANCEL or WAIT
    Packet Number
    Eox

7 ROL

This part contains details of .ROL files used by AdLib and compatible cards on PC. The format is also used by Visual Composer (TM).

7.1 Structure of .ROL files:

fld #  size     type   description
       (bytes)
1      2        int    file version, major
2      2        int    file version, minor
3      40       char   unused
4      2        int    ticks per beat
5      2        int    beats per measure
6      2        int    editing scale (Y axis)
7      2        int    editing scale (X axis)
8      1        char   unused
9      1        char   0 = percussive mode
                       1 = melodic mode
10     90       char   unused
11     38       char   filler
12     15       char   filler
13     4        float  basic tempo

Field 14 indicates the number of times to repeat fields 15 and 16:

fld #  size     type   description
       (bytes)
14     2        int    number of tempo events
15     2        int    time of events, in ticks
16     4        float  tempo multiplier (0.01 - 10.0)

The remaining fields (17 to 34) are to be repeated for each of 11 voices:

fld #  size     type   description
       (bytes)
17     15       char   filler
18     2        int    time (in ticks) of last note +1

Repeat the next two fields (19 and 20) while the summation of field 20 is less than the value of field 18:

fld #  size     type   description
       (bytes)
19     2        int    note number:
                       0 => silence
                       from 12 to 107 => normal note (you must subtract 60 to obtain the correct value for the sound driver)
20     2        int    note duration, in ticks

21     15       char   filler

Field 22 indicates the number of times to repeat fields 23 to 26:

fld #  size     type   description
       (bytes)
22     2        int    number of instrument events
23     2        int    time of events, in ticks
24     9        char   instrument name
25     1        char   filler
26     2        int    unused

27     15       char   filler

Field 28 indicates the number of times to repeat fields 29 and 30:

fld #  size     type   description
       (bytes)
28     2        int    number of volume events
29     2        int    time of events, in ticks
30     4        float  volume multiplier (0.0 - 1.0)

31     15       char   filler

Field 32 indicates the number of times to repeat fields 33 and 34:

fld #  size     type   description
       (bytes)
32     2        int    number of pitch events
33     2        int    time of events, in ticks
34     4        float  pitch variation (0.0 - 2.0, nominal is 1.0)

7.2 Notes

Fields #1 and #2 should be set to 0 and 4 respectively. Field #10 should be filled with zeros.

8 8SVX

The 8SVX files are IFF files used for digital audio data.
The format of the VHDR block is complete guesswork. These files use Motorola byte order. The 8SVX file format is fixed to 8-bit mono sample data - at least GoldWave does not support saving files in any other format than 8-bit mono.

8.1 FORMblock [VHDR]

This is the sample information block. The normal size is 20 bytes.

OFFSET  Count  TYPE   Description
0000h   1      dword  Sampling rate of digital data in Hz. This count seems not to be too accurate, at least GoldWave v2.0 creates different rates for Wave and 8SVX files.
0004h   4      dword  Other data, unknown

8.2 FORMblock [BODY]

This block contains the raw sample data; maybe the usual IFF compression was used. The details of both the compression and the IFF format are unknown.

9 AIFF

The Audio Interchange File Format files are digital audio files stored in the IFF format; the samples are stored in signed PCM. The header block is [AIFF], different subblocks are:

[AUTH]  The author's information (optional)
[COMM]  This record stores information about the sampled data

        OFFSET  Count  TYPE   Description
        0000h   1      word   number of channels or number of instrument samples ???
        0002h   1      dword  Sample length
        0006h   1      dword  lower frequency
        000Ah   1      dword  maximum frequency
        000Dh   1      dword  ???

[MARK]
[NAME]  The name of the instrument / sample
[SSND]  The stored sample data.

10 AU

The AU files are digital audio files used by the Sun and NeXT workstations. Further information wanted.

OFFSET  Count  TYPE   Description
0000h   4      char   ID='.snd'
0004h   1      dword  Offset of start of sample
0008h   1      dword  Length of stored sample
000Ch   1      dword  Sound encoding:
                       1 - 8-bit ISDN u-law
                       2 - 8-bit linear PCM (REF-PCM)
                       3 - 16-bit linear PCM
                       4 - 24-bit linear PCM
                       5 - 32-bit linear PCM
                       6 - 32-bit IEEE floating point
                       7 - 64-bit IEEE floating point
                      23 - 8-bit ISDN u-law compressed (G.721 ADPCM)
0010h   1      dword  Sampling rate
0014h   1      dword  Number of sample channels

11 FSM

The .FSM files are samples to be used for module style music with the Fandarole Composer.
Currently only samples of up to 64K length are supported, altough the header reserves a dword for the sample size. OFFSET Count TYPE Description 0000h 4 char ID='FSM',254 0004h 32 char ASCII name of sample 0024h 3 char ID=10,13,26 0027h 1 dword Length of sample (<=64K) 0028h 1 byte Fine tune value for sample (currently unsupported) 0029h 1 byte Sample volume (currently unsupported) 002Ah 1 dword Start of sample loop 002Dh 1 dword End of sample loop. If the sample is not set to loop (see below) this should be set to the end of the sample. 0032h 1 byte Sample type bitmapped 0 - 8-bit/16-bit sample 1-7 - reserved 0033h 1 byte Loop mode ?bit mapped? 0-2 - reserved 3 - loop off/loop on 4-7 - reserved 0034h ? byte Sample data in signed format 12 GF1 PATCH The GF1 Patch files are multipart sound files for the Gravis Ultrasound sound card to emulate MIDI sounds in high quality. Each Patch can consist of many samples (for example, a string ensemble consists of Violin, Viola, Cello, Bass) which are played depending on the note to play. A patch can also contain a part to be played before the loop and a part to be played after the tone has been released. OFFSET Count TYPE Description 0000h 12 char ID='GF1PATCH110' 000Ch 10 char Manufacturer ID 0018h 60 char Description of the contained Instruments or copyright of manufacturer. 0054h 1 byte Number of instruments in this patch 0055h 1 byte Number of voices for sample 0056h 1 byte Number of output channels (1=mono,2=stereo) 0057h 1 word Number of waveforms 0059h 1 word Master volume for all samples 005Bh 1 dword Size of the following data 0060h 36 byte reserved Following this header, the instruments with their headers follow. An instrument header contains the name and other data about one instrument contained within the patch. OFFSET Count TYPE Description 0000h 1 word Instrument number. ?Maybe the MIDI instrument number?. In the Gravis patches, this is 0, in other patches, I found random values. 
0002h      16  char   ASCII name of the instrument.
0012h       1  dword  Size of the whole instrument in bytes.
0016h       1  byte   Layers. Purpose unknown.
0017h      40  byte   reserved

About the patch, I don't know anything. Maybe somebody could enlighten me. Each patch record has the following format:

OFFSET  Count  TYPE   Description
0000h       7  char   Wave file name
0007h       1  byte   Fractions
0008h       1  dword  Wave size. Size of the wave digital data
000Ch       1  dword  Start of wave loop
0010h       1  dword  End of wave loop
0012h       1  word   Sample rate of the wave
0014h       1  word   Minimum frequency to play the wave
0016h       1  word   Maximum frequency to play the wave
0018h       1  dword  Original sample rate of the wave data
001Ch       1  int    Fine tune value for the wave
001Eh       1  byte   Stereo balance, values unknown**
001Fh       6  byte   Filter envelope rate
0025h       6  byte   Filter envelope offset
002Bh       1  byte   Tremolo sweep
002Ch       1  byte   Tremolo rate
002Dh       1  byte   Tremolo depth
002Fh       1  byte   Vibrato sweep
0030h       1  byte   Vibrato rate
0031h       1  byte   Vibrato depth
0032h       1  byte   Wave data, bitmapped
                        0 - 8/16 bit wave data
                        1 - signed/unsigned data
                        2 - de/enable looping
                        3 - no/has bidirectional looping
                        4 - loop forward/backward
                        5 - Turn envelope sustaining off/on
                        6 - Dis/Enable filter envelope
                        7 - reserved
0033h       1  int    Frequency scale, whatever that means
0035h       1  word   Frequency scale factor
0037h      36  byte   Reserved

13 S3I

This is the Digiplayer/ST3.0 digital sample file format. The sample files include information about the loop of the instrument. The AdLib instruments have another format, listed below.

OFFSET  Count  TYPE   Description
0000h       1  byte   ID=01h
0001h      12  char   DOS filename
000Dh       1  byte   reserved (0)
000Eh       1  word   Paragraph offset of the raw sample data from
                      beginning of file.
0010h       1  dword  Sample length in bytes
0014h       1  dword  Start of sample loop
0018h       1  dword  End of sample loop
001Ch       1  byte   Playback volume of sample
001Dh       1  byte   ??? "DSK", whatever that means
001Eh       1  byte   Pack type
                        0 - unpacked
                        1 - DP30ADPCM 1
001Fh       1  byte   Flags (bitmapped)
                        0 - loop on/off
                        1 - stereo sample (length bytes for left
                            channel, then another length bytes for
                            right channel!)
                        2 - 16-bit samples (in Intel byte order)
0020h       1  dword  C2 frequency
0024h       1  dword  reserved
0028h       1  word   reserved
002Ah       1  word   ID=512
002Ch       1  dword  ?? Date of last modification ?? (see table 0009)
0030h      28  char   ASCIIZ sample name
004Ch       4  char   ID='SCRS'
0050h       ?  byte   Raw sample data

Here follows the AdLib instrument format, for which I don't know the extension:

OFFSET  Count  TYPE   Description
0000h       1  byte   Instrument type
                        2 - melodic instrument
                        3 - bass drum
                        4 - snare drum
                        5 - tom tom
                        6 - cymbal
                        7 - hihat
0001h      12  char   DOS file name
000Dh       3  byte   reserved
0010h       1  byte   Modulator description (bitmapped)
                        0-3 - frequency multiplier
                        4   - scale envelope
                        5   - sustain
                        6   - pitch vibrato
                        7   - volume vibrato
0011h       1  byte   Carrier description (same as modulator)
0012h       1  byte   Modulator miscellaneous (bitmapped)
                        0-5 - 63-volume
                        6   - MSB of levelscale
                        7   - LSB of levelscale
0013h       1  byte   Carrier description (same as modulator)
0014h       1  byte   Modulator attack / decay byte (bitmapped)
                        0-3 - Decay
                        4-7 - Attack
0015h       1  byte   Carrier description (same as modulator)
0016h       1  byte   Modulator sustain / release byte (bitmapped)
                        0-3 - Release count
                        4-7 - 15-Sustain
0017h       1  byte   Carrier description (same as modulator)
0018h       1  byte   Modulator wave select
0019h       1  byte   Carrier wave select
001Ah       1  byte   Modulator feedback byte (bitmapped)
                        0   - additive synthesis on/off
                        1-7 - modulation feedback
001Bh       1  byte   reserved
001Ch       1  byte   Instrument playback volume
001Dh       1  byte   ??? "DSK"
001Eh       1  word   reserved
0020h       1  dword  C2 frequency
0024h      12  byte   reserved
0030h      28  char   ASCIIZ instrument name
004Ch       4  char   ID='SCRI'

14 UWF

The UWF files are sample files used by the UltraTracker. Further information wanted.
OFFSET  Count  TYPE   Description
0000h      32  char   ASCIIZ sample name
0020h       1  char   ID=1Ah
0021h       1  char   ID=10h
0022h       5  char   ID='MUWFB'
0027h       1  char   ID=0
0028h       6  char   Length of sample as ASCII long integer
002Eh       1  word   Length of sample

15 WAVE

The Windows .WAV files are RIFF format files. Some programs expect the fmt block right behind the RIFF header itself, so your programs should write out this block as the first block in the RIFF file. The subblocks for the wave files are:

15.1 RiffBLOCK [data]

This block contains the raw sample data. The necessary information for playback is contained in the [fmt ] block.

15.2 RiffBLOCK [fmt ]

This block contains the data necessary for playback of the sound files. Note the blank after fmt.

OFFSET  Count  TYPE   Description
0000h       1  word   Format tag
                        1 = PCM (raw sample data)
                        2 etc. for ADPCM, a-Law, u-Law ...
0002h       1  word   Channels (1=mono, 2=stereo, ...)
0004h       1  dword  Sampling rate
0008h       1  dword  Average bytes per second
                      (= sampling rate * block alignment)
000Ch       1  word   Block alignment
                      (= channels * bytes per sample)
000Eh       1  word   Bits per sample (8/12/16-bit samples)

15.3 RiffBLOCK [loop]

This block is for looped samples. Very few programs support this block, but if your program changes the wave file, it should preserve any unknown blocks.

OFFSET  Count  TYPE   Description
0000h       1  dword  Start of sample loop
0004h       1  dword  End of sample loop

16 ZyXEL

The ZyXEL modems are capable of digitizing speech; the ZFAX software and answering machine software like VoiceConnect store the sampled data in these files. The modems are capable of compressing the data down to 19.2k CPS (ADPCM) and 9.6k CPS (CELP). The algorithms for the compression may be found in the ZyxelVoc package by N. Igl, but as the firmware on the modems changes, so might the compression algorithm. Playback on the modem is always possible. Files are identified by the .ZVD and .ZYX extensions.

OFFSET  Count  TYPE   Description
0000h       5  char   ID='ZyXEL'
0005h       1  byte   02h, ??? format tag
0006h       4  byte   reserved
000Ah       1  word   Compression scheme
                        0 - CELP
                        1 - 2 bit ADPCM
                        2 - 3 bit ADPCM
000Ch       4  byte   reserved
0010h       ?  ????   Raw data. The voice data is just the data
                      received from the U1496 Modem/Fax.

17 Creative Labs File Formats

17.1 Sound Blaster Instrument File Format (SBI)

The SBI format contains the register values for the FM chip to synthesize an instrument.

Offset    Description
00h-03h   Contains id characters "SBI" followed by byte 1Ah
04h-23h   Instrument name, NULL terminated string
24h       Modulator Sound Characteristic (Mult, KSR, EG, VIB, AM)
25h       Carrier Sound Characteristic
26h       Modulator Scaling/Output Level
27h       Carrier Scaling/Output Level
28h       Modulator Attack/Decay
29h       Carrier Attack/Decay
2Ah       Modulator Sustain/Release
2Bh       Carrier Sustain/Release
2Ch       Modulator Wave Select
2Dh       Carrier Wave Select
2Eh       Feedback/Connection
2Fh-33h   Reserved

17.2 Creative Music File Format (CMF)

The CMF file format consists of 3 blocks: the header block, the instrument block and the music block.

The CMF Header Block

Offset    Description
00h-03h   Contains id characters "CTMF"
04h-05h   CMF Format Version
          MSB = major version, LSB = minor version
06h-07h   File offset of the instrument block
08h-09h   File offset of the music block
0Ah-0Bh   Clock ticks per quarter note (one beat), default = 120
0Ch-0Dh   Clock ticks per second
0Eh-0Fh   File offset of the music title (0 = none)
10h-11h   File offset of the composer name (0 = none)
12h-13h   File offset of the remarks (0 = none)
14h-23h   Channel-In-Use Table
24h-25h   Number of instruments used
26h-27h   Basic Tempo
28h-?     Title, composer and remarks stored here

17.3 The CMF Instrument Block

The instrument block contains one 16 byte data structure for each instrument in the piece. Each record is of the same format as bytes 24h-33h in the SBI file format.

17.4 The CMF Music Block

The music block adheres to the standard MIDI file format, and can have from 1 to 16 instruments. The PC-GPE file MIDI.TXT contains more information on this file format.
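As a rough illustration of the header block in 17.2 and the 16 byte instrument records in 17.3, the following Python sketch unpacks the fixed 28h-byte header and slices out the instrument records. It assumes that all two-byte fields are stored in Intel (little-endian) order, which the tables above do not state explicitly; the names parse_cmf_header and instrument_records are made up for this example.

```python
import struct

# Fixed part of the CMF header, offsets 00h-27h (28h bytes in total).
# ASSUMPTION: every two-byte field is little-endian (Intel order).
CMF_HEADER = struct.Struct("<4s8H16s2H")


def parse_cmf_header(data: bytes) -> dict:
    """Unpack the CMF header block into a dictionary (hypothetical helper)."""
    if len(data) < CMF_HEADER.size:
        raise ValueError("file too short for a CMF header")
    (magic, version, instrument_offset, music_offset,
     ticks_per_quarter, ticks_per_second,
     title_offset, composer_offset, remarks_offset,
     channels_in_use, instrument_count, basic_tempo) = CMF_HEADER.unpack_from(data)
    if magic != b"CTMF":
        raise ValueError("missing 'CTMF' id characters")
    return {
        # MSB = major version, LSB = minor version
        "version": (version >> 8, version & 0xFF),
        "instrument_offset": instrument_offset,
        "music_offset": music_offset,
        "ticks_per_quarter": ticks_per_quarter,
        "ticks_per_second": ticks_per_second,
        "title_offset": title_offset,        # 0 = none
        "composer_offset": composer_offset,  # 0 = none
        "remarks_offset": remarks_offset,    # 0 = none
        "channels_in_use": channels_in_use,  # 16-byte table, 14h-23h
        "instrument_count": instrument_count,
        "basic_tempo": basic_tempo,
    }


def instrument_records(data: bytes, header: dict) -> list:
    """Return the 16-byte instrument records (same layout as SBI bytes 24h-33h)."""
    start = header["instrument_offset"]
    count = header["instrument_count"]
    return [data[start + 16 * i:start + 16 * (i + 1)] for i in range(count)]
```

A caller would read the whole file into memory, call parse_cmf_header on it, and then pass the result to instrument_records; the music block starts at the returned music_offset.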
The music block consists of an alternating sequence of time and MIDI event records:

dTime  MIDI Event  dTime  MIDI Event  dTime  MIDI Event  ........

dTime (delta time) is the amount of time before the following MIDI event. MIDI Event is any MIDI channel message.

The CMF file format defines the following MIDI Control Change events:

Control No  Control Data
66h         1-127, used as markers in the music
67h         0 = melody mode, 1 = rhythm mode
68h         0-127, changes the pitch of all following notes upward
            by the given number of 1/128 semitones
69h         0-127, changes the pitch of all following notes downward
            by the given number of 1/128 semitones

In rhythm mode, the last five channels are allocated for the percussion instruments:

Channel  Instrument
12h      Bass Drum
13h      Snare Drum
14h      Tom-Tom
15h      Top Cymbal
16h      High-hat Cymbal

17.5 Sound Blaster Instrument Bank File Format (IBK)

A bank file is a group of up to 128 instruments.

Offset     Description
00h-03h    Contains id characters "IBK" followed by byte 1Ah
04h-803h   Parameters for 128 instruments, 16 bytes for each
           instrument in the same format as bytes 24h-33h in the
           SBI format
804h-C83h  Instrument names for 128 instruments, 9 bytes for each
           instrument; each name must be null terminated

18 Creative Voice (VOC) file format

HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

Byte numbers are in hex.

byte #  Description
00-12   "Creative Voice File"
13      1A (eof to abort printing of file)
14-15   Offset of first datablock in .voc file
        (std 1A 00 in Intel notation)
16-17   Version number (minor, major) (VOC-HDR puts 0A 01)
18-19   Bitwise complement of Ver. # + 1234h (VOC-HDR puts 29 11,
        i.e. NOT 010Ah + 1234h = 1129h)

Data Block: TYPE (1 byte), SIZE (3 bytes), INFO (0+ bytes)

NOTE: The Terminator Block is an exception: it has only the TYPE byte.
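The header check and the TYPE/SIZE/INFO framing described above can be sketched in Python as follows. This is only a sketch under stated assumptions: the 3-byte SIZE is taken to be in Intel (little-endian) order like the other header words, and the check word is computed as the bitwise complement of the version plus 1234h, which is what reproduces the example bytes 29 11 for version 0A 01. The function names are invented for this example.

```python
import struct

VOC_MAGIC = b"Creative Voice File\x1a"  # header bytes 00h-13h


def parse_voc_header(data: bytes) -> dict:
    """Validate the 1Ah-byte VOC header and return its fields (hypothetical helper)."""
    if len(data) < 0x1A or data[:0x14] != VOC_MAGIC:
        raise ValueError("not a Creative Voice file")
    # Three little-endian words at 14h: data offset, version, check word.
    data_offset, version, check = struct.unpack_from("<3H", data, 0x14)
    # ASSUMPTION: check word = (NOT version) + 1234h, truncated to 16 bits.
    if check != (~version + 0x1234) & 0xFFFF:
        raise ValueError("version check word does not match")
    return {"data_offset": data_offset,
            "major": version >> 8,
            "minor": version & 0xFF}


def iter_voc_blocks(data: bytes, offset: int):
    """Yield (type, info_bytes) for each data block, ending with the terminator."""
    while offset < len(data):
        block_type = data[offset]
        if block_type == 0x00:
            # The Terminator Block has only the TYPE byte.
            yield 0x00, b""
            return
        if offset + 4 > len(data):
            raise ValueError("truncated block header")
        # ASSUMPTION: the 3-byte SIZE is little-endian.
        size = int.from_bytes(data[offset + 1:offset + 4], "little")
        start = offset + 4
        if start + size > len(data):
            raise ValueError("block runs past end of file")
        yield block_type, data[start:start + size]
        offset = start + size
```

Typical use would be to call parse_voc_header on the file contents and then iterate iter_voc_blocks(data, header["data_offset"]), dispatching on the block type.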
TYPE  Description     Size (3-byte int)  Info
00    Terminator      (NONE)             (NONE)
01    Sound data      2+length of data   *
02    Sound continue  length of data     Voice Data
03    Silence         3                  **
04    Marker          2                  Marker# (2 bytes)
05    ASCII           length of string   null terminated string
06    Repeat          2                  Count# (2 bytes)
07    End repeat      0                  (NONE)
08    Extended        4                  ***

*Sound Info Format:
  00     Sample Rate
  01     Compression Type
  02+    Voice Data

**Silence Info Format:
  00-01  Length of silence - 1
  02     Sample Rate

***Extended Info Format:
  00-01  Time Constant:
           Mono:   65536 - (256000000/sample_rate)
           Stereo: 65536 - (256000000/(2*sample_rate))
  02     Pack
  03     Mode: 0 = mono, 1 = stereo

Marker#            Driver keeps the most recent marker in a status byte
Count#             Number of repetitions + 1
                   Count# may be 1 to FFFE for 0 - FFFD repetitions
                   or FFFF for endless repetitions
Sample Rate        SR byte = 256-(1000000/sample_rate)
Length of silence  in units of sampling cycle
Compression Type   of voice data
                     8-bits    = 0
                     4-bits    = 1
                     2.6-bits  = 2
                     2-bits    = 3
                     Multi DAC = 3+(# of channels)
                     [interesting this isn't in the developer's manual]

19 Revision History

Version 1.0 - First document containing 15 formats
Version 1.1 - 2 more formats added