Bryan Parkoff wrote:
> I have read Darek's website at http://www.emulators.com/. Darek is an
>emulated programmer and he told me that he claims that C/C++ language is the
>WRONG languge for Emulator Project. He says that he focuses at assembly
>language for best optimization and performance. It is almost impossible for
>me to tell since he has been programming for over 17 years. He always
>disagrees with many programmers for general Emulation.

I'm not sure where to start with this, so let's jump to (I'm the author
of KEGS):

> I can tell that KEGS32 Project has a huge source code and have a very
>long routines. It is not good for writing emulated practices.

I'm not sure what you mean here. KEGS is about twice as fast as any
other non-assembly emulator, so it's doing something right.

Note that KEGS's main interpreter loop is a little convoluted since the
code includes PA-RISC assembly right along with the C code. On PA-RISC
machines of 5 years ago, the assembly version was about twice the speed
of the C. But I knew the instruction timings and gotchas of PA-RISC
down to an insane level, and KEGS was optimized to take all of that into
account. In fact, KEGS uses floating point to track time since FP
operations bundled with other types of instructions better than integer
operations did, so a little FP is basically "free."

The length of a subroutine means nothing in terms of performance. An
emulator spends 99%+ of its time doing a loop of:

  1. Some overhead (e.g., "are we done yet?")
  2. Decode and dispatch an instruction
  3. Do the instruction
  4. Jump back to step 1.

And of this time, about 30-70% is spent in steps 1 and 2. This is why
straight interpreters take an immediate 2-3X performance hit. But
straight interpreters are the only kind that are fully portable, so
that's what KEGS is.

KEGS is designed so that a C compiler should minimize the work done in
steps 1, 2, and 4.
KEGS's main loop is (basically; the real thing is always more complex):

    int
    enter_engine()
    {
        int   pc, a, x, y, sp, dp, k;
        int   opcode;
        float fcycles;

        while(1) {
            if(fcycles < 0) {
                break;
            }
            opcode = get_byte(pc);
            pc++;
            fcycles -= f_twoclks;
            switch(opcode) {
            case 0x00:      /* BRK */
                etc.
            case 0xff:      /* SBC Long,X */
                etc.
            }
        }
    }

Since I wrote the PA-RISC version in assembly, I knew exactly which
variables should go into registers on a RISC machine. On RISC machines,
accessing global variables is slower than accessing registers. So
KEGS's main loop basically tries to keep everything in local variables
so the compiler will make them native registers. If you put things in
a struct or in global variables, then you have to go through memory at
a pretty big penalty.

To keep all this CPU state in hardware registers requires everything to
be one big routine--all the instructions have to see the same local
variables. So the routine is big. But it should be faster than any
other style since the native jump back to the while(1) will be faster
than a subroutine call (in general...). This adds complexity to KEGS's
handling of instructions since subroutines called from enter_engine()
cannot see any 65816 hardware state (like A, X, Y, etc.). But it
clearly works.

KEGS was tuned in size so that the assembly version on PA-RISC fits
easily into a 32K instruction cache in its entirety. So it's not
outrageously large. This is why KEGS has a separate dispatch loop for
8-bit ACC and 16-bit ACC, but not for the X and Y sizes or for
emulation mode. All the special code for X and Y sizes and for
emulation mode is just done all the time.

KEGS is also pretty careful about how it handles the PSR bits--the
most frequently changing ones (Z and N) are kept in separate variables
so the native machine can handle those updates very fast. But any time
the PSR is accessed as a byte, the pieces need to be assembled and
disassembled.
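To make the PSR point concrete, here is a minimal sketch in C of keeping
N and Z apart from the rest of the status byte. This is not KEGS's
actual code: the names Flags, set_nz8, pack_psr, and unpack_psr are
made up for illustration, and the struct is only there to keep the
sketch self-contained. Inside a loop like enter_engine() these would be
plain local variables, for the register reasons given above. The bit
positions are the standard 65816 ones (N is 0x80, Z is 0x02).

```c
#include <stdint.h>

/* Hypothetical flag state; names are illustrative, not from KEGS. */
typedef struct {
    uint32_t psr_rest;  /* V, M, X, D, I, C bits in their PSR positions */
    uint32_t neg;       /* nonzero means the N flag is set */
    uint32_t zero;      /* nonzero means the Z flag is set */
} Flags;

enum { PSR_N = 0x80, PSR_Z = 0x02 };

/* Fast path: after an 8-bit operation, just record N and Z. */
static void set_nz8(Flags *f, uint32_t result)
{
    result &= 0xff;             /* mask to the 8-bit result width */
    f->neg = result & 0x80;
    f->zero = (result == 0);
}

/* Slow path: assemble the PSR byte (e.g., for PHP or an interrupt). */
static uint32_t pack_psr(const Flags *f)
{
    uint32_t p = f->psr_rest & 0xff & ~(uint32_t)(PSR_N | PSR_Z);
    if (f->neg)  p |= PSR_N;
    if (f->zero) p |= PSR_Z;
    return p;
}

/* Slow path: take a PSR byte apart again (e.g., for PLP or RTI). */
static void unpack_psr(Flags *f, uint32_t p)
{
    p &= 0xff;
    f->neg = p & PSR_N;
    f->zero = (p & PSR_Z) != 0;
    f->psr_rest = p & ~(uint32_t)(PSR_N | PSR_Z);
}
```

Most instructions only ever touch set_nz8-style code, which is a mask
and a compare. The pack and unpack work is paid only by the few
instructions that read or write the PSR as a whole byte.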
Other complexities of the Apple II memory map cause KEGS to treat the
entire memory range with an indirection level that's just like virtual
memory. Plus there's memory shadowing, where writes to one region need
to be reflected into another. This adds more complexity. KEGS isn't
optimal in how it does all of this (some memory-copying techniques, or
some lazy updates, are faster for some usage patterns), but KEGS has
very good worst-case performance.

> If you want to avoid using classes, I recommend to write global
>variables that are much easier to read that looks like classes. Lets says,

Global variables have similar performance to local variables on an x86
platform (since both will be in memory in general), but they will
definitely be slower on any RISC computer.

If you're going to just go for x86, then go ahead and write the main
loop in assembly. A good look at KEGS should give you an idea of what
should go in assembly and what should stay in C. I'm not planning to
write any more assembly for KEGS since it's already too fast--tracking
16K cycles in a float gets into some serious precision issues above an
emulated 200MHz, so KEGS actually has a speed cap built in.

> All 256 6502's OpCode routines stay in CPU_Run() by using goto statement
>rather than calling functions. Three local variables are inside classes so
>it won't show global variables.

Computed gotos are a non-portable gcc-ism. A switch() statement is
portable and achieves pretty much the same effect. Not calling
functions is a good idea.

> If you choose not to use classes, there is an alternative.
>
>DWORD CPU_Accumulator = 0x00;
>DWORD CPU_XIndex = 0x00;
>DWORD CPU_YIndex = 0x00;
>
>void CPU_Run();
>void CPU_Initialize();
>void CPU_Terminate();
>
> It is global variable that begins with CPU_ instead of g_ for easy
>reading. All functions are global functions. It will reduce minor bugs.
>Use BYTE and WORD keywords are bad practices for Emulator because C/C++
>compiler always adds "AND" instruction to clear 32-Bits, 24-Bits, and
>16_Bits. DWORD is chosen so "AND" instruction is removed. It can reduce
>x86's cycles and improve performance.

Avoiding operations on sizes smaller than the native word is a good
idea. But don't forget to mask X and Y to the correct precision before
using them, especially if you're doing the 65816 (which is a lot more
complex than the 6502).

> I would like to discuss with some Apple II Emulator programmers via
>e-mails if it is possible. It is not good to post Emulator Project
>Programming Practices on newsgroups because nobody is interested in writing
>Emulator Project until they are interested to play released Apple II
>Emulator software.
> Please advise.
>
>
>--
>Bryan Parkoff

Well, you've started a discussion here.

Kent Dickey