I rather liked it back in the day. Don't judge. My wife — who used Commodores as a girl in the Australian school system but not this particular title — enjoyed it even more than I thought she would, enough so that she occupied VICE on the Talos II playing it all afternoon and prevented me from writing this.
The Grammar Examiner plays well enough on my real Commodore 128DCR, though it's a very slow loader, and I only have an original disk which I'd like to preserve. (My original original copy disappeared a while back, though I've had this particular one at least a couple decades.) A quick sector-by-sector D64 image using a ZoomFloppy yielded a number of apparently intentionally bad sectors typical of early 1980s copy protection, but even with the error information the program's loader just plain hung up in VICE trying to boot the copy. Yes, a nibbled raw copy of the GCR would work and I imagine people have made one of this title, but we'd also like to speed up the process instead of burdening the emulator further (and it would be nicer on the real system too).
So in this post we'll explore the loader routine, decrypt and extract it, figure out how the copy protection is implemented and work around it, and then pull out the payload it reads for a faster start. While we're at it, let's look briefly at the program itself, an interesting example of Forth programming "in the large" on 1980s home computers.
Back in the day, educational software was generally neglected by piracy. We got our warez through two distinct sources, namely our teenaged babysitter whom we got in trouble when we convinced him to let us stay up late (ending both his employment and our connection), and the son of a friend of the family who also had an elder sister I had a fairly hopeless crush on. There's just something about permed hair and braces. Anyway, almost all of those were games.
As a result, educational titles didn't have the evolutionary pressure games did to contain clandestine copies; they only had to stop the casual home copier who wouldn't have the latest tools like Fast Hack'Em or Copy II Plus or some such. They also sold by and large to generally law-abiding consumers (i.e., schools, parents), and not in numbers sufficient to justify licensing one of the higher-end copy protection schemes — I never met an education program on the Commodore 64 that used Rapidlok or V-MAX!, for example. That may also have been why or because educational software makers were some of the earliest to leave the home computer market for PCs and compatibles, whereas games continued to flourish on the C64 due to its massive installed base as late as the early 1990s.
None of that means there weren't cracks, of course, but there were certainly fewer. Kids like me whose parental-sponsored software libraries (unwillingly) skewed more towards edutainment had to do our own workarounds, most of which by now are hopefully past the statute of limitations.
That brings us to today's victim.
DesignWare was founded by Jim Schuyler in 1980 on a $3000 credit card, a PhD in computer science from Northwestern and an Apple II personally given to him off the assembly line by Steve Jobs (who was admonished by another exec to never do that again). Schuyler's background was in computer-aided instruction, and he developed a language called LINGO, not to be confused with Macromedia Director's scripting language, to model the interaction between a user and the machine. Around the same time was another larger system innovating in that very same space using similar ideas, which was of course the TUTOR language underlying UIUC CERL's PLATO. Schuyler reworked LINGO into MULTITUTOR, which could run TUTOR lessons on their own local CDC 6400 mainframe but wasn't limited to PLATO terminals. While Schuyler was at the University of the Pacific in San Francisco, MULTITUTOR's ideas and architecture in turn came to underlie his CDS in 1977, the CAI Design System (later the Courseware Design System), with its own bespoke TUTOR-like language that could run converted TUTOR, PILOT and other CAI systems' lessons.
Early microcomputers were starting to appear, notably the Apple II and the Atari 8-bit family, and after meeting with their leadership Schuyler felt these more inexpensive home computers had greater potential for democratizing education than the client-server models he first worked on. Initially DesignWare did contract and commissioned work for other companies, most notably Reader's Digest Software (Trickster Coyote comes to mind) and later in 1982 Spinnaker Software, where their first big hits FaceMaker and Story Machine were published. In 1983 they started selling educational and edutainment software under their own name, one of the few submarkets in the industry back then not severely impacted by the video game crash. These first-generation first-party titles included notables such as Crypto Cube and especially Spellicopter, which became one of their biggest sellers.
The above ad was from Creative Computing in October 1983.
By 1984 DesignWare was developing for the Apple II family, the Atari 8-bit family, the IBM PC, the IBM PCjr and the Commodore 64, meeting Schuyler's goal of educational opportunities on all kinds of home systems. In fact, this catalogue from October 1984 shows the Commodore 64 eventually got ports of all their first-party software; the only computers that didn't run the full line were the IBM PCjr, to which DesignWare didn't port one title, and ironically the Atari 8-bit family, which despite being one of their launch platforms eventually became too small a market to maintain development for. Unfortunately cash flow problems caused DesignWare to be acquired twice, first in 1984 by Management Science America along with other purchases such as accounting giant Peachtree, then spun off 38 days later to absorb and revive some of MSA's other failing holdings like the hapless EduWare until DesignWare was sold off to Britannica Software in 1986.
In this guise DesignWare continued producing and publishing titles like Designasaurus under the DesignWare name as a sublabel of Britannica, expanding to the Amiga, until around 1989 when the brand was dissolved; the subsequent Spellicopter II sequel was developed and published under Britannica in 1991. After a name change to Compton's, Britannica eventually merged with SoftKey in 1995 and through SoftKey's later disastrous history the DesignWare IP now rests as part of Houghton Mifflin Harcourt Learning Technology (notably Harcourt was one of DesignWare's original contract publishers) along with other forgotten properties like Spinnaker and Springboard Software's back catalogues.
Early DesignWare titles came in typical boxes. We had Trap-A-Zoid from that era, but I've never enjoyed geometry, and I can't say I'm completely sad that one got lost. Later on DesignWare switched to these slicker spiral-bound books where the diskette was in a Tyvek pouch in the back. This wasn't my original copy of The Grammar Examiner, which I think got lost in the garage somewhere, but rather a replacement that I've still had for a good couple decades or more. DesignWare's 1984 MSRP for this title was $44.95 [2023 around $132]; this package was marked down to $29.95, though I don't remember exactly when that was. DesignWare programs were also distinguished by a common visual style: lots of high contrast black-and-white (the lowest common denominator for their supported systems), the ever-present DesignWare logo, and sort of a scratchy hand-drawn look to the graphics which actually was somewhat endearing. Less endearing was the loader. I never played these titles on an Apple II or Atari, but the Commodore 64 version took ages to even get started (the manual apologetically says it "is a large, complex program that will take a few minutes to load"; about two minutes and 15 seconds on a stopwatch to the title screen) and my trusty Epyx FastLoad didn't make any difference. As it turns out, there are two stages to the loader and unfortunately not only does the program's on-disk layout make typical fastloading impossible, but the initial primary loader is also slower than it should be due to a technical oversight. More to come on that.
The game would frequently pause to load more data from the disk. When it did, a little coloured floppy disk icon would flash in the lower corner, sounding a little two-tone chirp ("ke-LOP!") each time. This sound has deeply ingrained itself into my consciousness.
Here's a brief tour of The Grammar Examiner before we break it apart.
The main menu. DesignWare was very forward-thinking, including a lot of content on one disk side and allowing you to create your own lessons. I never did that as a kid but I always thought it was cool. For kids like me who were socially apathetic and/or inept, you could play with Mel, who had an adjustable IQ (though what this really did was just set the percent chance he had of randomly picking the right answer — even at an "IQ" of 150 he would still make some amazingly bad gaffes). Sadly, I don't believe DesignWare ever issued any game disks of their own. If they had, we might have bought them. Disk swapping would have probably made it more maddening than it should have been but at least the program supported multiple disk drives. Four game boards were available on disk (plus your own) but I most enjoyed the default N. Y. Times layout. Every board had seven types of squares, six of them being start (does nothing), chance, a quick grammar question, edit and revise a paragraph, randomly jump somewhere on the board, and return to start. You rolled a simulated die with RETURN and then moved your piece on the board (I'm the camera, Mel is the ball behind me). While you could turn loops as part of your moves, you could not immediately go back the way you travelled.
The chance, quick question and editing squares either added to your accumulated pay or (if you were unlucky or incompetent) docked it. All players start with $20,000, which seemed like a nice nest egg for a cub reporter in 1984 (in 2023 about $58,500). If you clear $23,000, then the seventh type of game square will appear at the top right, allowing you to bribe the publisher to let you become Editor-in-Chief if you have the highest sum when someone gets there. You then win the game until the early 2020s when local news reporting suffers a major financial downturn. The random squares were particularly important for getting to the finish line because of the ever-present threat of the return-to-start squares and Mel had a distinct programmed tropism for them.
The editing screen did not let you insert words, and the idea was not to entirely rewrite the article. Instead, you corrected grammar, spelling and punctuation by retyping individual words in place, keeping the overall format, tone and sentential structure the same. My wife and I quibbled over this in a couple places since style can be sometimes a subjective choice, but the game, and probably appropriately for an age range where ambiguity can be pedagogically unhelpful, has but one single view of truth and tolerates no variations. The manual says the style format came from the Chicago Manual of Style and Warriner's English Grammar and Composition, choosing Warriner's where they differed due to its focus on high school instead of college. Hint if you're unable to decide: some of the sentences from the paragraphs appear in the manual as examples.
We did find the in-universe explanation for the pop grammar quiz questions pretty suspect, but hey: if you already know the answer, and you're just willing to pay $400 to see if we do, that seems pretty fair to us.
I enjoyed playing it as a kid, and I still do as an adult, but even as a larval dweeb I was still more fascinated by how it worked under the hood. I certainly picked apart my share of loaders and educational software back then; I've kept one particular title in my collection as a fond memory not because it's particularly good (it's a so-so reading comprehension program) but because I ripped off its graphics library to use in my own stuff. The DesignWare titles, on the other hand, were a bit more opaque.
The disk directory shows two files, DWARF and BOOT2, which (with 664 blocks free, the amount on a freshly formatted 1541 floppy disk) seem to take up absolutely no space themselves. Even the filenames were bizarre and baffling. DWARF is just a BASIC program that loads and executes BOOT2 with a SYS 49152. It then sits there and loads for a long period before the screen blanks and the cheerful chirp starts regularly playing.
Now, we did have a backup copy of this back in the day and any nibbler should be able to copy it (we used Fast Hack'Em), so copy protection wasn't really the issue for 9-year-old me. I wanted to understand what it was doing and how it loaded from a disk that appeared to be completely empty. The only clue was a footnote in the manual:
MicroMotion FORTH-79 was an implementation of (wait for it) FORTH-79 initially for the Apple II. 9-year-old me had a very rudimentary awareness of Forth from the issue of CTW ENTER it appeared in, so I knew it was another programming language, but back then I didn't otherwise get the significance. Forth doesn't just provide an operating environment for programs; it is the operating environment, generally dealing in fixed-size numbered screens and blocks instead of variable-length named files.
As such, it performed direct disk access strictly by track and sector instead of using the Commodore Kernal's load routine, something most fast loaders wouldn't accelerate, or indeed using any filesystem at all. In fact, the only "files" on the disk are the BASIC bootstrap and loader, and these appear to take "no space" by actually occupying sectors normally reserved for the disk directory on track 18. After all, there are only two files, so most of the directory track isn't needed to store the directory. (There are actually three files, but one is a deleted file called MONITOR$8000 which has only one valid sector and the rest was apparently lost. The name and the ghost file's remnant first sector suggest a simple machine language monitor that was used as a debugger. Another orphaned sector contains a different BASIC loader that loads both along with old file entries for BOOT ALL, MACROS and FORTH.BRK, but none of this is connected to anything.) This allowed almost the entirety of the disk to be allocated to Forth code and data.
MicroMotion's mention in the manual was in fact a contractual requirement. Forth was selected as DesignWare's development language because it was probably the closest thing back then to a common language feature set available for the disparate home systems of the day. Since Forth has a standard memory layout for its vocabulary and object code, much of it could be reused directly on other 6502-based systems, adding custom words for platform-specific features like high-resolution graphics, sound, and the track-and-sector routine; on non-6502-based platforms, the Forth screens could simply be recompiled.
MicroMotion's implementation of blocks was also useful for paging because it kept track of what was already loaded. When the program requested a certain block range be brought into memory, the Forth runtime would only load what was needed, reducing disk access. This paging process is when the disk icon would light up with the cheerful chirp. You could even mark blocks as to be updated on disk, almost like a manually-driven virtual memory system, though I don't think any of the DesignWare titles used that feature and this disk is write-protected (no write notch).
Let's figure out how the loader works by ripping the disk to a D64 image. d64copy shows multiple errors on track 31. Notice that Commodore GCR disks have variable sector density between tracks, with the larger-circumference outer tracks holding more. On a standard 1541 formatted floppy there are 35 tracks in total yielding 683 total blocks of 256 bytes each. Although tracks up to 40 are possible and used by some custom formats, this disk doesn't seem to use them.
 1: *********************
 2: *********************
 3: *********************
 4: *********************
 5: *********************
 6: *********************
 7: *********************
 8: *********************
 9: *********************
10: *********************
11: *********************
12: *********************
13: *********************
14: *********************
15: *********************
16: *********************
17: *********************
18: *******************
19: *******************
20: *******************
21: *******************
22: *******************
23: *******************
24: *******************
25: ******************
26: ******************
27: ******************
28: ******************
29: ******************
30: ******************
31: --*-----*----*---      87%   601/683
[Warning] read error: 1f/0e: 5 23,read error,00,00 (bad checksum)
31: --***---**---*?*-      88%   606/683
[Warning] read error: 1f/0a: 5 23,read error,00,00 (bad checksum)
31: **********?***?*-      89%   614/683
[Warning] read error: 1f/10: 3 20,read error,00,00 (no header)
31: **********?***?*?      90%   615/683
[Warning] giving up...
31: **********?***?*?
32: *****************
33: *****************
34: *****************
35: *****************     100%   683/683
680 blocks copied.
Three of the sectors err out. I'll provide a spoiler now: the errors d64copy thinks it duplicated aren't actually what's going on with that track. Hang tight.
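The zone layout is easy to tabulate. Here's a rough sketch in Python (my choice of language for illustration, and the helper names are made up) that computes the sectors per track on a standard 35-track disk and where a given sector lands in a plain D64 image, assuming the image simply stores the 256-byte blocks in track and sector order with no error bytes in front:

```python
# Sectors per track on a standard 35-track 1541 disk, by speed zone:
# (first track, last track, sectors on each of those tracks)
ZONES = [(1, 17, 21), (18, 24, 19), (25, 30, 18), (31, 35, 17)]


def sectors_on_track(track):
    """Number of sectors on the given track (1-35)."""
    for first, last, count in ZONES:
        if first <= track <= last:
            return count
    raise ValueError(f"track {track} is outside 1-35")


def block_index(track, sector):
    """Zero-based block number of (track, sector) in a plain D64 image."""
    if not 0 <= sector < sectors_on_track(track):
        raise ValueError(f"track {track} has no sector {sector}")
    return sum(sectors_on_track(t) for t in range(1, track)) + sector


TOTAL_BLOCKS = sum(sectors_on_track(t) for t in range(1, 36))

if __name__ == "__main__":
    print("total blocks:", TOTAL_BLOCKS)
    print("track 31 has", sectors_on_track(31), "sectors")
    index = block_index(31, 3)
    print("track 31 sector 3 is block", index, "at offset", hex(index * 256))
```

The same arithmetic shows that track 31 carries sectors 0 through 16, which matters for what follows.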
We'll now start disassembling BOOT2, the machine-language portion of the bootstrap and the first-stage loader, with the VICE monitor. It's not actually zero blocks, of course, but it's small, loading from $c000 to $c14f. Execution starts at $c000.
.C:c000  20 28 C0    JSR $C028
.C:c003  20 41 C0    JSR $C041
.C:c006  20 59 C0    JSR $C059
.C:c009  A9 05       LDA #$05
.C:c00b  20 C3 FF    JSR $FFC3
.C:c00e  A9 0F       LDA #$0F
.C:c010  20 C3 FF    JSR $FFC3
.C:c013  4C 00 08    JMP $0800
In broad strokes, this seems simple enough: it calls a couple subroutines, then closes channels 5 and 15 (implying those subroutines it calls opened them for reading and commands), and jumps to what we presume is the main program at $0800. (If you look at the MicroMotion memory map, the Forth dictionary starts at that location.) Let's start with the first call to $c028.
.C:c028  A0 0E       LDY #$0E
.C:c02a  B9 29 C0    LDA $C029,Y
.C:c02d  59 28 C0    EOR $C028,Y
.C:c030  99 29 C0    STA $C029,Y
.C:c033  C8          INY
.C:c034  C0 B7       CPY #$B7
.C:c036  D0 22       BNE $C05A
.C:c038  D2          JAM
This code doesn't make any sense, which is immediately suspicious: the loop isn't closed and is terminated with one of the NMOS 6502 halt-and-catch-fire instructions. That's because this is the first of two places in which it decrypts itself by exclusive-ORing bytes, here with the immediately preceding byte beginning with location $c037 (the offset for the BNE instruction in the loop). In this case it turns an apparent immediate forward branch into nonsense code to an actual loop back to $c02a ($22 EORed with $d0 is $f2) to continue the decryption. We'll drop a breakpoint in VICE to hit immediately after the loop to see what code is revealed.
(C:$c053) break c038
BREAK: 2  C:$c038  (Stop on exec)
(C:$c053) x
#2 (Stop on exec c038) 136/$088, 37/$25
.C:c038  20 E7 FF    JSR $FFE7      - A:88 X:00 Y:B7 SP:f4 ..-...ZC 11154817
The jam "HCF" instruction at $c038 is now a call ($f2 EORed with $d2 is $20) to Kernal CLALL to close all open files. Let's continue disassembly with the first section of code decrypted. I've annotated the calls if you don't have the Kernal calls in your head like I do after all these years.
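If you'd rather not single-step it, the decryption loop is simple enough to model. This is only a sketch in Python with invented function names: it XORs each byte with the (already decrypted) byte before it, exactly as the LDA/EOR/STA trio does, and it only seeds the three bytes we actually know from the disassembly above ($d0, $22 and $d2 at $c036 through $c038). The rest of memory is left as zeroes, so only those locations mean anything in the result.

```python
def decrypt_chain(mem, base=0xC028, start=0x0E, stop=0xB7):
    """Mirror the loop at $c028: for Y = start .. stop-1,
    mem[base+1+Y] = mem[base+1+Y] EOR mem[base+Y]."""
    for y in range(start, stop):
        mem[base + 1 + y] ^= mem[base + y]
    return mem


def branch_target(opcode_addr, offset):
    """Destination of a 6502 relative branch located at opcode_addr."""
    signed = offset - 0x100 if offset & 0x80 else offset
    return (opcode_addr + 2 + signed) & 0xFFFF


if __name__ == "__main__":
    mem = bytearray(0x10000)
    # Only these three bytes come from the listing; everything else is filler.
    mem[0xC036], mem[0xC037], mem[0xC038] = 0xD0, 0x22, 0xD2

    print("before: BNE goes to", hex(branch_target(0xC036, mem[0xC037])))
    decrypt_chain(mem)
    print("after:  BNE goes to", hex(branch_target(0xC036, mem[0xC037])))
    print("byte at $c038 is now", hex(mem[0xC038]))
```

Because each byte is decrypted using its freshly decrypted predecessor, the loop has to fix its own branch offset first, and that's the very first byte it touches.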
; close all
.C:c038  20 E7 FF    JSR $FFE7
; open 15,dev,15,""
.C:c03b  A9 00       LDA #$00
.C:c03d  20 BD FF    JSR $FFBD
.C:c040  A9 0F       LDA #$0F
.C:c042  A6 BA       LDX $BA
.C:c044  A8          TAY
.C:c045  20 BA FF    JSR $FFBA
.C:c048  20 C0 FF    JSR $FFC0
; open 3,dev,3,"#"
.C:c04b  A9 23       LDA #$23
.C:c04d  85 FB       STA $FB
.C:c04f  A9 01       LDA #$01
.C:c051  A2 FB       LDX #$FB
.C:c053  A0 00       LDY #$00
.C:c055  20 BD FF    JSR $FFBD
.C:c058  A9 03       LDA #$03
.C:c05a  A6 BA       LDX $BA
.C:c05c  A8          TAY
.C:c05d  20 BA FF    JSR $FFBA
.C:c060  20 C0 FF    JSR $FFC0
; copy routine at $c07a to $9000
.C:c063  A9 00       LDA #$00
.C:c065  85 FB       STA $FB
.C:c067  A9 90       LDA #$90
.C:c069  85 FC       STA $FC
.C:c06b  A0 66       LDY #$66
.C:c06d  B9 7A C0    LDA $C07A,Y
.C:c070  91 FB       STA ($FB),Y
.C:c072  88          DEY
.C:c073  C0 FF       CPY #$FF
.C:c075  D0 F6       BNE $C06D
; jmp $9000
.C:c077  6C FB 00    JMP ($00FB)
The new code opens the command channel to the disk drive (as channel 15) and then a buffer for loading direct sectors from disk (as channel 3). It then copies a routine from $c07a to $9000 and jumps to it. We drop another breakpoint in VICE and pick up from $9000, where it immediately calls a subroutine at $903f.
; $9000 (copied from $c07a)
.C:9000  20 3F 90    JSR $903F
; $903f
; clear channels
.C:903f  20 CC FF    JSR $FFCC
; send disk drive command: read sector 31 03
.C:9042  A2 0F       LDX #$0F
.C:9044  20 C9 FF    JSR $FFC9
.C:9047  A0 00       LDY #$00
.C:9049  B9 57 90    LDA $9057,Y
.C:904c  F0 06       BEQ $9054
.C:904e  20 D2 FF    JSR $FFD2
.C:9051  C8          INY
.C:9052  D0 F5       BNE $9049
.C:9054  4C CC FF    JMP $FFCC
(C:$9068) m 9057
>C:9057  55 31 3a 33  20 30 20 33  31 20 30 33  0d 00 88 98   U1:3 0 31 03....
This sets up a block read command to channel 3 from track 31, sector 3. Note that this wasn't one of the sectors d64copy complained was bad. It returns to $9003 with the terminal call to $ffcc and then does this strange thing:
; select channel 3 for input
.C:9003  A2 03       LDX #$03
.C:9005  20 C6 FF    JSR $FFC6
.C:9008  A0 00       LDY #$00
.C:900a  84 FB       STY $FB
; get a byte from the disk drive over the serial bus
.C:900c  20 A5 FF    JSR $FFA5
.C:900f  99 28 C0    STA $C028,Y
.C:9012  99 28 C0    STA $C028,Y
.C:9015  45 FB       EOR $FB
.C:9017  85 FB       STA $FB
.C:9019  C8          INY
; do it 256 times
.C:901a  D0 F0       BNE $900C
; ???
.C:901c  A5 FB       LDA $FB
.C:901e  C9 2E       CMP #$2E
.C:9020  F0 07       BEQ $9029
.C:9022  A9 59       LDA #$59
.C:9024  8D 0F 90    STA $900F
.C:9027  D0 D7       BNE $9000      ; branch always, since #$59 is non-zero
This is the second decryption routine; we can see an exclusive OR occurring, but not obviously to the memory it's storing in. There are also two stores (STA) to the exact same location. Why is that?
Again, this is self-modifying code. If we hand-execute through this routine, after loading a full 256 bytes from that sector, it uses the exclusive-ORed value in $fb as a kind of check. If the value doesn't match, it changes the first STA to EOR and runs the entire routine from the beginning, asking for that sector again.
This would seem even crazier: now it's apparently exclusive-ORing the same bytes it got before with the same bytes it's getting now. Indeed, when I let it continue execution at this point in VICE, it read that sector over and over in an infinite loop until I stopped it. It clearly works fine with the original disk, so what's in that sector?
The first time I pulled up the sector in the Epyx FastLoad disk editor, I got this.
It's a little odd. There was nothing in the high nybble at all. In terms of machine code, these bytes would be gibberish and couldn't possibly be executed directly. More to the point, I couldn't see how exclusive-ORing it would generate anything sensible since the high nybble would never get populated. I puzzled over that and went to bed.
In the morning I pulled it up again to ponder further. It had changed. On an original, write-protected disk. This wasn't the sector I looked at last night:
I reloaded it multiple times. The sector has a split personality: it randomly flip-flops between one or the other. In the other morph, you can see there's nothing but a high nybble, and now XORing those values would make more sense. In fact, if you hand-merge the two using the disassembled loader, you get valid machine code.
This is a critical portion of the routine. If you look way back at the beginning, after this first call we've been deobfuscating there's a second one to $c041 and then to $c059. Those calls hit gobbledygook by default, but the composite sector we load here overwrites that region. The loader thus loads a new piece of itself as its first task.
How do we get two views of one sector? We're only seeing part of the picture here by just looking at the data. The sector appears to be a normal CBM DOS sector, which has a header portion followed by the 256 data bytes of the sector we saw in the FastLoad editor, so it must be something about the header for that sector that's responsible. Let's get Maverick out (I own a physical copy of v5 on disk), the last word in Commodore disk copiers and copy-protection analysis tools, and look at the headers for track 31.
This is the Maverick GCR editor. GCR stands for Group Code Recording, the particular encoding method (at least Commodore's form of it) used by native Commodore floppy drives, and is the raw on-disk format. Using this tool we can see how the track is physically encoded (if you don't have Maverick, here's a BASIC program that will do the same general thing, just much more slowly). On the left is the raw disk GCR, and on the right is the interpreted bytes. The Maverick editor interleaves each sector header with its data, so the first two lines are sector 0, the next two are sector 1, and so on. We'll only need the interpreted bytes for our analysis as this disk isn't using a custom encoding.
In the right column we can see each header on alternating lines. The header (which is described incorrectly in the 1571 user guide; see something like Inside Commodore DOS, the essential text on Commodore disk drives, for much more) starts with a sync mark of 40 1-bits, which Maverick doesn't show, then a byte $08, then a checksum (the exclusive-OR of the two ID bytes, track and sector numbers), then the sector number, the track number, the ID bytes set when the disk was formatted, and a padding gap. For track 31 sector 3, we seem to have a normal-looking header; in particular, the track and sector bytes in the header are $03 $1f, for 3 and 31 respectively.
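To make the header layout concrete, here's a small Python sketch that builds the decoded header bytes in the order just described and checks the checksum. A few things in it are my assumptions rather than anything read off this disk: the disk ID "AB" is a stand-in, the second ID character is placed before the first (which is how I recall the 1541 storing it), and the padding is shown as two $0f bytes. The function names are invented for illustration.

```python
def header_bytes(track, sector, id1, id2):
    """Decoded (post-GCR) header block: $08, checksum, sector, track, IDs, padding.
    Assumption: ID bytes are stored second character first, then two $0f pad bytes."""
    checksum = sector ^ track ^ id2 ^ id1
    return bytes([0x08, checksum, sector, track, id2, id1, 0x0F, 0x0F])


def header_ok(header):
    """True if the block starts with $08 and its checksum matches its fields."""
    return header[0] == 0x08 and header[1] == (
        header[2] ^ header[3] ^ header[4] ^ header[5]
    )


if __name__ == "__main__":
    # "AB" is a hypothetical disk ID, not the one on The Grammar Examiner.
    header = header_bytes(track=31, sector=3, id1=ord("A"), id2=ord("B"))
    print(" ".join(f"${b:02x}" for b in header))
    print("valid:", header_ok(header))
```

Nothing in the header says where on the track the sector physically sits, which is the detail the protection leans on.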
The line I have highlighted on screen is sector 3's data. It has its own sync mark, then starts with a byte $07 and the 256 bytes of data. Usually this is the forward pointer to the next track and sector and 254 bytes after, but here the entirety is used. Compare the screenshot with what we got from the FastLoad disk editor; we're seeing the second view I got of the sector. No matter how many times I reloaded the disk from Maverick, all I saw for sector 3 was this.
The explanation comes when we try to go through the rest of the sectors.
Track 31 on a 1541 disk should have 17 sectors total (numbered 0 through 16) and on this screen I have the header highlighted for sector 15. So far so good. Notice the header shows $0f $1f for sector 15, track 31. But when we go two rows down to where the header for sector 16 should be, the header reads ... $03 $1f again. The data immediately below it is what we got from the first view of the sector. This disk has two sectors that call themselves "sector 3."
How could this possibly work? Recall that the Commodore 1541 and 1571 disk drives are largely "blind." More specifically, they generally only see what's under the read/write head, so they only know which track and sector they're on by what's in the header (the 1541-II doesn't even have an optical track zero sensor). It is non-deterministic which sector, i.e., the real sector 3 or the fake one that's actually a disguised sector 16, will come under the head first. As the drive's software assumes that the track and sector coordinate in the header is unique for every sector on the disk, it doesn't bother looking again once it sees something that matches, as under normal circumstances such a doppelgänger sector couldn't exist. The disk drive doesn't cache these reads, so the next time we ask, it will faithfully again retrieve the first "sector 3" it sees, which may or may not be the other one.
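A toy model makes the behaviour easier to see. The Python sketch below is a simplification under stated assumptions, not the drive's actual DOS code: it treats the track as a ring of header sector numbers in the physical order Maverick showed (16 normally numbered slots followed by one that claims to be sector 3) and returns whichever matching slot passes under the head first. The names are mine.

```python
# Header sector numbers in physical order on track 31, as seen in Maverick:
# slots 0-15 are labelled normally, slot 16 claims to be sector 3.
TRACK_31_HEADERS = list(range(16)) + [3]


def first_matching_slot(wanted, head_position, headers=TRACK_31_HEADERS):
    """Scan forward from the slot currently under the head and return the
    physical slot of the first header claiming to be `wanted`, or None
    if a full revolution finds nothing (the drive would report error 20)."""
    count = len(headers)
    for step in range(count):
        slot = (head_position + step) % count
        if headers[slot] == wanted:
            return slot
    return None


if __name__ == "__main__":
    for head in (0, 3, 4, 10, 16):
        print(f"head at slot {head:2}: sector 3 comes from slot",
              first_matching_slot(3, head))
    print("sector 16:", first_matching_slot(16, 0))
```

In this model a request for sector 3 is served by physical slot 3 only when the head happens to be in slots 0 through 3; anywhere else on the track, the impostor in slot 16 arrives first.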
(Parenthetically, this means "logical" sectors don't need to be in physical order as long as the drive can see them and they are unique. You could even have a disk where the physical sectors are logically numbered differently, e.g., sector headers being reordered to reflect an ideal interleave. We'll have more to say about interleave later, but such disks and programs to generate them exist, such as Datamost KwikLoad.)
Maverick isn't fooled by this because it reads and interprets the raw GCR of the entire track at once under the assumption the headers could be untrustworthy (and they are). But a regular sector-by-sector read will indeed, as we saw in the FastLoad editor, randomly get one or the other. This routine must therefore be very flexible to compensate: whatever view of the sector it sees first is copied ("double STA") to that block of memory, but this is incomplete and the check byte will be wrong, so now the next time it goes through it will read again, exclusive-OR and then store. It has to get both views one after the other, which may take several tries, so it makes no assumptions otherwise. Once the check byte matches, it knows it succeeded in getting both sectors.
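Here is that retry logic as a Python sketch. The control flow follows the disassembly at $9000 (store on the first pass, exclusive-OR on every later pass, accept when the running XOR of the stored bytes equals $2e), but the sector contents are synthetic: I don't reproduce the real bytes here, so `make_views` fabricates a 256-byte payload whose XOR is $2e and splits it into a low-nybble view and a high-nybble view. Both function names are invented.

```python
CHECK = 0x2E  # the value the loader compares $fb against


def load_composite(read_sector, max_passes=100):
    """Model of the loop at $9000. `read_sector` is any callable returning
    the 256 bytes the drive hands back for "sector 3" on that attempt."""
    memory = bytearray(256)
    store_only = True                 # opcode at $900f starts out as STA
    for passes in range(1, max_passes + 1):
        check = 0                     # STY $FB with Y = 0
        for i, byte in enumerate(read_sector()):
            value = byte if store_only else memory[i] ^ byte
            memory[i] = value
            check ^= value
        if check == CHECK:
            return bytes(memory), passes
        store_only = False            # patch STA to EOR and start over
    raise RuntimeError("never saw both views of the sector")


def make_views():
    """Synthetic stand-in data: a payload whose XOR is $2e, split by nybble."""
    payload = bytearray(range(256))
    payload[255] = 0xD1               # makes the XOR of all 256 bytes $2e
    low = bytes(b & 0x0F for b in payload)
    high = bytes(b & 0xF0 for b in payload)
    return bytes(payload), low, high


if __name__ == "__main__":
    payload, low, high = make_views()
    reads = iter([low, low, low, high])   # an unlucky run on a real disk
    merged, passes = load_composite(lambda: next(reads))
    print("merged correctly:", merged == payload, "after", passes, "passes")
```

Reading the same view twice in a row just cancels it out to zeroes, and the next read puts it back, so the loop keeps cycling until the other view finally shows up. Feed it a `read_sector` that always returns the same view, as a D64 image effectively does, and it never terminates (here, it runs out of passes).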
What happened with our disk copy in the emulator, then? Refer back to the three errors we saw when we imaged it. The drive got confused by the double sector, which explains the spurious "bad checksum" (error 23) for some of the other sectors, but also correctly noted that it couldn't access sector 16 at all (error 20) because nothing is tagged as sector 16. For sector 3, however, the only thing written to disk is one or the other of whichever "sector 3" the disk copy routine hit first; the other form of that sector will never be referenced because the disk copy routine already "got" that sector. On the resulting D64 image the stored sector 3 will thus be invariant, so the routine will run forever in the emulator because it will only ever receive the same data. (A G64 "nibbled" image would reproduce this accurately because it deals in raw GCR, but this is slower despite being a more complete storage format.)
After the load is this mildly naughty little section:
; clear channels, close 3, close all
.C:9029  20 CC FF    JSR $FFCC
.C:902c  A9 03       LDA #$03
.C:902e  20 C3 FF    JSR $FFC3
.C:9031  20 E7 FF    JSR $FFE7
; store a $4c at $900f, making it jmp $c028
.C:9034  A9 4C       LDA #$4C
.C:9036  2C 0F 90    BIT $900F
.C:9039  8D 0F 90    STA $900F
.C:903c  10 D1       BPL $900F
.C:903e  02          JAM
It clears the channels and closes everything, and then does an almost insulting little trick by storing a $4c (JMP) into the STA $C028,Y opcode, making it JMP $C028. The BIT instruction copies bit 7 of the byte at $900f into the N flag, and since at least one retry will have patched that byte to $59 (EOR) by this point, it will always clear the N flag, so the BPL branch also becomes a "branch always" and jumps into the new code at $c028 we would have just loaded.
I took the two sectors and merged them manually with a quick Perl script to get the correct bytes, and then disassembled them in dxa.
; open 15,8,15,"i0"
lc028   lda #$03
        ldy #$c0
        ldx #$3e
        jsr $ffbd
        lda #$0f
        ldx #$08
        ldy #$0f
        jsr $ffba
        jsr $ffc0
        rts
c03e: "i0" $0d
This just calls a "soft init" in the disk drive and returns ... all the way back to $c003, which (look back) is JSR $C041. That's also in our newly loaded code.
; open 5,8,5,"#"
lc041   lda #$02
        ldx #$57
        ldy #$c0
        jsr $ffbd
        lda #$05
        ldx #$08
        ldy #$05
        jsr $ffba
        jsr $ffc0
        rts
c057: "#" $0d
This opens up a new sector buffer as channel 5, and returns back to $c006, which is JSR $C059. No, I don't understand why these don't come one after the other either except as another attempt at obfuscation.
Finally, this routine is the actual meat. I've annotated the calls it makes below and we'll look briefly at those smaller subroutines.
lc059   lda #$08
        sta $fc
        lda #$00
        sta $fb
        jsr lc098       ; send load sector command
        jsr lc0e3       ; read sector into $0800
        lda $082b       ; get number of sectors to load and ...
        sec
        sbc #$08        ; ... subtract eight
        sta $c018
        inc $fc         ; next page
        inc $c017       ; next sector
lc075   jsr lc098       ; send load sector command
        jsr lc0e3       ; read sector into $0900 and so on
        inc $fc         ; next page
        inc $c017       ; next sector
        dec $c018
        beq lc097       ; return if loaded all sectors
lc085   lda $c017
        cmp #$15
        bne lc075
lc08c   inc $c016       ; next track. no more than 21 sectors to read per track through 18
        lda #$00
        sta $c017
        jmp lc075
lc097   rts
This routine is loading sectors one after the other into memory starting with location $0800. We already know this is our target execution address, so it's very likely this is the payload we want to extract.
The routine at $c098 sends another block read command, but this time to the new channel 5 we opened. That command is stored in memory at $c019, as below:
(C:$c17e) m c000
>C:c000  20 28 c0 20  41 c0 20 59  c0 a9 05 20  c3 ff a9 0f    (. A. Y... ....
>C:c010  20 c3 ff 4c  00 08 01 00  00 55 31 3a  30 35 2c 30    ..L.....U1:05,0
>C:c020  30 2c 00 00  2c 00 00 0d  a0 0e b9 29  c0 59 28 c0   0,..,......).Y(.
Since the U1 command expects an ASCII track and sector number, this routine will convert it, and then send the string. This is a family of several related routines. The load from track 31 sector 3 was only 256 bytes, so it only covers some of the code (you'll see the break below where the execution path switches back to what was already loaded as part of BOOT2).
; turn track-sector at $c016 $c017 into ASCII for the U1 command
lc098   lda $c016
        jsr lc126       ; converts
        stx $c022
        sta $c023
        lda $c017
        jsr lc126
        asl $c025
        ora $c026
        jsr lc0b7
        jsr lc0c2
        rts
lc0b7   lda #$0f        ; sets up command string pointer
        ldx #$19
        stx $fd
        ldx #$c0
        stx $fe
        rts
; send command to disk drive
lc0c2   pha
        ldx #$0f
        jsr $ffc9
        bcc $c0cd
lc0ca   jsr $c13c       ; appears to be a debugging stub if an error
        ldy #$00
        pla
        tax
lc0d1   lda ($fd),y
        jsr $ffd2
        bcc lc0db
lc0d8   jsr $c13c       ; same
lc0db   iny
        dex
        bne lc0d1
lc0df   jsr $ffcc
        rts
[...]
; do conversion to ASCII numbers
lc126   ldx #$00
;; bring back BOOT2
.C:c128  C9 0A       CMP #$0A
.C:c12a  90 07       BCC $C133
.C:c12c  E8          INX
.C:c12d  38          SEC
.C:c12e  E9 0A       SBC #$0A
.C:c130  4C 28 C1    JMP $C128
.C:c133  48          PHA
.C:c134  8A          TXA
.C:c135  09 30       ORA #$30
.C:c137  AA          TAX
.C:c138  68          PLA
.C:c139  09 30       ORA #$30
.C:c13b  60          RTS
; debugging stub (calls 6502 software interrupt)
.C:c13c  00          BRK
.C:c13d  60          RTS
.C:c13e  00          BRK
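To make the string handling concrete, here is a minimal sketch in Python (rather than the Perl I use elsewhere in this post) of what the conversion appears to produce. The function name `u1_command` is my own invention for illustration, and the channel and drive defaults are simply read off the template in the memory dump at $c019. The real routine gets its digits by repeatedly subtracting ten; `divmod` does the same job here.

```python
def u1_command(track: int, sector: int, channel: int = 5, drive: int = 0) -> bytes:
    """Build a block-read command shaped like the 15-byte template at $c019."""
    if not (0 <= track <= 99 and 0 <= sector <= 99):
        raise ValueError("track and sector must each fit in two decimal digits")
    # Split each value into tens and ones, as the routine at $c126 does.
    t_tens, t_ones = divmod(track, 10)
    s_tens, s_ones = divmod(sector, 10)
    text = f"U1:{channel:02d},{drive:02d},{t_tens}{t_ones},{s_tens}{s_ones}\r"
    return text.encode("ascii")


cmd = u1_command(1, 0)
print(cmd, len(cmd))
```

The length comes out to 15 bytes, which matches the #$0f loaded at $c0b7 as the number of characters to send.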
From the routine at $c059 we know that it is incrementing $c017 for the sector, and when that exceeds 21 sectors, it increments $c016 (because tracks 1-17 contain 21 sectors). The default values for those locations are (in memory order) 1 and 0, meaning it's loading from track 1 sector 0, the very first sector on the disk. The load routine is very simple.
; read sector
lc0e3   ldx #$05
        jsr $ffc6       ; set input channel
        bcc $c0ed
lc0ea   jsr $c13c       ; bomb on error
        ldy #$00
lc0ef   jsr $ffcf       ; read byte
        bcc lc0f7
lc0f4   jsr $c13c       ; bomb on error
lc0f7   sta ($fb),y
        iny
        bne lc0ef       ; loop 256 times
lc0fc   jsr $ffcc       ; clear channels
        rts
That first sector looks like this:
00000000  ea 4c d3 08 ea 4c db 08 31 2e 31 32 43 4f 4d 4d  |.L...L..1.12COMM|
00000010  4f 44 4f 52 45 20 36 34 73 6c 08 00 90 ab 80 00  |ODORE 64sl......|
00000020  ff 01 00 01 06 00 88 a3 88 6c 88 6c c4 41 2c 2a  |.........l.l.A,*|
00000030  46 1d 26 1d ea 6b 68 26 b4 1c 09 1d f5 1b 17 6c  |F.&..kh&.......l|
00000040  fc 1f c7 1c 8c d1 96 1c 0b ac 9a ce 39 29 8f 5b  |............9).[|
00000050  db 1c d6 29 9e 0f 28 09 cb 0d c0 11 37 12 0c 00  |...)..(.....7...|
00000060  87 09 18 03 cb 0d 28 09 b4 0d 2c 09 87 09 18 03  |......(...,.....|
00000070  b4 0d 87 09 18 08 cb 0d 87 09 8c 16 4a 10 b4 0d  |............J...|
00000080  8a 16 68 28 79 0d f5 1d be 0c ac 0d b4 0d be 0c  |..h(y...........|
00000090  a2 0d b4 0d 33 27 5f 24 9e 0f c7 1c 7b 1c 25 17  |....3'_$....{.%.|
000000a0  b4 0d 70 1c 1b 17 b4 0d 87 09 19 00 87 09 00 00  |..p.............|
000000b0  2a 13 87 09 00 00 87 09 00 d4 81 0f ac 0f e0 0d  |*...............|
000000c0  76 12 f0 ff b4 1c 8d 0a 7d 13 84 43 4f 4c c4 00  |v.......}..COL..|
000000d0  00 d3 08 a9 08 a2 54 a0 3b d0 06 a9 08 a2 98 a0  |......T.;.......|
000000e0  0d d8 86 88 85 89 20 5b 17 a9 1b 8d 11 d0 a9 c8  |...... [........|
000000f0  8d 16 d0 a9 17 8d 18 d0 a9 00 8d 20 d0 8d 21 d0  |........... ..!.|
When loaded into memory, the byte $6c at offset $2b will be in location $082b. That value minus eight (note the subtraction) yields a count of $64, or 100, additional sectors to be loaded, for a total payload size of 25856 bytes. (Note that since the total number of sectors is not fixed but rather specified by the payload, I suspect that this or a similar loader was used for other DesignWare titles where the second-stage loader length varied.)
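The arithmetic and the resulting walk across the disk can be checked with a short Python sketch. The helper names `sectors_to_load` and `walk` are hypothetical, and the walk assumes every track involved has 21 sectors, which holds for tracks 1 through 17.

```python
SECTOR_SIZE = 256
SECTORS_PER_TRACK = 21  # assumption: only valid for tracks 1-17


def sectors_to_load(count_byte: int) -> int:
    # The loader subtracts 8 from the byte at $082b to get the number of
    # additional sectors; add one for the sector it has already read.
    return (count_byte - 8) + 1


def walk(start_track: int, start_sector: int, total: int):
    """Yield (track, sector) pairs in the order the loader requests them."""
    track, sector = start_track, start_sector
    for _ in range(total):
        yield track, sector
        sector += 1
        if sector == SECTORS_PER_TRACK:  # same test as CMP #$15
            track, sector = track + 1, 0


total = sectors_to_load(0x6C)
path = list(walk(1, 0, total))
print(total, total * SECTOR_SIZE, path[0], path[-1])
```

Under those assumptions the last sector requested is track 5 sector 16, comfortably inside the 21-sector zone.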
101 sectors is less than 5 tracks' worth if we start at track 1, as the first 17 tracks all have 21 sectors, which makes the routine straightforward. What it isn't is efficient, for two reasons. First, it has an effective sector interleave of one, while the default interleave on a 1541 is ten; i.e., it hopscotches ten sectors between those it loads, expecting at the drive spindle's typical rotational speed that the distant sector will be most likely already under the drive head ready to go on the next read. (This value is stored in 1541 RAM at location $0069 [SECINC], but an exhaustive disassembly of the 1541 ROM shows the only place the interleave differs is on the directory track 18, where the interleave is 3.) The optimal interleave of a 1541 disk is sometimes debated and will vary depending on the loader and other factors, but it is certainly not one, as the spindle will likely have rotated the disk away from the immediately following sector by the time the disk drive is ready to read it.
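To picture the hopscotch, here is a small Python sketch that lists the order in which a file's consecutive blocks land on a single 21-sector track for a given interleave. It is only an approximation of what the drive does, not a transcription of the 1541 ROM: the rule of sliding forward to the next free sector on a collision is my assumption, and `visit_order` is a made-up name.

```python
def visit_order(interleave: int, sectors: int = 21) -> list[int]:
    """Return the sector numbers used by consecutive blocks on one track."""
    order, used, current = [], set(), 0
    for _ in range(sectors):
        order.append(current)
        used.add(current)
        nxt = (current + interleave) % sectors
        # Assumed collision rule: if the target is taken, try the next one.
        while nxt in used and len(used) < sectors:
            nxt = (nxt + 1) % sectors
        current = nxt
    return order


print(visit_order(1)[:6])
print(visit_order(10)[:6])
```

With an interleave of one the blocks are simply 0, 1, 2 and so on, which is what the original loader asks for; with ten they start 0, 10, 20, 9, 19, 8.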
But — as we'll see — a far bigger contributor to load time is that this routine loads bytes from each sector one by one. Since that's an IEC bus transaction for each byte, it generates several times more IEC bus traffic than a regular file load using "bulk transfer" mode (i.e., the IEC "data" command byte but with a special reserved channel of 0) that sends the entire contents of a file to the computer in one huge transaction.
Finally, this sector loader routine terminates and we end up back at $c009, which closes channels 5 and 15, and jumps to $0800.
While this is a terrible way with a real 1541 to store the payload, it makes extracting it from the D64 (now that we know where it is) stupidly easy. It starts at track 1 sector 0, which is the very beginning of the D64, and everything is stored sector after sector and track after track in ascending numerical order, so we just take the first 25856 bytes of the image. That's the entirety of the executable it's loading and where we find the second-stage loader.
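For anyone following along, a minimal Python sketch of the extraction follows. It assumes a standard 35-track D64 layout (21, 19, 18 and 17 sectors per track by zone); the function names are mine, and the demonstration runs on a blank in-memory image rather than a real dump.

```python
def sectors_in_track(track: int) -> int:
    """Sectors per track for a standard 35-track 1541 disk."""
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track {track} out of range")


def d64_offset(track: int, sector: int) -> int:
    """Byte offset of a sector within a D64 image."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError(f"sector {sector} out of range for track {track}")
    preceding = sum(sectors_in_track(t) for t in range(1, track))
    return (preceding + sector) * 256


def extract_payload(image: bytes, sectors: int = 101) -> bytes:
    """Slice out the contiguous run of sectors starting at track 1 sector 0."""
    start = d64_offset(1, 0)
    return image[start:start + sectors * 256]


blank = bytes(174848)  # stand-in for the real sector dump
print(d64_offset(1, 0), len(extract_payload(blank)))
```

Because track 1 sector 0 sits at offset zero, the slice is literally the first 25856 bytes of the file.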
As we scroll through it in a hex editor, we start seeing text words.
000001e0  7f 09 a3 16 06 82 52 b0 dc 09 a3 16 08 83 54 49  |......R.......TI|
000001f0  c2 e5 09 a3 16 0a 85 57 49 44 54 c8 ed 09 a3 16  |.......WIDT.....|
00000200  0c 86 4d 45 4d 54 4f d0 f6 09 a3 16 0e 85 46 45  |..MEMTO.......FE|
00000210  4e 43 c5 01 0a a3 16 10 82 44 d0 0d 0a a3 16 12  |NC.......D......|
00000220  88 56 4f 43 2d 4c 49 4e cb 18 0a a3 16 14 86 27  |.VOC-LIN.......'|
00000230  2d 46 49 4e c4 20 0a a3 16 16 85 2d 46 49 4e c4  |-FIN. .....-FIN.|
00000240  2e 0a c5 16 16 85 27 3f 4b 45 d9 3a 0a a3 16 18  |......'?KE.:....|
00000250  84 3f 4b 45 d9 45 0a c5 16 18 8a 27 3f 54 45 52  |.?KE.E.....'?TER|
00000260  4d 49 4e 41 cc 50 0a a3 16 1a 89 3f 54 45 52 4d  |MINA.P.....?TERM|
00000270  49 4e 41 cc 5a 0a c5 16 1a 86 27 41 42 4f 52 d4  |INA.Z.....'ABOR.|
00000280  6a 0a a3 16 1c 85 41 42 4f 52 d4 79 0a c5 16 1c  |j.....ABOR.y....|
00000290  86 27 42 4c 4f 43 cb 85 0a a3 16 1e 85 42 4c 4f  |.'BLOC.......BLO|
000002a0  43 cb 90 0a c5 16 1e 83 27 43 d2 9c 0a a3 16 20  |C.......'C..... |
000002b0  82 43 d2 a7 0a c5 16 20 84 27 43 56 c8 b0 0a a3  |.C..... .'CV....|
000002c0  16 22 83 43 56 c8 b8 0a c5 16 22 85 27 45 4d 49  |.".CV.....".'EMI|
000002d0  d4 c2 0a a3 16 24 84 45 4d 49 d4 cb 0a c5 16 24  |.....$.EMI.....$|
000002e0  86 27 45 52 52 4f d2 d6 0a a3 16 26 85 45 52 52  |.'ERRO.....&.ERR|
000002f0  4f d2 e0 0a c5 16 26 87 27 45 58 50 45 43 d4 ec  |O.....&.'EXPEC..|
00000300  0a a3 16 28 86 45 58 50 45 43 d4 f7 0a c5 16 28  |...(.EXPEC.....(|
00000310  85 27 48 4f 4d c5 04 0b a3 16 2a 84 48 4f 4d c5  |.'HOM.....*.HOM.|
00000320  10 0b c5 16 2a 8a 27 49 4e 54 45 52 50 52 45 d4  |....*.'INTERPRE.|
00000330  1b 0b a3 16 2c 89 49 4e 54 45 52 50 52 45 d4 25  |....,.INTERPRE.%|
00000340  0b c5 16 2c 84 27 4b 45 d9 35 0b a3 16 2e 83 4b  |...,.'KE.5.....K|
00000350  45 d9 44 0b c5 16 2e 85 27 4c 4f 41 c4 4e 0b a3  |E.D.....'LOA.N..|
00000360  16 30 84 4c 4f 41 c4 57 0b c5 16 30 88 27 4d 45  |.0.LOA.W...0.'ME|
00000370  53 53 41 47 c5 62 0b a3 16 32 87 4d 45 53 53 41  |SSAG.b...2.MESSA|
00000380  47 c5 6c 0b c5 16 32 87 27 4e 55 4d 42 45 d2 7a  |G.l...2.'NUMBE.z|
00000390  0b a3 16 34 86 4e 55 4d 42 45 d2 87 0b c5 16 34  |...4.NUMBE.....4|
Forth programmers will recognize some of them and the general format. Each word begins with a length byte with its high bit set and ends with the high bit of its last character also set, each acting as a delimiter (the name field). Each word also has a backlink to the previous word (the link field), a pointer to what should be executed (code field), and any trailing data (parameter field). This builds a linked list of words in memory.
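As an illustration of that layout, here is a minimal Python sketch that decodes a word header from raw bytes. The `Header` type and `parse_header` are hypothetical names, and the parser finds the end of the name purely from the high-bit delimiter described above rather than trusting the length byte. The sample bytes are the two entries starting at offset $1e5 in the dump, which (with the payload loaded at $0800) live at $09e5 and $09ed.

```python
from typing import NamedTuple


class Header(NamedTuple):
    name: str
    link: int         # address of the previous word's name field
    code: int         # contents of the code field
    next_offset: int  # offset of the first parameter-field byte


def parse_header(data: bytes, offset: int) -> Header:
    if not data[offset] & 0x80:
        raise ValueError("name field should start with a high-bit-set byte")
    end = offset + 1
    # Characters run until one has its high bit set (the last character).
    while end < len(data) and not data[end] & 0x80:
        end += 1
    if end + 5 > len(data):
        raise ValueError("header runs off the end of the data")
    end += 1  # include the terminating character
    name = bytes(b & 0x7F for b in data[offset + 1:end]).decode("ascii")
    link = int.from_bytes(data[end:end + 2], "little")
    code = int.from_bytes(data[end + 2:end + 4], "little")
    return Header(name, link, code, end + 4)


sample = bytes.fromhex("8252b0dc09a31608" "835449c2e509a3160a")
first = parse_header(sample, 0)
# In this excerpt a single parameter byte follows each code field.
second = parse_header(sample, first.next_offset + 1)
print(first.name, hex(first.link), hex(first.code))
print(second.name, hex(second.link), hex(second.code))
```

The second word's link field comes out as $09e5, the address of the first word's name field, which is the linked list in action.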
Executable Forth words are typically threaded calls, where the pointer in the code field points to a Forth runtime routine to step through each call in the parameter field (MicroMotion calls this routine DOCOL), but can also be raw 6502 machine language (a so-called "code" word), where the execution pointer points into the parameter field itself and runs it directly. The very first word in this program, not shown here, is named COLD and is a code word. In most Forth implementations of the period, COLD and WARM would serve as the routines that initialize the Forth runtime, though as there is no need to warm-start the runtime here, it doesn't seem to be part of this dictionary.
To prepare a Forth dictionary for use as a commercial product, the MicroMotion manual required you to specify what word would run on startup (using TURNKEY, which other Forths also implemented) and what word would run for errors (ONERR), and to disable the formation of new words with DESTRUCT. The dictionary was then saved and would run all the words previously compiled but not be able to form new ones. It's difficult to tell exactly how many words are in the program total, especially since some are likely paged in from other portions of the disk, and some sectors may well be orphans containing old data that's never used. At least for this initial loader, however, we're probably talking three figures at least, which would have been a sizeable number of symbols for an 8-bit computer.
It also can't be determined whether this program was built directly on the Commodore 64, or on some other system (like the Apple II) with Commodore-specific code added later. However, we do know that it's MicroMotion-generated as implementation-specific words like COPY-BLOCKS and COPY-DISK are present. There are also obviously Commodore-specific words like CALL-KERNAL, plus code words for major Kernal routines:
0000e420  91 23 42 4c 4f 43 cb 97 6c a2 15 0b 00 85 43 48  |.#BLOC..l.....CH|
0000e430  4b 49 ce a4 6c a2 15 c6 ff 86 43 48 4b 4f 55 d4  |KI..l.....CHKOU.|
0000e440  b1 6c a2 15 c9 ff 85 43 48 52 49 ce bd 6c a2 15  |.l.....CHRI..l..|
0000e450  cf ff 86 43 48 52 4f 55 d4 ca 6c a2 15 d2 ff 85  |...CHROU..l.....|
0000e460  43 4c 4f 53 c5 d6 6c a2 15 c3 ff 86 53 45 54 4c  |CLOS..l.....SETL|
0000e470  46 d3 e3 6c a2 15 ba ff 86 53 45 54 4e 41 cd ef  |F..l.....SETNA..|
0000e480  6c a2 15 bd ff 84 4f 50 45 ce fc 6c a2 15 c0 ff  |l.....OPE..l....|
0000e490  86 43 4c 52 43 48 ce 09 6d a2 15 cc ff 85 43 4d  |.CLRCH..m.....CM|
These call a routine living at $15a2 that bridges the Forth-raw ML impedance mismatch with the address of the Kernal routine in question in the parameter field. Other words handle the sound output, including the very amusing SHUTUP.
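A quick Python sketch shows how those entries decode. It walks a run of back-to-back headers, keeps the ones whose code field is $15a2, and reads the two-byte parameter field as the Kernal address. The function name `kernal_words` is hypothetical, and the sample bytes are the CHKIN, CHKOUT and CHRIN entries copied from the dump above.

```python
BRIDGE = 0x15A2  # code-field value shared by the Kernal wrapper words


def kernal_words(data: bytes, offset: int = 0) -> list[tuple[str, int]]:
    """Decode consecutive headers of the form name, link, code, 2-byte parameter."""
    found = []
    while offset < len(data) and data[offset] & 0x80:
        end = offset + 1
        # The name ends at the first character with its high bit set.
        while end < len(data) and not data[end] & 0x80:
            end += 1
        end += 1
        if end + 6 > len(data):
            break  # incomplete entry at the end of the buffer
        name = bytes(b & 0x7F for b in data[offset + 1:end]).decode("ascii")
        code = int.from_bytes(data[end + 2:end + 4], "little")
        target = int.from_bytes(data[end + 4:end + 6], "little")
        if code == BRIDGE:
            found.append((name, target))
        offset = end + 6
    return found


sample = bytes.fromhex(
    "8543484b49cea46ca215c6ff"      # CHKIN
    "8643484b4f55d4b16ca215c9ff"    # CHKOUT
    "8543485249cebd6ca215cfff"      # CHRIN
)
for name, target in kernal_words(sample):
    print(f"{name:8s} -> ${target:04X}")
```

The addresses that fall out ($FFC6, $FFC9, $FFCF) are the standard Kernal jump table entries for those calls.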
In other sectors we can see the game data and grammar tasks, though unlike most Commodore programs, the human-facing text portions are rendered in true ASCII and not Commodore-native PETSCII. That's certainly the strongest implication the C64 version may not have been (entirely) developed on the C64 itself. There is also an on-disk copyright message, which seems to have been placed entirely for miscreant grown-up children rooting through the code like me, and is 7-bit ASCII. It doesn't seem to be referenced anywhere in the program.
0002a940  57 41 52 4e 49 4e 47 3a 20 54 48 49 53 20 50 52  |WARNING: THIS PR|
0002a950  4f 47 52 41 4d 20 41 4e 44 20 49 54 53 20 44 41  |OGRAM AND ITS DA|
0002a960  54 41 20 43 4f 4e 54 41 49 4e 20 50 52 4f 50 52  |TA CONTAIN PROPR|
0002a970  49 45 54 41 52 59 20 20 20 20 20 20 20 20 20 20  |IETARY          |
0002a980  49 4e 46 4f 52 4d 41 54 49 4f 4e 20 41 4e 44 20  |INFORMATION AND |
0002a990  54 52 41 44 45 20 53 45 43 52 45 54 53 20 4f 46  |TRADE SECRETS OF|
0002a9a0  20 44 45 53 49 47 4e 57 41 52 45 2c 49 4e 43 20  | DESIGNWARE,INC |
0002a9b0  41 4e 44 20 01 01 01 01 01 01 01 01 01 01 01 01  |AND ............|
0002a9c0  41 52 45 20 4e 4f 54 20 54 4f 20 42 45 20 43 4f  |ARE NOT TO BE CO|
0002a9d0  50 49 45 44 20 46 4f 52 20 41 4e 59 20 50 55 52  |PIED FOR ANY PUR|
0002a9e0  50 4f 53 45 20 57 49 54 48 4f 55 54 20 57 52 49  |POSE WITHOUT WRI|
0002a9f0  54 54 45 4e 20 01 01 01 01 01 01 01 01 01 01 01  |TTEN ...........|
0002aa00  50 45 52 4d 49 53 53 49 4f 4e 2e 20 74 48 45 59  |PERMISSION. tHEY|
0002aa10  20 41 52 45 20 50 52 4f 54 45 43 54 45 44 20 55  | ARE PROTECTED U|
0002aa20  4e 44 45 52 20 53 54 41 54 45 20 41 4e 44 20 46  |NDER STATE AND F|
0002aa30  45 44 45 52 41 4c 20 4c 41 57 2e 20 20 20 20 20  |EDERAL LAW.     |
0002aa40  01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01  |................|
So noted. With the payload thus extracted, an easy first cut is to compress it down with pucrunch with an execution address of $0800 (if the first 25856 bytes of the D64 are saved as gramload.prg and we add starting address bytes of 00 08 at the beginning, then pucrunch -c64 +f -ffast -x2048 gramload.prg gramload will do the job). This yields a binary which is about 30% smaller, LOADs and RUNs like a BASIC program, and decompresses in a few seconds. For the emulator, this is quite simple to work with: I put the crunched Grammar Examiner loader in my prg folder that VICE treats as a very fast virtual disk, and then run it with the sector dumped D64 in drive 8 to play the game. This is also great for something like a 1541-Ultimate where the loader can be DMAed straight into memory.
For a real C64/128 with just a regular disk drive, though, we'll want to get this back on the original disk; it would be clumsy to juggle floppies. Fortunately, since the crunched binary is 30% smaller, it will easily fit in the sectors the original payload was loaded from even with the overhead of adding next sector links to each sector, and load even faster due to the reduced size. As such, the Kernal effectively becomes our first-stage loader instead of using DWARF. After loading and running it, the program self-decompresses to memory, and then jumps right into the second-stage loader.
This simple Perl script will take our compressed payload on standard input and emit sectors back to back with the proper linkages on standard output.
#!/usr/bin/perl

use bytes;

read(STDIN, $buf, 65536);

# make it an even multiple of 254 bytes for ease
$buf .= "\0" x (254 - (length($buf) % 254)) if (length($buf) % 254);
print STDERR "splicing @{[ length($buf) ]} bytes/@{[ length($buf)/254 ]} sectors\n";

$t = 1;
$s = 0;
$l = length($buf) - 254;
for($i=0; $i<length($buf); $i+=254) {
    $s++;
    if ($s == 21) {
        $t++;
        $s = 0;
    }
    if ($i == $l) {
        print STDERR "last sector\n";
        print STDOUT pack("H*", "00ff");
    } else {
        print STDOUT chr($t).chr($s);
    }
    print STDOUT substr($buf, $i, 254);
}
print STDERR "finished at $t $s\n";
The result is a lake of sectors 18432 bytes long. We take the D64 sector dump of the disk and add on the remaining 156416 bytes starting at offset 18432 for a total length of 174848 bytes, the nominal size. Finally, we'll jump into the D64's directory track with a hex editor and add a link to our new file at track 1 sector 0. I "deleted" the original booter files, but kept them for posterity along with the other ghosts. Since we're talking about DWARFs (I also have a cloudy memory that one of our DesignWare disks had a reference to FIDDLE), a Firesign Theatre reference seems appropriate.
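The splice itself is just concatenation, which the following Python sketch spells out. The function name `splice` is mine, and the demonstration uses dummy in-memory buffers rather than the real files; the hex-editor work on the directory track is not shown.

```python
def splice(original: bytes, new_sectors: bytes) -> bytes:
    """Overlay freshly built sectors onto the start of a sector dump."""
    if len(new_sectors) % 256:
        raise ValueError("new data must be a whole number of 256-byte sectors")
    if len(new_sectors) > len(original):
        raise ValueError("new data is larger than the image")
    # Keep everything in the original image past the replaced region.
    return new_sectors + original[len(new_sectors):]


original = bytes([0xAA]) * 174848   # stand-in for the original D64 dump
new_sectors = bytes(18432)          # stand-in for the linked sector output
image = splice(original, new_sectors)
print(len(new_sectors), len(image) - len(new_sectors), len(image))
```

The three numbers printed are the 18432 new bytes, the 156416 bytes carried over, and the 174848-byte total.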
I wrote this out to a 5.25" floppy and did a few tests on the 128DCR. Because the loading time from the start of the second-stage loader (when the screen goes black) to the title screen is all raw track-and-sector access and therefore constant, any time savings will be observed in bringing up the second-stage loader. These times were done by me with a stopwatch. They necessarily include the time spent in memory decompression since that's technically part of the load (approximately a fixed 5-second penalty). I tested with and without Epyx FastLoad, which is my fastload cartridge of choice due to its ubiquity and compatibility.
1:24 original disk (regardless of fastloader)
0:52 1 interleave no FastLoad
0:23 1 interleave FastLoad
This is already a big savings even on a stock 64 with no fastloader at all, but we already know we're using a non-standard interleave, so can we do better?
The answer is, "it's complicated." Below is a hacky little Perl script that emits entire tracks with an adjustable interleave, defaulting to 10.
#!/usr/bin/perl -s

use bytes;

read(STDIN, $buf, 65536);

# interleave
$interleave ||= 10;

# make it an even multiple of 254 bytes for ease
$buf .= "\0" x (254 - (length($buf) % 254)) if (length($buf) % 254);

# ass-U-me all tracks are 21 sectors long
$scount = length($buf)/254;
$tcount = int(($scount/21)+0.999999);
print STDERR "splicing @{[ length($buf) ]} bytes/$scount sectors in $tcount tracks interleave $interleave\n";

# paranoia if we get longer
$maxtrax = 5;

$foff = 0;
$end = length($buf) - 254;
$ss = 0;
$ns = 0;
for($t=1;$t<=$tcount;$t++) {
    @tbam = ();
    $nt = $t;

    # handle last track differently than the others
    if ($t < $maxtrax) {
        for ($s=0; $s<21; $s++) {
            # predict next sector. this is 'close enough'
            # to the 1541's algorithm
            if ($s == 20) {
                # out of sectors, step track
                # 1541 algorithm keeps the same sector #
                $nt++;
            } else {
                $ns = ($ss + $interleave) % 21;
                while (length($tbam[$ns])) {
                    $ns++;
                    $ns = $ns % 21;
                }
            }
            # print STDERR "$nt $ss -> $ns\n";

            if ($foff >= length($buf)) {
                $tbam[$ss] = chr(0) x 256;
            } elsif ($foff >= $end) {
                $tbam[$ss] = chr(0) .
                    chr(255) .
                    substr($buf, $foff, 254);
            } else {
                $tbam[$ss] = chr($nt) .
                    chr($ns) .
                    substr($buf, $foff, 254);
            }
            $ss = $ns;
            $foff += 254;
        }

        # fill in empty sector holes
        for ($s=0; $s<21; $s++) {
            $tbam[$s] = chr(0) x 256 if (!length($tbam[$s]));
        }
        $track = join('', @tbam);
        die("assert: track $t is wrong size @{[ length($track) ]}\n")
            if (length($track) != 21*256);
        print STDOUT $track;
        @tbam = ();
    } else {
        # for the remainder, emit sequentially
        while($foff < length($buf)) {
            print STDOUT chr($t) .
                chr($ss++) .
                substr($buf, $foff, 254);
            $foff += 254;
        }
    }
}
The separate handling for track 5 was paranoia in case my calculations were off and it ended up being longer than four tracks, but fortunately this isn't the case (I left the code in for future expansion). For veracity I checked its results against a real floppy with files written with the 1541's own ROM routines and the sector interleave output at the default value of 10 seemed the same. We'll then fill in the remainder with the original sector dump to get a proper D64 and write that to a floppy.
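Another sanity check, which I sketch here in Python, is to follow the track and sector links through the finished image and confirm the file comes back out whole. This is an illustrative helper with made-up names, assuming the standard 35-track zone layout and the usual CBM DOS convention that a final sector has a next-track byte of zero and a second byte giving the index of its last used byte.

```python
def sectors_in_track(track: int) -> int:
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track {track} out of range")


def d64_offset(track: int, sector: int) -> int:
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError(f"sector {sector} out of range for track {track}")
    return (sum(sectors_in_track(t) for t in range(1, track)) + sector) * 256


def follow_chain(image: bytes, track: int = 1, sector: int = 0) -> bytes:
    """Reassemble a file by following its sector links."""
    data, seen = bytearray(), set()
    while track != 0:
        if (track, sector) in seen:
            raise ValueError("sector chain loops back on itself")
        seen.add((track, sector))
        off = d64_offset(track, sector)
        block = image[off:off + 256]
        next_track, next_sector = block[0], block[1]
        if next_track == 0:
            data += block[2:next_sector + 1]  # last sector: partial use
        else:
            data += block[2:]
        track, sector = next_track, next_sector
    return bytes(data)


# Tiny synthetic image: T1 S0 links to T1 S10, which ends the file.
demo = bytearray(174848)
demo[0:256] = bytes([1, 10]) + b"A" * 254
off = d64_offset(1, 10)
demo[off:off + 256] = bytes([0, 255]) + b"B" * 254
print(len(follow_chain(bytes(demo))))
```

Run against a real spliced image, the returned bytes should match the padded crunched payload that went into the script.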
Does the interleave make a difference? Here's the complete table including our three prior entries:
1:24 original disk (regardless of fastloader)
0:52 12 interleave no FastLoad
0:52 10 interleave no FastLoad
0:52 6 interleave no FastLoad
0:52 1 interleave no FastLoad
0:25 6 interleave FastLoad
0:23 1 interleave FastLoad
0:17 12 interleave FastLoad
0:15 10 interleave FastLoad
First off, even with a stock Kernal/1541 and no fastloader, executing a LOAD instead of a byte-by-byte sector-by-sector read is way faster. If we remove the 5-second fixed decompressor penalty and multiply that 47 second time by 25856/18432 to compensate for the fewer sectors loaded, we get 66 seconds loading the same amount of payload, which is still 18 seconds and 21% faster. 52 seconds is thus already a big improvement, and even more so when a fastloader — any fastloader — is involved.
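For anyone who wants to check my arithmetic, here it is as a few lines of Python. The variable names are just labels for the stopwatch figures quoted above, not anything measured separately.

```python
original_seconds = 84        # 1:24 on the original disk
stock_seconds = 52           # crunched loader, stock Kernal/1541
decompress_seconds = 5       # approximate fixed pucrunch penalty

load_only = stock_seconds - decompress_seconds        # time spent loading
scale = 25856 / 18432                                 # original vs. new size
normalized = load_only * scale                        # same payload size
saved = original_seconds - round(normalized)
percent = 100 * saved / original_seconds

print(load_only, round(normalized), saved, round(percent))
```

That gives 47 seconds of pure loading, 66 seconds when scaled up to the original payload size, and a saving of 18 seconds or about 21%.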
But we expected that. What was unexpected was that interleave didn't seem to make any difference at all to the stock Kernal/1541 ROM loader. It should be noted that we're doing this on track 1 and going inward instead of track 17 and going outward as you'd optimally do, and this is a 1571 instead of a 1541, but the 1571's compatibility is generally excellent and functions indistinguishably from a 1541, and you'd think that there would be some load time difference between interleaves if it really were a factor. My conclusion here is that the time spent waiting for the next sector is much less in comparison to the time spent actually transferring data, and if that's true, then it probably made virtually no difference to the original disk's loader after all. I didn't test every single value but it seems very unlikely it would change with anything else.
Where interleave does make a difference is when Epyx FastLoad is active. There are interleaves that are clearly worse and clearly better, and again I didn't test every single value, but FastLoad seems calibrated to achieve best performance on the default interleave of 10. And that makes sense, really, because it had to show improvement on existing software that likely would have been written to disk with that interleave. Notably, on an interleave of 10 and removing the five-second decompressor penalty, FastLoad reads in the same second-stage loader in 10 seconds that the stock Kernal does in 47 seconds, very close to the 5x speedup Epyx's marketing material claimed.
This is already pretty fast. You could maybe shave a few more seconds by using a different loader (WarpSpeed or Super Snapshot should easily beat it), or, since we're already working with track 1, you could make it into a 128 mode autobooter. On the other hand, as the loading time gets smaller, the fixed decompression time (and we're already using pucrunch's fast mode on purpose) starts to dominate, and a 128 mode autobooter would incur the additional penalty of switching to 64 mode. Since I like Scott Nelson's black beauty, I'm perfectly happy with 15 seconds to start. From the 135 seconds total time to the title screen on the original, we're now down to 65 on actual hardware, over twice as fast. And on an emulator it flies.
The last frontier is to ensure the entire title is playable, because the road to crack heaven is littered with hal-fassed attempts that didn't account for the game abruptly doing a protection check later on. Fortunately, my wife insisted on playing it all the way through and beating poor old Mel.
Don't look at the screen if you don't want any spoilers.
In the end, this article proves The Grammar Examiner was far more educational for far longer than I (or, probably, its designers) ever thought possible. And hey: look at the quality of the prose in this blog. Even my wife couldn't find any errors, and she's had a lot of practice.
This looks so nasty, and I mean in a good way. Stepping through this brought back too many memories of late night hacks and tons of forgotten 6502 assembly.
The most insidious protection I saw was code that loaded directly into the 1541 RAM and decrypted sectors from there. The code I ran into changed the interleave and I think the rotational speed to get its data loaded.
I didn't have much in the way of direct hacking tools for the C64, except for one cheater cartridge. I think it was called Icepik. It had a button on the cartridge that you pressed after your game completely loaded (after passing all of its copy protection steps) and was at its title screen or some other idle loop. It would save a full snapshot of the C64 RAM space to another disk and create a small BASIC loader with a SYS statement to jump back into the assembly. It didn't work on all games but worked enough for me to keep it around. It sounds like it could have worked on your app. Of course, that would have defeated the purpose of learning all the neat hacks this code offered to you.
Yup, I've got an Isepic here myself. It would probably have worked just fine at the title screen since it didn't load any special code into drive memory, but as you say, the crack was more fun (and elegant :-).