Sunday, September 27, 2020

Hacking a gopher client into the Alpha Micro

I have great personal affinity for the Alpha Micro multiuser systems, which were the first multiuser computers I ever experienced (an AM-2000 and a battery of terminals at the Salvation Army corps my family and I attended as a kid). These blue-collar machines were famous for their high level of vertical integration; they ran back offices everywhere from churches to veterinary practices, and I'll bet a substantial number still do, sitting in a closet quietly continuing to do their job. Based on known serial numbers, an estimated fifty or sixty thousand Alpha Micro systems were produced. My two main systems came from a party store and an emergency dispatch center, and the primary one (a 1994 AM-3500 Eagle 300, front panel closeup shown above, and the subject of a recent Refurb Weekend) has a webserver you can access.

AMOS, the Alpha Micro Operating System they run, is a curious beast, especially to modern sensibilities, largely because of its strong DEC influence via TOPS-10. Although AMOS is not a descendant of TOPS-10, it adopted so many of its conventions that DEC actually (though unsuccessfully) sued Alpha Micro over it in 1984. The filesystem uses 6.3 filenames and is case-insensitive, and instead of subdirectories, files are divvied up by octal project-programmer numbers (like [100,7]) on specific devices; all file access is by device and PPN, or through aliased pseudodevices called ersatzes that point to specific locations. On the CPU side, AMOS is populated by process-like entities called jobs (at least one is attached to every terminal or pseudoterminal), with a system limit on the number of jobs that can exist. On startup, jobs have a fixed memory partition allocated to them to store their data and program code, and their permissions are generally determined by the particular PPN they are logged into (for example, [1,2] is the operator and essentially "root"). Some components can be loaded into a systemwide memory partition and be available to all users, though each user does not get a separate copy (so things like COMMON.SBR, which implements a primitive key-value store, may allow users to see other users' data unless it is loaded separately into individual user partitions and/or the program is specifically engineered to be "reentrant").

AMOS is also one of the very few operating systems (in fact, I struggle to think of any others, though there probably are some) that effectively run the 68K CPU in little-endian mode. The CPU is still big-endian internally, but the bus lines are swapped in order to preserve data file compatibility with earlier versions of AMOS that ran on customized WD-16 CPUs. Opcodes in particular are stored little-endian, which throws off the ability to easily disassemble code, and Alpha Micro's somewhat non-standard mnemonics don't help. AMOS also neither uses nor requires an MMU and is thus a strictly real-memory OS with no concept of virtual addresses. The kernel (which is called the "monitor") can still preemptively multitask between jobs, but because memory partitions occupy different addresses, programs are required to run from whatever location they are placed at ("relocatable"). Memory protection is likewise limited in scope, meaning a program that goes awry may occasionally require the machine to be forcibly rebooted, although fortunately the OS and filesystem are tolerant of this. The gemisch of components and jobs loaded on system startup is very reminiscent of classic MacOS, and while the components tend to be better coded and conflicts are rare, once in a great while they can clash just as badly. Making a wrong move with the startup script AMOS32.INI is a great way to render your machine unbootable.

One collection of these components is AlphaTCP, effectively a port of SpiderTCP (and enough of SpiderStreams to implement it), an earlier TCP/IP stack whose most famous implementation was as part of Windows NT 3.1 (replaced in 3.5). However, until its very last version, AlphaTCP had no means for programs to directly open network connections of their own (and this particular API, if you can call it that, is pretty baroque anyway), and my workhorse Eagle 300 runs an earlier version that doesn't have it regardless. That means relying on whatever functionality is already present in order to support the widest variety of machines. Depending on release, AlphaTCP comes with device drivers; servers for HTTP, SMTP, FTP, Telnet, POP and rwhod; the usual utilities such as ping and traceroute (which it calls TROUTE); and clients for Telnet, FTP, mail, HTTP, TFTP and ... finger. Hmmm.

You see, one of my other great personal affinities is Gopherspace. Especially with the "TLS apocalypse" severely limiting the ability of older browsers to access modern Web sites, I think Gopherspace is a better fit for old machines, and clients can be written even for slow, memory-constrained 8-bit platforms. (That's an upcoming post, by the way.) The protocol is trivial, and interpreting a menu is much, much, much less complex than even relatively simple web pages. Interestingly, the finger protocol and the gopher protocol are, in broad strokes, identical: send a selector followed by CRLF, get a result, close the connection. The finger client is also well-behaved and can emit files to disk with redirected I/O. Hmmmmmm.
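Because the two protocols share the same request/response shape, one routine covers both, with only the port changing (79 for finger, 70 for Gopher). This is a minimal Python sketch of that exchange, not FINGER.LIT's code; the name `fetch` and the 10-second timeout are assumptions.

```python
import socket

def fetch(host: str, port: int, selector: str, timeout: float = 10.0) -> bytes:
    """Send one selector line and return everything until the server hangs up.

    Sketch of the exchange finger (port 79) and Gopher (port 70) share:
    connect, send the selector plus CRLF, read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: response is complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, `fetch("gopher.floodgap.com", 70, "")` asks for the root menu (an empty selector), while `fetch(host, 79, "user")` is a plain finger query.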

The original way I solved this problem was something I called the "fingerproxy," a modified finger daemon that runs on a proxy host and forwards or parses commands. But this is clunky and isn't self-hosted, and eventually the limitations of this approach became excessive, so a couple weeks ago I decided to make the AlphaTCP finger client do the work itself. At least at first blush it seemed like converting FINGER.LIT (the .LIT extension in AMOS means a "literal" binary file, i.e., an executable) to access a Gopher server instead would just be a matter of changing the port number. (NARRATOR: It wasn't.) So I decided to find the presumed port number and patch it in the binary. The 68000 notionally deals in 16-bit shorts, so I looked for various permutations of 4f (79, the port number for finger) such as 004f or 4f00.

Unfortunately FINGER.LIT is a "huge" 52K binary, the reason for which will become apparent shortly, and this byte appears in lots of places. One of them, however, was part of a string (because it is also the letter "O"). So I ran it through strings and this one popped out:

finger/tcp unknown service

This means it must look up the port number from the services list! AlphaTCP's equivalent of /etc/services is TCP:SERVIC. (and it is written in the same format). I edited that file and changed finger's TCP port to port 70, and ...

.finger /@gopher.floodgap.com
3 '/W /' doesn't exist! error.host 1
i This resource cannot be located. error.host 1

It worked!
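The lookup behind that "finger/tcp unknown service" message presumably resembles the sketch below, which parses /etc/services-style text. FINGER.LIT's real parser isn't reproduced here; `lookup_port` is a hypothetical name, and the sketch assumes TCP:SERVIC. follows the usual layout (name, port/protocol, optional aliases, `#` comments), as the post says it does.

```python
def lookup_port(services_text: str, name: str, proto: str = "tcp") -> int:
    """Return the port for name/proto from /etc/services-style text (sketch).

    Each useful line is: service-name  port/protocol  [aliases...]
    Anything after '#' is a comment.
    """
    for line in services_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # blank, comment-only, or malformed line
        port, _, line_proto = fields[1].partition("/")
        names = [fields[0]] + fields[2:]  # primary name plus any aliases
        if line_proto == proto and name in names:
            return int(port)
    raise LookupError(f"{name}/{proto} unknown service")
```

With finger's entry edited to `finger 70/tcp`, `lookup_port(text, "finger")` returns 70, which is why no change to the binary was needed for this first test.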

The next experiment was to copy it to a new binary, GOPHER.LIT, and change every string in the new file containing finger to gopher so it would use that service entry instead. There weren't many of these, and since the replacement was exactly the same length, it was an easy in-place substitution. This wasn't quite as successful:

.gopher
?GOPHER.RTI not found.

RTI files are "runtime initializers" (essentially the AMOS equivalent of a data segment in a separate file). This is a possible clue it was written in C, not in hand-coded assembler as many Alpha Micro tools are, and being written in C also explains the large(ish) size of the file. I assumed nothing in the RTI depended on these changes and just gave it a copy of FINGER.RTI.

.dir finger.*
FINGER LIT 105 DSK0:[1,4]
FINGER RTI 6
Total of 2 files in 111 blocks

.copy gopher.rti=finger.rti
FINGER.RTI to GOPHER.RTI
Total of 1 file transferred
.gopher /@gopher.floodgap.com
3 '/W /' doesn't exist! error.host 1
i This resource cannot be located. error.host 1

Success!

At this point those of you unfamiliar with the finger protocol will wonder where the /W is coming from. This is documented in the protocol description in RFC 1288 (Gopher's, for the record, is RFC 1436), stating it is sent "to [request] a higher level of verbosity in the user information output." Since by now I strongly suspected this was a compiled C program, I located the string /W and turned it into nulls, effectively making it a zero-length null terminated string. This successfully suppressed the string completely and now I could send selectors and view raw Gopher menus and text files.
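Both patches so far are same-length, in-place byte edits, which leave every offset in the file unchanged. A minimal Python sketch of the two operations follows; the helper names are hypothetical, and it operates on whichever file actually holds the strings.

```python
def rename_service(data: bytes, old: bytes = b"finger", new: bytes = b"gopher") -> bytes:
    """Replace every occurrence of old with new; same length keeps offsets intact."""
    if len(old) != len(new):
        raise ValueError("in-place patch needs same-length strings")
    return data.replace(old, new)

def blank_cstring(data: bytes, text: bytes) -> bytes:
    """Overwrite a NUL-terminated string with NULs, leaving a zero-length string."""
    needle = text + b"\x00"
    if data.count(needle) != 1:
        raise ValueError(f"expected exactly one {needle!r}")
    start = data.index(needle)
    # Keep the original terminator; only the characters themselves become NULs.
    return data[:start] + b"\x00" * len(text) + data[start + len(text):]
```

For example, `blank_cstring(rename_service(blob), b"/W")` performs both edits on one blob. Refusing to patch when the needle is missing or ambiguous is deliberate: a blind overwrite of the wrong `/W` would be hard to track down later.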

This was, however, essentially a single-protocol lunkheaded equivalent of cURL, not an actual console shell. Since lots of strings would be getting slung around, I decided to implement a front-end shell using AlphaBASIC, which virtually every AMOS machine comes with and which has unusually good string handling for a BASIC dialect. It would create a .CMD file (analogous to a DOS or Windows batch file) that would call GOPHER.LIT with a selector and host, dump the resulting data to a file, and chain back into the front end. This worked great for text files, and it paginated them perfectly. It also displayed gopher menus raw on screen just fine, including what appeared to be the typical tab delimiters between gopher menu item fields (read the RFC for details). But when I tried to actually parse those menu items, it couldn't find any tabs at all.
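Per RFC 1436, each menu line is an item-type character plus display string, then tab-separated selector, host and port, with a lone `.` ending the menu. Here is a Python sketch of that parse (GAMBLE does this in AlphaBASIC, which isn't shown; the names here are hypothetical). It also shows why missing tabs are fatal: a line with no tabs has no fields to split.

```python
from typing import List, NamedTuple

class MenuItem(NamedTuple):
    itype: str     # one-character item type, e.g. '0' text, '1' menu, 'i' info
    display: str   # user-visible text
    selector: str
    host: str
    port: int

def parse_menu(text: str) -> List[MenuItem]:
    """Parse an RFC 1436 Gopher menu (sketch; extra Gopher+ fields ignored)."""
    items = []
    for line in text.splitlines():
        if line == ".":        # lone period terminates the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:    # no tabs means no fields: the symptom seen here
            continue
        display, selector, host, port = fields[:4]
        items.append(MenuItem(line[0], display, selector, host,
                              int(port) if port.isdigit() else 0))
    return items
```

Given a menu whose tabs were expanded into spaces, this returns an empty list, which is exactly the "couldn't find any tabs" behaviour.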

When in doubt, look at what's coming over the wire. What was coming over the wire (according to my local network snooper) was intact, but when GOPHER.LIT wrote the output to a file, a hex dump of the actual file showed the tabs were getting automatically expanded out into spaces. Weirder still, instead of the typical CRLF line endings I was expecting to see in a gopher menu, I was seeing CRCRLF. What gives?

Because of the little endian byte ordering I mentioned before, disassembling the code turned into a marathon process of flipping 16-bit word byte order and feeding it in sections to a 68K disassembler expecting big endian opcodes. Sensible code resulted but for the life of me I could not find any code that obviously turned tabs into spaces. I found the code that parsed whitespace in the services file (in fact, I found that in a couple different places), but nothing that actually generated white space or, for that matter, doubled-up carriage returns.
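The byte-flipping step itself is mechanical. A Python sketch (the name `swap_words` is hypothetical) that swaps the two bytes of every 16-bit word so the result can be fed to a disassembler expecting big-endian 68K opcodes; for example, the trap stored on disk as F2 A0 comes out as A0 F2.

```python
def swap_words(data: bytes) -> bytes:
    """Swap the bytes of every 16-bit word (AMOS order -> standard 68K order)."""
    if len(data) % 2:
        raise ValueError("odd length: not a whole number of 16-bit words")
    out = bytearray(len(data))
    out[0::2] = data[1::2]  # high byte of each word moves to the front
    out[1::2] = data[0::2]  # low byte moves to the back
    return bytes(out)
```

The transform is its own inverse, so the same function converts patched code back into AMOS byte order.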

The remaining variable was the redirection of the output to a disk file. In AMOS, this is done by intercepting terminal output routines. What if this was something the terminal driver was doing? I wrote a tiny little assembler program to test.

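SEARCH SYS              ; monitor call macros
SEARCH SYSSYM           ; system symbol definitions

PHDR -1,,               ; program header
MOVB #9.,D1             ; ASCII TAB into D1 (trailing dot = decimal)
TOUT                    ; output the character in D1
TYPECR <Hello, world!>  ; output the string followed by a CR
EXIT                    ; return to AMOS

END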

68000 assembly language programmers will have recognized exactly one opcode in that file (and even that is atypical: you would probably write it as MOVE.B). The rest are directives or macros; writing assembly programs on AMOS is actually a real pleasure because of the rich set of macros and monitor service calls that make your life easier. More about those in a moment; for now, what this does is emit a tab character and then the string Hello, world! followed by a CR. I assembled it with the trusty M68 assembler, producing this tiny object file (the first four bytes are a disk pointer and are not actually part of the file):

000000: 00 00 00 00 FF FF 00 01 00 00 00 00 00 00 3C 12
000010: 09 00 06 A0 0C A0 48 65 6C 6C 6F 2C 20 77 6F 72
000020: 6C 64 21 0D 00 00 12 A0

You can see there is no line feed or spaces, and you can see the CR (0x0d) and tab (0x09). I ran it, redirecting its output to a disk file. The file dumped like so:

000000: 00 00 00 00 20 20 20 20 20 20 20 20 48 65 6C 6C
000010: 6F 2C 20 77 6F 72 6C 64 21 0D 0A

The file had spaces instead of the tab character, and had CRLF at the end instead of CR. The terminal driver was indeed munging the output.
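The dump is consistent with the driver expanding each tab to the next 8-column tab stop: the tab at column 0 became eight 0x20 bytes. A Python sketch of that behaviour follows; the 8-column width is inferred from this one sample rather than from documentation, and `expand_tabs` is a hypothetical name (Python's built-in `str.expandtabs(8)` does the same thing).

```python
def expand_tabs(text: str, tab_width: int = 8) -> str:
    """Replace each TAB with spaces up to the next tab stop (assumed 8 columns)."""
    out = []
    col = 0
    for ch in text:
        if ch == "\t":
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        elif ch in "\r\n":
            out.append(ch)
            col = 0  # a new line starts back at column 0
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

Once expanded, a tab can't be reliably recovered from the file, since a run of spaces looks the same whether it came from a tab or not. That is why the fix has to stop the expansion rather than undo it afterwards.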

Monitor calls on AMOS are implemented using 68K A-line traps. Whenever the processor sees an instruction with a first (most significant) nybble of 0xA, an A-line exception is triggered and sent to the operating system. (Many 68K operating systems exploit this feature for system calls; 68K Macintoshes implement Toolbox calls this way. There are also F-line traps, used for FPU instructions, which notionally allow emulation on systems that lack the hardware, though interestingly the Atari ST uses both A-line and some F-line traps as GEM calls.) In the bowels of the Alpha Micro monitor calls documentation is a table describing the bits of a terminal status bitfield called T.STS.

The T$DAT bit enables "complete data transparency." This sounds like exactly what we want! The documentation says to manipulate it only with the TRMWST (terminal write status) and TRMRST (terminal read status) monitor calls. These are macros, and their definitions in SYS.M68 look like this:

DEFINE TRMRST DST,PORT
        IF B,PORT,      SUB     A6,A6
        IF NB,PORT,     LEA     A6,PORT
        SVCA    171,,,,D7,DST
ENDM
DEFINE TRMWST SRC,PORT
        IF B,PORT,      SUB     A6,A6
        IF NB,PORT,     LEA     A6,PORT
        SVCA    172,,SRC,D6
ENDM

The B and NB mean "blank" (not provided) and "not blank" (provided) with respect to the parameters. We don't need the port argument since we are only interested in talking to the default terminal. The source and destination parameters are for sending and receiving the new value respectively. That gets passed to another macro called SVCA, which is the supervisor call macro, shown here:

DEFINE  SVCA    ARG,A,B,C,D,E
        IF      NB,A
         NTYPE  ...X,A
         IF NE, ...X-^O26, LEA A6,A
        ENDC
        IF      NB,B
         NTYPE  ...X,B
         NTYPE  ...Y,C
         IF NE, ...X-...Y, MOV B,C
        ENDC
        WORD    ^H0A000+^O'ARG*2
        IF      NB,E
         NTYPE  ...X,D
         NTYPE  ...Y,E
         IF NE, ...X-...Y, MOV D,E
        ENDC
ENDM

Without getting too deep in the weeds here (too late?) this macro looks at the types of the arguments and if they aren't suitable issues instructions to get everything in the right place, since the 68K instruction set is not fully orthogonal. The real magic here is the WORD pseudo-op: this is what emits the A-line instruction. The default radix for M68 is octal (I told you this was heavily DEC-influenced), so punching a couple figures into the ol' scientific calculator, the A-line instructions that TRMRST and TRMWST respectively generate are 0xA0F2 and 0xA0F4.
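The same arithmetic in Python, as a check on the calculator work; `svca_word` and `svca_number` are hypothetical helper names. Note that the WORD expression has to come out as 0xA000 + ARG*2 to land in the A-line range, and that matches the F2 A0 bytes that later turn up in GOPHER.LIT.

```python
def svca_word(svc_octal: str) -> int:
    """A-line word emitted by SVCA: WORD ^H0A000+^O'ARG*2 (ARG is octal)."""
    return 0xA000 + int(svc_octal, 8) * 2

def svca_number(word: int) -> str:
    """Inverse: recover the octal SVCA number from an A-line word."""
    if word >> 12 != 0xA:
        raise ValueError(f"{word:#06x} is not an A-line instruction")
    return format((word - 0xA000) // 2, "o")

print(hex(svca_word("171")), hex(svca_word("172")))  # 0xa0f2 0xa0f4
```

Run backwards over the Hello world object file above, the words A006, A00C and A012 (stored there as 06 A0, 0C A0 and 12 A0) decode to SVCA numbers 3, 6 and 11 octal; by their position in the listing these are presumably TOUT, TYPECR and EXIT.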

The next question was where to sneak this in. I figured I could mash these calls into a few shorts and trim down one of the messages to make room; I'd only need to do this when the executable starts, because this bit is helpfully reset for us when returning to the AMOS command prompt. However, it would be easy enough to first check whether the code was already making these calls for some other purpose, since an F2 A0 (to read the status word) near a matching F4 A0 (to write out the modified one) would be very unlikely to turn up by coincidence.

% xd gopher.lit | grep -A2 'f2.a0'
00000690 4d 21 22 01 ce 9d f2 a0 07 20 40 00 10 00 ce 9d
000006a0 00 2c f4 a0 fc d7 00 00 00 80 7c 2b 00 00 00 10
000006b0 a8 84 7c 2b 00 00 00 04 a4 84 7c 2b 01 00 00 00

Jackpot: an F2 A0 at offset 0x696 and an F4 A0 just twelve bytes later at 0x6a2. This pair only occurred once in the entire file. Byte-swapped and disassembled into standard mnemonics, the relevant instructions are:

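suba.l a6,a6        | TRMRST: no port given, so A6 = 0 (default terminal)
.short 0xa0f2       | SVCA 171: terminal status comes back in D7
move.l d7,d0        | copy the status word to D0
ori.w #0x10,d0      | set T$ILC (lower case input)
suba.l a6,a6        | TRMWST: default terminal again
move.l d0,d6        | new status word is passed in D6
.short 0xa0f4       | SVCA 172: write terminal status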

The program was already manipulating the status word to turn on lower case input (with T$ILC), since most fingerds on the other end wouldn't be expecting uppercase. We can see what we have to do: if we add 0x08 to the 0x10, we get that extra bit turned on at the same time. I made this in-place change and now the AlphaBASIC shell could see and parse the tabs!
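The search and the patch were done by hand, but both are easy to express. A Python sketch under these assumptions: the function names are hypothetical, the instruction sequence matches this particular GOPHER.LIT, and file offset 0 is word-aligned. Because each word is stored low byte first, the immediate 0x0010 sits on disk as 10 00, so setting bit 0x08 changes the first byte of the pair (OR-ing and adding are equivalent here because that bit was clear). Fed the 48 bytes of the xd output above, `find_trap_pairs` reports offsets 6 and 18 (0x696 and 0x6a2 in the file), and `add_status_bits` turns the 10 00 at 0x69c into 18 00.

```python
TRMRST = bytes.fromhex("f2a0")  # trap 0xA0F2 as stored on disk (low byte first)
TRMWST = bytes.fromhex("f4a0")  # trap 0xA0F4 as stored on disk

def find_trap_pairs(data: bytes, window: int = 16) -> list:
    """Word-aligned TRMRST traps followed by a TRMWST within `window` bytes."""
    pairs = []
    for i in range(0, len(data) - 1, 2):  # 68K instructions are word-aligned
        if data[i:i + 2] != TRMRST:
            continue
        for j in range(i + 2, min(i + 2 + window, len(data) - 1), 2):
            if data[j:j + 2] == TRMWST:
                pairs.append((i, j))
                break
    return pairs

def add_status_bits(data: bytes, extra: int = 0x08) -> bytes:
    """OR `extra` into the ori.w immediate that follows the TRMRST trap."""
    # f2 a0 / 07 20 / 40 00 = trap A0F2, move.l d7,d0, ori.w #imm,d0 (byte-swapped)
    prefix = bytes.fromhex("f2a0 0720 4000")
    if data.count(prefix) != 1:
        raise ValueError("expected exactly one TRMRST + ori.w sequence")
    at = data.index(prefix) + len(prefix)            # offset of the 16-bit immediate
    old = int.from_bytes(data[at:at + 2], "little")  # 0x0010 (T$ILC) in this binary
    new = (old | extra).to_bytes(2, "little")
    return data[:at] + new + data[at + 2:]
```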

However, downloaded files still weren't checksumming correctly. This fixed the tab problem but not the CRCRLF issue, and there was also an extra CRLF at the end. There is no status bit to disable that: turning LF into CRLF turns out to be intrinsic to the standard output routines, so they faithfully turn any CRLF already being received into CRCRLF, and the output redirection adds an extra line at the end as a final insult. Since we are using a driver script and emitting to a file anyway, the solution was simply to write postprocessors in assembly (AlphaBASIC would be unacceptably slow for this, and isn't very adept at binary file handling) that the driver script calls afterwards to get rid of the extra CRs and remove the extraneous line at the end. I chose to do this as separate utilities for each individual task ("one thing well"), and they might be useful in their own right in any case. At last: every file I downloaded this way checksummed correctly!
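The real postprocessors are small 68K assembly programs; what follows is only a Python sketch of the same two jobs, with hypothetical names. It assumes exactly what is described above: the output routines insert a CR before every LF (so an original CRLF becomes CRCRLF and a bare LF inside binary data becomes CRLF), and the redirection appends one CRLF at the end. Under that assumption, deleting the CR immediately before each LF is an exact inverse, which is what lets binaries round-trip as well as text.

```python
CR, LF = 0x0D, 0x0A

def strip_trailing_line(data: bytes) -> bytes:
    """Drop the single extra CRLF the output redirection appends (assumed)."""
    return data[:-2] if data.endswith(b"\r\n") else data

def undo_lf_expansion(data: bytes) -> bytes:
    """Delete the CR the output routines inserted in front of every LF (assumed)."""
    out = bytearray()
    for i, byte in enumerate(data):
        if byte == CR and i + 1 < len(data) and data[i + 1] == LF:
            continue  # the inserted CR; an original CR just before it is kept
        out.append(byte)
    return bytes(out)

def postprocess(data: bytes) -> bytes:
    # Order matters: remove the appended line first, then undo the expansion.
    return undo_lf_expansion(strip_trailing_line(data))
```

Keeping the two steps as separate functions mirrors the "one thing well" split into separate utilities.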

With the rudiments now properly functioning, it was time to finish the front end. A couple feverish days of coding later, drum roll please:

This is GAMBLE.BAS, the Gopher Alpha Micro Browsing-Linking Environment, displaying Floodgap's root gopher menu on the Eagle 300 with its beautiful colour AM-75 console (a modified Wyse WY-370). Gopher works very well from the keyboard and because everything is stored as a file, GAMBLE can display arbitrarily large menus and text documents limited only by disk space. Another nice side effect is that saving binaries and other resources is merely a matter of renaming the temporary file. If you have an Alpha Micro with AMOS 2.3A and AlphaTCP 1.3C or 1.5A you'd like to run this on, source code and binaries are available on the Floodgap gopher server (or via the Public Proxy).

As an enticement, I have uploaded the old Alpha Micro Users' Society network library archive to the Gopher server as well, and GAMBLE can access it directly. Check it out!

I should note that the approach I have chosen here is not without its drawbacks. AMOS has a maximum command line length of about 92 characters, and since .CMD files drive the command line in the same fashion, they have the same limitation. This means that, with all the other pieces of the command, the combined length of the hostname and Gopher selector is limited to about 80 characters, making certain portions of some sites inaccessible (the front end checks this and won't let you select them as a precaution). In addition, selectors containing quote marks or @ characters are similarly inaccessible because these are reserved by the shell and (ultimately) FINGER.LIT, though these characters are not commonly found in Gopher selectors; and, of course, the dependence on the port number being in TCP:SERVIC. means it will only ever communicate over port 70. A TAMED (using the socket API) approach in AlphaBASIC would avoid the selector and port limitations, but it would impose others, notably speed, handling of binary data and possibly memory capacity, and it wouldn't run on AlphaTCP 1.3C. A lot "just works" this way on the widest variety of systems, and I might think of better ways to tackle these deficiencies over time.
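The front end's guard can be approximated as follows. This is a hypothetical Python rendering, not GAMBLE's AlphaBASIC: the 80-character budget is the approximate figure given above, "quote marks" is taken to mean the double quote, and rejecting ports other than 70 is an extra check this sketch adds because the port comes from TCP:SERVIC. rather than from the menu.

```python
MAX_HOST_PLUS_SELECTOR = 80  # assumed budget; AMOS command lines top out near 92 chars

def reachable(host: str, selector: str, port: int = 70) -> bool:
    """Could this menu item survive the .CMD-file route through GOPHER.LIT? (sketch)"""
    if port != 70:  # the client only ever uses the port listed in TCP:SERVIC.
        return False
    if '"' in selector or "@" in selector:  # reserved by the shell / FINGER.LIT
        return False
    return len(host) + len(selector) <= MAX_HOST_PLUS_SELECTOR
```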

Most of all, though, this entire endeavour probably taught me more about AMOS than all my puttering around with it had as a toy over the last decade or so. For one of my favourite and certainly more obscure machines, this was a stimulating and actually vaguely useful project, and I'll be thinking about other things I could be doing with it now that my appetite is whetted. Meanwhile, this isn't the last gopher client we'll be writing in this blog: stay tuned for some other machines that I really need to be tunnelling through Gopherspace with.
