Friday, July 15, 2022

Crypto Ancienne 2.0 now brings TLS 1.3 to the Internet of Old Things (except BeOS)

Who says you can't teach an old box new tricks? We did it before and we're doing it again. Crypto Ancienne ("Cryanc") is a TLS implementation for pre-C99 beasts and monstrosities featuring carl, a simple curl-like utility that serves as a demonstration command line tool and even as an HTTPS-over-HTTP proxy for suitably configurable browsers. Many operating systems are supported and a number of compilers too (not only gcc going back to version 2.5 and the egcs days, but also clang, MIPSpro, Compaq C and even Metrowerks CodeWarrior). Now, after a lot of late night hacking, screaming and unspeakable acts of programming, tons of bugs are fixed (including a long-standing big-endian issue with ChaCha20Poly1305) and the core has been significantly upgraded such that almost all of the supported platforms now support TLS 1.3.

And what are those supported platforms? Why, here's some of them as they were being cruelly whipped to perform like beaten dogs for your entertainment:

They'll avenge themselves on me eventually, but until then they'll encrypt their HTTP and they'll like it. The list of shame includes AIX (4 and 6+), SunOS 4 (via OS/MP on Solbourne), Mac OS 9 (via Power MachTen), A/UX 3.1, IRIX (6.5 and possibly earlier), Rhapsody/Mac OS X Server v1.2, Mac OS X (PowerPC and Intel), NeXTSTEP (on HP PA-RISC) and Tru64 (on Alpha), plus more pedestrian choices like Linux and NetBSD on any platform I could find in the house (I develop on POWER9) and modern macOS (on Intel and Apple silicon). Contributors have added support for HP-UX, Haiku and Solaris, and there is partial support for BeOS R5 (on PowerPC, at least), which I'll talk about at length in a moment.

As we demonstrated previously, carl can act as an HTTPS-over-HTTP proxy for those browsers that are suitably configurable, such as Classilla 9.3.4b on Mac OS 9, allowing them to self-host their own encryption. So here's proof of crypto (chosen because it doesn't require JavaScript, ahem, Qualys), with Classilla tunneled through Power MachTen on Mac OS 9 all running on the same MDD G4:

And here's OmniWeb on my Wallstreet G3 running Rhapsody/Mac OS X Server 1.2, also showing proof of self-hosted crypto via carl as proxy:

And in the why not category, here's the classic NCSA Mosaic 2.7b5 (no tricks! this is an off-the-shelf build) on my SunOS 4.1-equivalent Solbourne S3000 with an SBus bwtwo display, showing it too has joined the TLS 1.3 party (I'm using Akamai's TLS 1.3 tester here because it's more tolerant of how slow this machine is):

Man, I miss OpenWindows. And, well, I'm just getting started! Here's A/UX 3.1.1 on a clock-chipped Quadra 800, running MacLynx also showing on-board TLS 1.3:

And here's NeXTSTEP 3.3 on PA-RISC, an architecture I fondly regard as working on an HP-UX K250 was my first job out of college, running OmniWeb 2.7 and hey presto, on-board TLS 1.3:

And even though the classic BeOS isn't at the TLS 1.3 party for various technical reasons to be explained, it's still at the TLS 1.2 kiddie table with all the other bug fixes in this release. While I think about how to hack NetPositive, here's BeOS R5 on a real 133MHz BeBox successfully downloading the BeOS 5.0.3 update from GitHub:

Who needs Haiku, right? Though it's supported too.

All of the supported platforms must pass my internal test suite using real websites with known variations in TLS support and server fussiness, and be able to complete a full transaction with all of them reliably (modulo timeouts on slow machines). If Crypto Ancienne can build on your platform, and it can build on a great many platforms already, by this point in its evolution it is very likely to "just work." You don't need anything other than a C compiler; in fact, you don't even need make (it's a huge, statically linked, single compile).

Development for this release has actually been in gestation for quite a while. The site I originally couldn't make happy with the then-existing TLS 1.3 support was GitHub: it would keep breaking the transmission in the middle. For months I had this put aside, spinning my wheels intermittently when I tried to think of where the problem lay. In desperation I did a few weird experiments in carl like turning the main select() loop inside out or running a second read loop inside of it, and after some stumbling around I could get a full read consistently.
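
For readers who haven't seen the pattern, here is a minimal sketch of "a second read loop inside the select() loop". It is not carl's actual code: `set_nonblocking` and `drain_ready` are hypothetical names, and the sketch assumes the descriptor has been made non-blocking so the inner loop stops when nothing more is pending instead of hanging.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>  /* very old systems get select() from sys/time.h instead */
#include <unistd.h>
#include <fcntl.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: wait up to timeout_sec for fd to become readable,
 * then keep reading until no more data is pending (or cap is reached).
 * Returns the number of bytes collected in out.
 */
static long drain_ready(int fd, unsigned char *out, long cap, long timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    long total = 0;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;                       /* timeout or error: nothing read */

    /* Inner read loop: one wakeup may have several chunks waiting. */
    while (total < cap) {
        n = (int)read(fd, out + total, (size_t)(cap - total));
        if (n <= 0)
            break;                      /* 0 = EOF, <0 = would block or error */
        total += n;
    }
    return total;
}
```

The point of the inner loop is simply that one readiness notification is not treated as one chunk of data; whether that was the real cause of the truncated transfers is not something this sketch claims.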

The site that then started breaking in 1.3 was freaking Thanks, administrators, um, for making my code better! Yeah! And drop dead! The original problem was that it insisted on RSA-PSS-RSAE even when I didn't offer it in the client hello as an acceptable signature algorithm. TLSe (the crypto library of which Cryanc is essentially a hard fork) didn't understand or expect this, so I had to add that support, and libtomcrypt apparently doesn't know how to handle a zero salt length either (this would indicate you need to compute it yourself), so I had to add that too. It took a couple days poring over wire dumps to figure out what was actually going on, especially because of all the changing nonces and values.

But even after this was working, via carl was still broken on big-endian because it would complain there were no common ciphers if I didn't offer CHACHA20-POLY1305-SHA256 (haven't you guys heard of AES-256-GCM-SHA384??). After laboriously vetting the third-party implementations I use, I found the endian issue in connecting glue code and was able to make it work. Now everything passed, on both my little-endian Linux POWER9 and my big-endian AIX POWER6. I grant this is a blatant violation of "never roll your own crypto" but anyone using Cryanc in a production environment is stupid and should meditate deeply upon the other bad choices they've made in their lives.
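
To illustrate the general class of bug (this is a sketch, not the actual fix in Cryanc, and `load32_le` and `store32_le` are hypothetical names): ChaCha20 treats its key, nonce and keystream as little-endian 32-bit words, so glue code that casts a byte buffer to an integer pointer only happens to work on little-endian hosts. Assembling each word byte by byte gives the same answer on POWER9 Linux and POWER6 AIX alike.

```c
/* Hypothetical helpers; the names are illustrative, not from Cryanc. */

/* Read a 32-bit little-endian word from four bytes, whatever the host order. */
static unsigned long load32_le(const unsigned char *p)
{
    return  ((unsigned long)p[0])
         | (((unsigned long)p[1]) << 8)
         | (((unsigned long)p[2]) << 16)
         | (((unsigned long)p[3]) << 24);
}

/* Write the low 32 bits of v as four little-endian bytes. */
static void store32_le(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)( v        & 0xff);
    p[1] = (unsigned char)((v >> 8)  & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

Using `unsigned long` with explicit masks rather than `uint32_t` keeps the sketch friendly to pre-C99 compilers, which have no `<stdint.h>`.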

The next two platforms I tested on were BeOS, on my 133MHz BeBox (PowerPC 603), and SunOS 4, via OS/MP on my 36MHz Solbourne S3000 (SPARC KAP). I consider these machines to be my "problem children": the BeBox, because the classic BeOS can be very weird and even more so on PowerPC, and the Solbourne, because SPARC can be nearly as alignment-finicky as the DEC Alpha and it is probably the lowest spec system that is just barely practical with Cryanc. TLSe has some very clever ways of monkeypatching directly into its own data structures, "clever" being both a blessing and a curse in this case, and this sort of monkeypatching is indeed exactly what upsets many alignment-sensitive old RISC architectures. (Parenthetically, this is something that doesn't show up on PowerPC and Power ISA, my standard development platforms, because PowerPC handles most misaligned scalar loads and stores in hardware. Another reason I'm a pro-PowerPC bigot.) For SPARC and other very fussy old RISC architectures like SGI MIPS we have NO_FUNNY_ALIGNMENT, a special mode that manually breaks apart these accesses at the cost of slightly slower performance. Getting this working on the Solbourne was slow going, because rebuilding carl after each change, even with a CPU+L2 cache from an S4100, took close to 15 minutes for an unoptimized build, but debugging it was easy in dbx because it stopped right at the scene of the foul, and the actual work required to get it functional was only tedious, not complicated.
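
As a rough idea of what "breaking apart these accesses" means (a sketch under assumptions, not the code NO_FUNNY_ALIGNMENT actually enables; both function names are hypothetical): instead of dereferencing a casted pointer such as `*(unsigned int *)(buf + 1)`, which can raise a bus error on SPARC, the value is moved byte-wise through a properly aligned local.

```c
#include <string.h>

/* Hypothetical helpers; names are illustrative, not from Cryanc or TLSe. */

/* Fetch an unsigned int from an address that may not be aligned for one. */
static unsigned int fetch_uint_unaligned(const unsigned char *p)
{
    unsigned int v;                 /* the local is always correctly aligned */
    memcpy(&v, p, sizeof(v));       /* byte-wise copy, no alignment trap */
    return v;
}

/* Store an unsigned int to an address that may not be aligned for one. */
static void store_uint_unaligned(unsigned char *p, unsigned int v)
{
    memcpy(p, &v, sizeof(v));
}
```

On hardware that tolerates misaligned access a compiler will often turn these into a single load or store anyway, so the cost mostly lands on the machines that need the protection.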

The BeBox was another story. Somewhere in the long interregnum between 1.5 and 2.0 the BeOS port started to rot, probably due to new codepaths being involved as server configurations changed. (An interesting example of the latter, in this case unrelated to BeOS, is that apparently some installations *cough* again*cough* of nginx currently throw an HTTP 500 error if you don't give it a user agent. So now we provide a trivial one.) There are two major memory limitations in PowerPC BeOS, though one is actually a compiler limitation: Metrowerks cc (essentially a command-line CodeWarrior) on PowerPC BeOS limits stack frames to 32K per function. There is no choice about using Metrowerks as your compiler because PowerPC BeOS executables are Preferred Executable Format binaries — yes, the same format as Code Fragment Manager executables on classic Mac OS. Other than the brief and sorrowful run of gcc under MPW, no open-source compiler of the day generated PEF on the Mac or anywhere else (the cool kids today use Retro68, which has its own set of PEF tools, but that wasn't a thing back then), so there was no gcc option like that which subsequently emerged for Intel and regular ELF binaries. [Postscript: turns out Fred Fish ported egcs to PowerPC, though the C++ name mangling is incompatible and it was never widely used or endorsed officially by Be. Still, it appears to work for C programs. Thanks, Cian, for sending me a copy; more on this in a future post.] We get around that with a define BIG_STRING_SIZE and cut this down to get stack frames to fit.
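
A minimal sketch of how a compile-time size define can keep a function under a per-function frame cap. Only the name BIG_STRING_SIZE comes from the text above; the numeric values, the `TINY_STACK_FRAMES` switch and the `sum_in_chunks` function are assumptions made up for illustration.

```c
#include <string.h>

#ifndef BIG_STRING_SIZE
#ifdef TINY_STACK_FRAMES            /* hypothetical switch for 32K-frame compilers */
#define BIG_STRING_SIZE 8192        /* assumed value: well under a 32K frame */
#else
#define BIG_STRING_SIZE 65536       /* assumed default elsewhere */
#endif
#endif

/*
 * Hypothetical worker: processes data through a local scratch buffer.
 * Because it loops in chunks of sizeof(scratch), shrinking BIG_STRING_SIZE
 * changes the frame size but not the result.
 */
static unsigned long sum_in_chunks(const unsigned char *data, size_t len)
{
    unsigned char scratch[BIG_STRING_SIZE];
    unsigned long total = 0;
    size_t done = 0, n, i;

    while (done < len) {
        n = len - done;
        if (n > sizeof(scratch))
            n = sizeof(scratch);
        memcpy(scratch, data + done, n);
        for (i = 0; i < n; i++)
            total += scratch[i];
        done += n;
    }
    return total;
}
```

The design point is that every local array counts against the function's frame, so code written against `sizeof(scratch)` rather than a hard-coded length can be retuned per platform with a single -D flag.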

That simple adjustment was sufficient to get 1.5 compiling, and at least for a while working, but only if you didn't optimize too much (you were limited to -O2 with 1.5; mwcc, however, can go up to -O7 and spinalcc can go up to -O11). That should have been a sign to me to look for other potential problems, and by the time I started trying to get 2.0 up on the BeBox everything was a mess. Things timed out, transactions would appear to abruptly cut out in the middle and occasionally it would outright crash with a null pointer. When trying to debug it, I discovered that even the fwrite calls to emit the data to standard output would just plain quit working, even though there was data arriving to spew, and sometimes carl would just unexpectedly terminate.

The problem turned out to be things getting stomped on in the stack, which wrecked return addresses and variables, bringing us to the other memory limitation. Up until around R4 BeOS had a miserable 256K stack limit total for every thread in a team (read as "process" for people unfamiliar with BeOS terminology). By R4.5 this had expanded to 64MB, 2MB of which was allocated for the main thread and then the rest divvied up. Addons and libraries run in your address space and their allocations count against your heap and stack usage. Be claimed that in R5, "the main thread will have 16 megabytes of room, which is needed for demanding applications like gcc" — but gcc could deal with the low stack capacity, so it's not clear that it ever used the extra space, and by the time OpenBeOS (the ancestor of Haiku) emerged post-R5 the memory map remained unchanged anyway. These limits apply to both Intel and PowerPC BeOS.

Threads being threads, there is no protection between individual thread stacks within that 64MB range and they are free to stomp on each other. Cryanc itself isn't multithreaded but it's possible and even likely other components loaded into its address space will be. If the main thread silently goes over 2MB of stack at any time, then other things can munge the overhanging data since they don't expect anything critical to be there. Cutting stack usage even more by trimming buffers got it further for some sites, but was not enough to pass the test suite all the other ports were easily passing, and entirely eliminating other things by moving them into the heap wasn't enough either. What finally got it to behave was a combination of those pared-back buffers; no optimization at all, to prevent CodeWarrior from combining or inlining functions that could bloat stack frames; changing to less-sensitive library routines that could handle a little corruption (no fwrite; using write with single characters in a loop and integer file descriptors instead of FILEs so as much stayed in registers as possible); and finally, and sadly, regressing to TLS 1.2 — it looks like the additional code for TLS 1.3 just upsets the apple cart too much. This makes it much slower than it should be, and some transactions will still time out, but it does work and does pass tests now. I don't know if this affects Intel BeOS and no one has ever sent me patches for it. Fortunately it does not affect Haiku, which just builds as any other POSIXy thing (though Haiku has perfectly cromulent crypto already).
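
The "no fwrite" workaround can be pictured roughly like this. It is a sketch only: `put_bytes` is a hypothetical name, and the retry on EINTR is an addition of this example rather than something the text describes.

```c
#include <unistd.h>
#include <errno.h>

/*
 * Hypothetical helper: emit len bytes to an integer file descriptor one
 * byte at a time with write(), avoiding stdio's FILE structure and its
 * buffers. Returns len on success or -1 on error.
 */
static long put_bytes(int fd, const unsigned char *buf, long len)
{
    long i;
    int r;

    for (i = 0; i < len; i++) {
        do {
            r = (int)write(fd, buf + i, 1);
        } while (r < 0 && errno == EINTR);   /* retry if interrupted */
        if (r != 1)
            return -1;
    }
    return len;
}
```

One system call per byte is about as slow as output gets, which fits the slowdown described above, but the only state involved is an int and a loop counter rather than a FILE with its own buffer.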

With the two problem children put to bed, the next ports were A/UX 3.1 (my clock-chipped Quadra 800), IRIX (R4400SC Indy and 900MHz R16000 Fuel with V12 DCD), AIX 4 (Apple Network Server 500) and the Mach family (Mac OS X on Intel and PowerPC, macOS on Apple silicon, Rhapsody/Mac OS X Server, Power MachTen and NeXTSTEP on PA-RISC with my SAIC Galaxy 1100), which after all that largely "just worked." Tru64 is the only one that didn't get a workout because my Alpha 164LX decided to eat its network card and a replacement is still on order, but I don't foresee any problems now that the others work (note that because Alpha, you still have to pass -misalign to the Compaq C compiler; NO_FUNNY_ALIGNMENT right now assumes big-endian, and while it works for SPARC, SGI-MIPS and others it doesn't cover all the cases Alpha seems to get stuck on). And that concludes our ports!

There aren't performance improvements here and some machines might be slower, though having ChaCha20Poly1305 now available on big-endian probably helps ease the hit. You'll notice that I've indicated proxy mode in the screenshots with -pt to disable timeouts (instead of just -p), though some sites will still time out anyway on systems below 40MHz or so. The biggest overhead seems to be key exchange and that might be a point of further optimization, but some calculations simply can't be avoided completely.

In the future I'd like to figure out ways to expand the compatible browsers list (right now, this is primarily OmniWeb through 4.0.6, Classilla 9.3.4b, MacLynx, and UNIX NCSA Mosaic through 2.7b5 or Mosaic-CK; send in others you can get working). In particular, there must be a way we can hack NetPositive to send HTTPS requests to an HTTP proxy instead of using CONNECT. For 2.1, I think we can finally get a true native classic Mac OS port using the old gcc on MPW to compile it as an MPW tool and using ToolDaemon as an inetd equivalent to serve it, and I'd like to get it working on HP-UX on 68K at the same time. Then, for the big 3.0, it's time to dive into certificate validation and ECDSA and get that fixed (but it will always be optional, since the overhead is already straining these older eldritch beasts). And the people demand a VMS port! Gotta check the license for the C compiler on my VAXstation ...
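
To make the NetPositive problem concrete, here is a small sketch of the two request shapes a proxy might see on its first line. It assumes that a suitably configured browser sends an ordinary request with an absolute https:// URL, whereas a browser expecting a tunnel sends CONNECT; the enum and `classify_request_line` are hypothetical and are not how carl parses requests.

```c
#include <string.h>

/* Hypothetical classification of a proxy request line. */
enum proxy_req { REQ_OTHER, REQ_HTTPS_URL, REQ_CONNECT };

/*
 * "GET https://example.com/ HTTP/1.0"  -> the proxy can do the TLS itself
 * "CONNECT example.com:443 HTTP/1.0"   -> the browser wants a raw tunnel
 */
static enum proxy_req classify_request_line(const char *line)
{
    const char *sp;

    if (strncmp(line, "CONNECT ", 8) == 0)
        return REQ_CONNECT;
    sp = strchr(line, ' ');             /* skip the method token */
    if (sp != NULL && strncmp(sp + 1, "https://", 8) == 0)
        return REQ_HTTPS_URL;
    return REQ_OTHER;
}
```

A browser that only ever issues CONNECT expects to do the encryption itself, which is exactly what these old browsers can no longer manage against modern servers.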

You can download the source code on GitHub. The Floodgap Gopher server has precompiled binaries for SunOS 4.1 and OS/MP, Rhapsody/Mac OS X Server v1.2 and Power MachTen 4.1.4, and PowerPC BeOS R5. For the rest, see our easy-to-follow build instructions.
