Friday, May 19, 2023

The KIM-1 that sounds like Stephen Hawking (or: "jitbanging" DECtalk)

My 1976 briefcase Commodore/MOS KIM-1, a 1 MHz single-board computer with a 6502 CPU and 1K of RAM, has learned to talk — with a familiar-sounding voice.

The KIM-1's serial lines are connected to the last and smallest member of Digital Equipment Corporation's true DECtalk hardware speech synthesizers, the 1994 DECtalk Express. The DECtalk's classic default voice heard in this video is Perfect Paul, which (with adjustments) was the voice of Dr Stephen Hawking as produced with the 1988 Speech Plus CallText 5010.

The 21 keys we can read off the KIM's hexadecimal keypad are polled by a "talker" program that sends the DECtalk Express words and phrases to speak. However, although the KIM-1 has a 20mA current loop output you can turn into RS-232 serial, its built-in ROM routines can't reliably communicate at the 9600 baud rate the DECtalk Express demands.

So, in today's entry, we have a veritable smorgasbord of geriatric geekery: using our KIM-1 serial uploader to push a program for execution, let's write a bitbanged 9600 baud serial transmitter routine in 6502 assembly and let the KIM-1 have its say, then crack the DECtalk Express open and look at the insides while we're at it. (Teaser: you'll find its CPU very familiar.)

But first, a minor review of asynchronous serial communications.

This diagram is an idealized representation of how a byte is sent under the most common "8N1" scheme, i.e., eight data bits, no parity bit and one stop bit. (Note that the grey mark, space and clock lines are aligned on the black lines; I just offset them slightly for visibility.)

The normal state of the serial line between bytes is high (mark). The receiver runs at some multiple of the selected baud rate, often eight or sixteen times on modern UART chips, watching for the line to go low (space) to indicate the start of a new byte. Eight bits follow clocked out at the agreed rate, from least significant to most significant; the receiver samples the line at intervals of the baud rate starting from when it first saw the start bit. Transmission terminates with the line returning to high for at least one bit's time, though it can remain high for an arbitrarily long period after that. Fortunately the protocol is generally self-synchronizing as long as your stop bits are long enough, and its 80 percent efficiency is perfectly cromulent.
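
To make the framing concrete, here is a minimal Python sketch (purely illustrative; nothing here runs on the KIM-1) that builds the sequence of line levels for one 8N1 frame as described above. The function name frame_8n1 is made up for this example.

```python
def frame_8n1(byte):
    """Return the line levels (1 = mark/high, 0 = space/low) for one 8N1 frame."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    start = [0]                                  # start bit pulls the line low
    data = [(byte >> i) & 1 for i in range(8)]   # least significant bit first
    stop = [1]                                   # stop bit returns the line high
    return start + data + stop


frame = frame_8n1(ord("A"))
print(frame)
print("efficiency:", 8 / len(frame))
```

Ten bit periods on the wire carry eight data bits, which is where the 80 percent figure comes from.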

Typically this process is handled for you by a UART or ACIA chip, where you just program the rate and parameters, and grab and send whole bytes as necessary (the chip is often rigged to trigger an interrupt when data is available). Even on systems that notionally push bits over the wire in software, there is usually some degree of hardware support. A good example is the Commodore 64 where the Kernal emulates a 6551 ACIA in software, but the bitrates are wrong, and it is mostly useful only for low-speed transmission. Still, however, the 64's 6526 CIA I/O chips are more than capable of clocking bits in and out at even relatively fast rates and provide interrupts for prompt servicing, so with a little external circuitry the CIAs can easily communicate at up to 9600bps on an unmodified system.

But the KIM-1's TTY lines are bitbanging at their most metal. The KIM-1 has two I/O chips, the 6530 RRIOTs (RAM, ROM, I/O and timers), which also provide the KIM's two 1K system ROMs and an additional 64 bytes of RAM each. One of the RRIOTs handles the keypad, LEDs, cassette recorder and the TTY receive and transmit lines. The RRIOTs can generate interrupts from their timers but not from their GPIO lines (the later 6532 ROM-less RIOT can), meaning the KIM-1 must manually poll the receive line for input and manually drive the transmit line for output. Although hardware flow control lines could presumably be fashioned out of the other GPIO lines, the KIM monitor wouldn't know how to use them. Similarly, while the interval timer's interrupts could be used to signal when the next bit is due, the RRIOT won't automatically send or receive it, and any bits the 6502 CPU doesn't check for it won't get. As a consequence of its simplified implementation the KIM's TTY support is also limited to half-duplex, i.e., only one side can talk at a time.

That said, you can do a lot with the KIM's onboard serial support (here shown is my original briefcase unit with Bob Applegate's I/O board), but those routines won't work at modern speeds: although the KIM-1 can automatically detect the speed of the connected terminal when the RUBOUT (i.e., character 0x7f) key is pressed, the system's clock speed and the relative inefficiency of its serial code hobble the possible throughput. Consider the single serial bit delay routine at $1ed4, critical for proper bit timing:

; 6 cycles sunk on jsr
delay  lda cnth30 ; 3 cycles
       sta timh   ; 3 cycles
       lda cntl30 ; 3 cycles
de2    sec        ; 2
de4    sbc #$01   ; 2
       bcs de3    ; 3 if taken, 2 if not
       dec timh   ; 6
de3    ldy timh   ; 3
       bpl de2    ; 3 if taken, 2 if not
       rts        ; 6

The minimum time this routine can run at is 6 (from the act of calling it as a subroutine) + 3 + 3 + 3 + 2 + 2 + 3 + 3 + 2 + 6 = 33 processor cycles if the KIM's detected baud rate countdown value (in cntl30 and cnth30) has encoded the smallest delay possible. The number of cycles needed for each bit of a given transmission is clock speed divided by baud rate. Solving for baud rate at the KIM's clock speed of one million CPU cycles per second, even if the KIM did nothing else but call this delay subroutine and loop forever (best case 36 cycles for a guaranteed branch always), it could proceed no faster than 27777.78 baud.
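
The arithmetic above is easy to check with a few lines of Python. This is only a sketch: the names cycles_per_bit, max_baud, DELAY_MIN and LOOP_MIN are invented for illustration, and the cycle counts are simply the ones annotated in the listing.

```python
CLOCK_HZ = 1_000_000  # KIM-1 clock speed given in the text


def cycles_per_bit(baud, clock_hz=CLOCK_HZ):
    """CPU cycles available for each bit at a given baud rate."""
    return clock_hz / baud


def max_baud(cycles, clock_hz=CLOCK_HZ):
    """Fastest baud rate if every bit costs at least `cycles` CPU cycles."""
    return clock_hz / cycles


# jsr + the shortest path through the delay routine, as annotated above
DELAY_MIN = 6 + 3 + 3 + 3 + 2 + 2 + 3 + 3 + 2 + 6
# plus a 3-cycle always-taken branch to call it again
LOOP_MIN = DELAY_MIN + 3

print(DELAY_MIN, LOOP_MIN, round(max_baud(LOOP_MIN), 2))
for baud in (300, 1200, 9600):
    print(baud, "baud:", round(cycles_per_bit(baud), 2), "cycles per bit")
```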

As a practical matter, though, the KIM is doing much more than just cycling a delay counter, and the KIM also provides local echo to the terminal necessarily at the same speed. Consider a modest 1200 baud transmission: at that speed we would have 1_000_000/1_200 or 833.33 cycles between each bit. Additionally, to know when the start bit arrives, the KIM will have to poll the receive line at least that fast (though as a practical matter at several times that rate) or it will misread the entire subsequent byte. In interactive usage this is usually (barely) okay but in bulk papertape uploads the KIM is not fast enough to validate, store and echo back at 1200 baud the data it's receiving, exacerbated further by the fact the connection is half-duplex.

For this reason the KIM is really a 300 baud system, typical for 1975 when it was designed, and any reliable brief spurts above that are purely providential. This was no problem in the KIM's heyday since most TTYs were printer teletypes and themselves weren't too fast, like the Silent 700 Model 763 I use with this one.

That brings us to the DECtalk Express.

The DECtalk Express was the last and smallest "true" DECtalk produced by Digital Equipment Corporation, announced in 1994 at the eye-watering price of $1195 (about $2370 in 2023 dollars). Intended for visually impaired users, as the Braille on the case would indicate, it is a handsome, reasonably durable and highly portable unit measuring about 7.5" by 3.5" (195mm by 95mm) and weighing a little under a pound (less than half a kilogramme). DECtalk's roots were in the MITalk project from the Massachusetts Institute of Technology, but its characteristic voice was almost entirely the product of professor Dennis Klatt, who authored over sixty papers in his lifetime on the subject of speech synthesis. (Here's where I get giddy, since my undergraduate degree was in general linguistics.)

MITalk emerged from an early project at the Cognitive Information Processing Group, which was then focused on developing sensory aids for the blind, and determined a reading machine that could scan and speak text was a worthy goal. A prototype was demonstrated in 1968, able to recognize characters and pass them to a text-to-speech algorithm that either spoke the words as morphs it could recognize from its dictionary or spelled them out as letters. (A similar prototype at Stanford, the Optacon, was demonstrated in 1969.) Although the unit's potential was well-received, its vocabulary was wanting, and From Text to Speech politely observed that "the output speech quality required extensive learning."

The first part of improving the speech quality was getting better morphs out of input words, a process called morph decomposition; the second was developing a more comprehensive dictionary of morphs to phonemes (sounds) to eliminate falling back to letter spelling as much as possible. Analysis of phrases and parts of speech was necessary to accurately determine how a word should be spoken in context. For those words that managed to defy decomposition or those morphs absent from the lexicon, fallback letter-to-sound rules filled in the gap as much as possible (as the researchers observed, many high frequency English words have pronunciations difficult to fully predict from their spelling, so substantial effort was made to ensure they never hit the fallback rules). This work was led by Jonathan Allen's Natural Language Programming Group and accomplished on a DEC PDP-9 with 24KW of memory and DECtape, mostly in BCPL.

The final part was actually rendering the phonemes to audio, which was where Klatt's work in the Speech Communications Group came in: he personally developed a working computational model of his own voice in Fortran, supplemented by additional researchers who contributed to its prosody algorithm (what we call suprasegmentals, attributes like intonation, stress and rhythm) and fundamental frequency (pitch) generation. Klatt's work primarily involved MITalk's formant synthesis, viz., generating the bag of frequencies the human ear detects as a particular sound, as well as how nearby sounds affect them and the durations of individual sounds. Voice parameters could include (but were by no means limited to) specific formant frequencies, average pitch and pitch range, simulated breath and head size, and the amplitudes of the internal source noise generators for voicing, fricatives and aspiration. What became the "Perfect Paul" voice is, at least in part, how Klatt himself spoke.

In the late 1970s MITalk development on the old 18-bit PDP-9 was retired for a 36-bit DECSYSTEM-20. A specialized interface was wired between the DECSYSTEM-20's KL-10 CPU and a PDP-11 which generated the audio waveforms using a digital hardware module fast enough to run in real-time, at a sample rate of about 10kHz. By 1979, the DECSYSTEM-20 version was judged sufficiently complete to be made available for licensing and was subsequently translated to Pascal and C. A sample session with the Un*x port of MITalk, out of From Text to Speech, looked like this (page 176):

MITALK: System configuration:

                  tty
                   |
                 FORMAT
                 DECOMP
            tty<---|
                 PARSER
            tty<---|
                 SOUND1
                   |
                  tty
                  
MITALK: Starting system...
MITALK: System running
MITALK: Please enter text (type ^D [control-D] to exit)

The old man sat in a rocker.
^D
 DECOMP: THE (ARTICLE) => THE
 DECOMP: OLD (ADJECTIVE, NOUN) => OLD
 DECOMP: MAN (NOUN, VERB) => MAN
 DECOMP: SAT (VERB, PAST PARTICIPLE) => SAT
 DECOMP: IN (PREPOSITION, ADVERB) => IN
 DECOMP: A (ARTICLE) => A
  PARSER: NOUN GROUP: THE OLD MAN
  PARSER: VERB GROUP: SAT
 DECOMP: ROCKER (NOUN) => ROCK+ER
 DECOMP: . (END PUNCTUATION MARK)
 DECOMP: . (END PUNCTUATION MARK)
 DECOMP: <EOF>
  PARSER: PREPOSITIONAL PHRASE: IN A ROCKER
  PARSER: UNCLASSIFIED: .
  PARSER: UNCLASSIFIED: .
  PARSER: <EOF>
   SOUND1: DH 'AH
   SOUND1: 'OW LL DD
   SOUND1: MM 'AE NN
   SOUND1: SS 'AE TT
   SOUND1: 'IH NN
   SOUND1: AX
   SOUND1: RR 'AA KK * - ER
   SOUND1: .
   SOUND1: .
   SOUND1: <EOF>
   
MITALK: System done

Given the provided input, the decomposer identifies words and affixes and hands them off to the parser, which attempts to turn them into phrases (note to syntacticians that what it calls a "group" is not necessarily equivalent with an NP or VP). The parser then feeds the phrases into the sound generator, which word by word emits phonemes; these in turn would be fed into the actual speech synthesizer which this example did not illustrate. The system was designed to be highly modular. I remember writing a parts of speech analyzer in Perl 4 for my computational linguistics course completely ignorant at the time of all this prior art, and it turned out to be a more primitive version of nearly the same algorithm MITalk used years prior. I got an A anyway.

One of the MITalk licensees, Telesensory (a spinoff established in 1970 after the Optacon in 1969), created prototype versions of the speech synthesizer in hardware in 1980. This became an actual product, the 1982 Prose 2000 card, selling for a whopping $3500 (about $11,000 in 2023 dollars); it had its own Intel 8086 CPU and 80K of ROM, more powerful than many standalone computers of the time, and substantially better sounding than earlier systems like the Votrax Type-N-Talk — though the Votrax was a lot cheaper. The PR2020 was a standalone Prose 2000 in a box with a serial interface. Telesensory sold their speech unit and the hardware designs later that year to startup Speech Plus, Inc., founded by James Bliss who had led the team that designed the Optacon at Stanford. Hawking had a Speech Plus Prose 2000 before upgrading to the later CallText 5010, which continued to use the same MITalk-derived algorithm with Klatt's voice technology.

In 1982 Klatt developed his own speech system called KlattTalk, combining his prior synthesis work with a new text-to-speech algorithm and adding further voices based on his family. (Klatt eventually lost his own voice to thyroid cancer in the early 1980s and died at the age of 50 in 1988.) DEC, typical of its management at the time, was a late entry into the then-hot speech market and bought an exclusive license to KlattTalk to get something on the shelf quickly.

Such was the first DECtalk (designated DTC01), a $4000 ($11,600 in 2023) sprawling 16-pound desktop unit announced in December 1983. It used a Motorola 68000 for the main CPU with a Texas Instruments TMS32010 DSP, both running at 20MHz, and emitted audio through an AD7541 DAC. Like the PR2020, it connected to any computer using a serial port, and in addition to an internal speaker in its VT240-derived case, it also had jacks for connecting a telephone line and audio output. The DTC01 spoke English sentences or raw phoneme data sent to it via the serial ports in one of eight voices with adjustable parameters (pitch, speed, etc.); the original voice complement was Perfect Paul (still the default voice), Beautiful Betty, Frail Frank, Huge Harry, Kit the Kid, Rough Rita, Uppity Ursula and the user-defined Variable Val (I think I dated this person).

DEC's marketing envisioned the DECtalk in multiple settings, primarily remote messaging (electronic mail "read" to you, banks, process control systems), with consumer sales for the speech- or visually-impaired mentioned as more of an afterthought. On-board software could respond autonomously to touch tones, allowing users to dial the DECtalk directly and receive voice messages from a connected computer or even send data back to that computer via the telephone keypad, though the computer had to be programmed to respond appropriately using various ESCape sequences over the serial link.

The underside of the DECtalk Express, with its regulatory information and (on this, my mint unit) Velcro attachment strips and rubber feet.

Although the DECtalk Express is designated DTC08, there are only seven models in the series. The successor to the original DTC01 DECtalk was the 1985 DTC03 DECtalk III (there wasn't a DECtalk II or DTC02 publicly released that I can determine), a card intended to be placed in a standalone enclosure and connected over serial with support for up to eight telephone lines, clearly intended for call centers and corporate environments. This card introduced upgraded firmware used for the remainder of the line, with an improved letter-to-phoneme algorithm and two more voices named Whispering Wendy and Doctor Dennis, and was based on an entirely new yet totally familiar architecture which I'll grandiosely reveal in a moment. The later DTC06 upgrade card could make the original DTC01 DECtalk into the equivalent of a DTC03.

The largest DECtalk descendant was the 1989 DECvoice (DTC04 and the multiline version, in two variations for phone [DTC05] and T1/DS1 [DTCN5]), ostensibly "just" a Q-bus card for the MicroVAX series, but it could also be shipped inside an entire specially badged MicroVAX II system supporting up to sixteen telephone lines. This was an otherwise ordinary MicroVAX running VMS but with the DECvoice VOX software kit, which is no longer available from HPE. The DECvoice Builder package supported development of speech applications, sold separately, of course. DECvoice's biggest advance over DECtalk was that it not only generated speech but could recognize it too.

Although DEC's early strategy was clearly to corner the corporate market, the last two DECtalks were intended for personal computers. Both of them eliminated the ESCape sequence support in the bigger DECtalk products and served strictly as talkboxen, though Klatt's multiple voices and adjustable parameters remained. In 1992 DEC's Assistive Technology Group announced the DTC07 DECtalk-PC, a full-length ISA card with an included external speaker, positioned at the visually impaired market for use with third-party screen readers. Even more so than ISA cards are usually, the board is completely dependent on the software, requiring the PC to upload its sound data before it will speak a single word (and there are various revisions of the card that will only work with a matching revision of the DOS driver); to receive speech commands the DOS TSR creates a pseudo-COM port and LPT port which can be "printed" to or communicated with. Everything in the package was marked in Braille. It, too, had an MSRP of $1195.

And thus we return to the last of the O.G. DECtalk line, the 1994 DECtalk Express, bringing us back full circle as it was a standalone serial speech box like the original DECtalk. This is my "mint" unit, which I bought NOS from a liquidator (the "beater" unit is the device the KIM-1 was using) and which came with all the original accessories except the headphones; it was not sold by DEC, but I'll talk about that at the end when we finish the history. With the unit came a looseleaf manual (with a Braille inventory and getting started card), the manual on tape, a 3.5" floppy with the DOS TSR software (same functionality as the DECtalk-PC), a nylon carrying case and accessories pouch, a serial port connector and cable, and a wallwart (12V 800mA positive tip barrel jack).

When placed in its carrying case (notice the "DECtalk" logo in the "digital" style), the speaker has its own soft grille and there is an aperture for the power jack (the other ports are available by opening the case). The Express was the only DECtalk that was portable, containing a rechargeable NiCad battery which is by now almost certainly non-functional, and indeed was in both of my units. More shortly when we crack it open.

The need for the serial connector and cable is because Digital was still using their irritating Modified Modular Jack (MMJ) 6P6C connector for RS-423 communications, the jack that looks like a cockeyed RJ-11 phone connector. Also shown here is the volume thumbwheel and the headphone jack, which cuts out the internal speaker when something is plugged in.

As with the DECtalk-PC, the included DOS TSR provides a pseudo-COM port for the DOS COPY command, but we can just talk to the Express directly, so the unit is fully functional without the driver. Unfortunately, although the DTC01 could communicate at a variety of baud rates all the way down to very slow speeds, the DTC07 and DTC08 are 9600 baud or bust, so set your terminal program to 9600 baud 8N1 with software (not hardware) flow control, and connect the DE-9 to your serial port. When we connect it to our Raptor POWER9 workstation and turn it on, this message pops up in minicom:

DECtalk Express Speech Synthesizer. (11/08/96 13:48:14)
1..  .  ROM checksum: 48
Copyright (C)1993-1996 Digital Equipment Corporation - All Rights Reserved * 
 ss:esp 00000010:000EFFB0
{Go}

and then, after about 15 seconds, the text [DECtalk Express is running.] appears and the unit says, "DECtalk Express is running. External power on."

Do those register names (ss:esp) look familiar? They should! Let's get it open.

There are four small lag bolts holding the unit together. With the back off the first thing that shows up is the battery tray, which I pulled out years ago, so here we've just slid the tray aside (top right). This battery is a 7.2V 650mAh NiCad rechargeable wired to the red-orange connector in the lower left corner of this image; the positive pin is the one closest to the MMJ connector and the negative pin is the one furthest away (the pin in the middle is not connected). If you end up with a dead Express, you should start off by removing the battery because a ruined battery will prevent the unit from starting and the device is perfectly functional without it.

As for the hardware, your eyes will have already been attracted to the chip with a ... Microsoft Windows logo. The CPU is that very BQFP, a 25MHz 80386 processor with a datecode of 30th week 1995, the AMD Am386SXLV-25 (I told you it would be familiar!). The DTC03 DECtalk III converted the line to the Intel 80186 from the Motorola 68000, using the same 20MHz crystal, and an 80186 powered every DECtalk device up to and including the DECtalk-PC. The Express was the only unit to use a '386, and the most computationally powerful of the DECtalks. The Am386SXLV-25 is a low-power version of the Am386SX that can run on as little as three volts and have its clock completely halted without losing register contents. In addition, the SXLV variant uniquely has a special high-priority interrupt that can be used for further peripheral power management, though I can't tell if this feature is actually used in the Express.

South of the CPU are two 256Kx16 bit DRAMs configured as 1MB of memory and below that are two ROMs in PLCC sockets. To the right of the DRAMs is the main flash memory (two 1Mbit flash modules to equal 256K), and to the left of the DRAMs is the 16C550 serial UART. Directly above the UART is the DSP, still in the Texas Instruments TMS320 family, but here a TMS320C25FNL-J at 20MHz.

The reverse side just has some discrete surface mount components but no ICs.

The Express takes the same inline DECtalk command sequences as every other DECtalk system, just not the extended ESCape sequences of the bigger DECtalken. These commands even include sending DTMF touch tones ([:dial 1-900-976-4352]) or making generic tones of a particular frequency ([:tone 440,1000]) as well as changing and modifying voice characteristics. For example, consider this audio clip, recorded literally with the microphone pointed at the speaker:

This clip goes through nine of the built-in voices, though since Variable Val isn't configured on this system, (s)he sounds mostly like Paul. The clip was generated with this text, where the [:nX] command sequence selects a voice by its letter X:

[:np] Perfect Paul!
[:nb] Beautiful Betty!
[:nd] Doctor Dennis!
[:nf] Frail Frank!
[:nh] Huge Harry!
[:nk] Kit the Kid!
[:nr] Rough Rita!
[:nu] Uppity Ursula!
[:nv] And unconfigured Variable Val!
[:nb] Form legs and body!
[:nh] And I'll, form: the head!
[:np] DECtalk Tron!

All these strings have punctuation, which (look back at the MITalk example) is required not only as a delimiter but also to tell the DECtalk a complete utterance or subphrase is ready to be spoken. Things like commas and colons are also understood. The Express does not echo the text back to the serial port, and any errors (say, a command like [:nc) are reported in audio, not in text. Other than the fixed banner message there is nothing to receive from the unit.

That's all we need to know on the DECtalk Express side, so now let's go back to the KIM side. Since the Express will only accept data at 9600 baud, we'll have to write our own routine to transmit it. Happily, our task is considerably eased by only needing to write the transmit side as there's nothing to read. Transmitting is substantially easier because if we're not fast enough between bytes, we'll only lose time, not data. In fact, having a lengthened stop bit period makes it easier for the receiver to resynchronize if necessary, turning necessity into virtue.

No KIM-1 emulator supports direct drive of the RRIOT's TTY transmit and receive lines (even my Incredible KIMplement emulator, which does support the Commodore 64's user port for serial I/O, uses the higher-level Kernal RS-232 routines), so to speed up the development process we'll want a tool to push code we cross-assemble (with xa65, naturally) to a live physical system. I've provided that tool, which I call KIMup, on GitHub under the Floodgap Free Software License. It runs the serial connection at 300 baud for maximum reliability with the KIM's built-in TTY routines, turns binaries into KIM paper tape format on the fly at the addresses you specify, and then pushes them to the unit automatically (and optionally starts execution as soon as the upload finishes). Because the KIM paper tape format encodes the absolute address for the data, we can send multiple segments in one transmission, so we'll divide our work into three modules: a small dictionary of phrases, the "talker" to take keypad keys and select from the dictionary, and the actual string sender to send the selected phrase at 9600 baud over the TTY.

Dividing the clock speed by the 9600 baud rate gives us 104.17 cycles per bit. Naturally we'll never be able to run at a fractional rate, but because the receiver samples the line only once at a regular cadence during each 104.17 cycle period, as a practical matter it doesn't hurt if we're a little faster sending bits as long as we're not a little slower (because then we'll start smearing bits together) and as long as we make up the difference after the stop bit. We could use the KIM's interval timers to count cycles and trigger an IRQ to send the next bit, but this is a little involved, and also isn't very applicable to other 6502-based systems. Instead, since we know exactly how long a given instruction will take to execute from the 6502's instruction set description, let's map out how we would send a bit by counting cycles.

        * = $0000

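        cld             ; binary arithmetic
        sei             ; no interrupts while we count cycles
        lda #0
        sta $1741       ; port A direction: all inputs
        lda #$3f
        sta $1743       ; port B direction: bits 0-5 outputs
        lda #7
        sta $1742       ; port B data: transmit line (bit 0) idles high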
        
        jsr send
        jmp *-3
        
send

COUNT=18
#define WAIT ldy #COUNT:dey:bne *-1
#define BIT0 lda $1742:and #$fe:sta $1742:WAIT
#define BIT1 lda $1742:ora #$01:sta $1742:WAIT

        ; start bit
        BIT0

;---------------------------------
        ; 1
        BIT1

        ; 2
        BIT0

        ; 4
        BIT0

        ; 8
        BIT0

;---------------------------------

        ; 16
        BIT0

        ; 32
        BIT0

        ; 64
        BIT1

        ; 128
        BIT0
;---------------------------------

        ; stop bit
        BIT1
        WAIT
        WAIT
        rts

This will send the letter A (ASCII 65) repeatedly down the wire at something close enough to 9600 baud. Assemble it with xa -o serial.o serial.xa (or as you named it), then kimup -g 0 0 serial.o to upload it to the KIM-1 and start execution immediately. Leaving the KIM connected to your computer's serial port, start your terminal program (such as minicom -b 9600 -w) and you can see the output. Press RS on the keypad to stop the program when you're done.

Sending each bit is a fixed number of cycles, here 4 + 2 + 4 = 10 cycles to load from the I/O register, twiddle the least significant bit connected to the transmit line, and store it back. The actual write to memory occurs on the 10th and final cycle. After the bit's value is set, we then inline a delay loop until it's time for the next bit. For the provided starting value of 18, we burn two cycles on loading the Y register, and then each turn of the loop up to the last cycle is 2 + 3 = 5 cycles (because the branch is taken; if this spanned a memory page it would be an additional cycle), with the last one being 2 + 2 = 4 cycles (because the branch is not). This comes out to 2 + (5 * 17) + 4 = 91 cycles, so with the ten cycles to set the bit, each bit here lasts 101 cycles. If we really wanted to be anal we could throw in a nop (two more cycles to total 103) or a useless read from zero page (three more cycles to total 104), but 101 is actually well within tolerance already and saves us a few bytes. That's about a three percent variation, well within the five percent baud rate mismatch most receivers will tolerate as a rule of thumb.
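
As a cross-check on the cycle counting above, here is a small Python model of one bit stanza. It is a sketch under the same assumptions as the text (a 1 MHz clock, the loop not crossing a page boundary), and the helper names are invented for illustration. The best_count helper is my own addition: it picks the largest COUNT that does not run slower than the ideal rate, following the preference stated earlier for running a little fast rather than a little slow.

```python
CLOCK_HZ = 1_000_000
SET_BIT_CYCLES = 4 + 2 + 4   # lda absolute, and/ora immediate, sta absolute


def wait_cycles(count):
    """Cycles for: ldy #count, then dey/bne repeated until Y reaches zero."""
    if not 1 <= count <= 255:
        raise ValueError("count must be in the range 1..255")
    # ldy (2) + taken iterations (2 + 3 each) + final iteration (2 + 2)
    return 2 + 5 * (count - 1) + 4


def bit_cycles(count):
    """Total cycles from one bit's first instruction to the next bit's."""
    return SET_BIT_CYCLES + wait_cycles(count)


def rate_error(count, baud):
    """Fraction by which we run fast (positive) relative to the ideal rate."""
    ideal = CLOCK_HZ / baud
    return (ideal - bit_cycles(count)) / ideal


def best_count(baud):
    """Largest COUNT whose bit time does not exceed the ideal bit time."""
    ideal = CLOCK_HZ / baud
    fits = [c for c in range(1, 256) if bit_cycles(c) <= ideal]
    return max(fits) if fits else None


print(wait_cycles(18), bit_cycles(18), round(100 * rate_error(18, 9600), 2))
print(best_count(9600))
```

For 9600 baud this model lands on the same COUNT of 18 used in the listing; for other rates, treat its answer as a starting point to verify on real hardware.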

You'll notice that the stop bit pauses several bit periods for paranoia's sake in case the receiver started in the middle of a byte; this ensures we self-synchronize. Again, we do have to wait at least some period longer after the stop bit's period because we transmitted at a slightly higher rate than the receiver, and the very first 10 cycles we spent weren't actually part of any bit. The least amount of additional time you could sit there after the stop bit is sent and "waited" is ten times the delta between the ideal rate and the actual rate, plus another ten for the orphaned cycles prior to the start bit, which in this case totals (10*3.17) + 10 = 41.7 cycles. Thus, we could alternatively wait as little as 42 cycles after the stop bit period and gain a bit of throughput, but I'll leave that exercise to the appropriately cautious reader who notes it becomes less reliable without an unambiguous interval.
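
The minimum catch-up time can be worked out the same way. This Python sketch just restates the formula in the paragraph above; the function name and parameter names are invented for the example, and it uses the unrounded per-bit delta, so it gives roughly 41.67 rather than 41.7.

```python
import math

CLOCK_HZ = 1_000_000


def min_post_stop_wait(bit_cycles, baud, bits_per_frame=10, orphan_cycles=10):
    """Cycles to idle after the stop bit so we don't outrun the receiver.

    bits_per_frame: start bit + 8 data bits + stop bit
    orphan_cycles:  cycles spent before the start bit actually hits the line
    """
    ideal = CLOCK_HZ / baud
    return bits_per_frame * (ideal - bit_cycles) + orphan_cycles


wait = min_post_stop_wait(101, 9600)
print(round(wait, 2), "->", math.ceil(wait), "cycles")
```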

With a little bit of self-modifying code, this precisely timed simple example can send arbitrary characters. I hereby officially coin the term "jitbanging" — that is, patching or emitting a bitbanged routine Just In Time to handle arbitrary data, in this case each character. Since the JIT process occurs after the stop bit, we don't need the extra wait period because the JIT process to set up the next character will more than cover it. Here is a complete "jitbanged" example that sends a character sequence over and over.

        * = $0000

ch      = $fe
cch     = $1780

        lda #$41
        sta cch

        cld
        sei
        lda #0
        sta $1741
        lda #$3f
        sta $1743
        lda #7
        sta $1742

aaaa    lda cch
        and #$0f
        ora #$40
        jsr send
        inc cch
        jmp aaaa

send    sta ch
        ldx #8
        ldy #0

FACTOR = (b1-b0)

        ; "JIT" the loop
        ; somewhat adapted from KIM outch @ $1ea0
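sendb   lsr ch          ; next bit, least significant first, into carry
        bcc send0
send1   lda #$09        ; opcode for ORA immediate
        sta b0,y
        lda #$01        ; operand: set bit 0
        sta b0+1,y
        jmp sendn
send0   lda #$29        ; opcode for AND immediate
        sta b0,y
        lda #$fe        ; operand: clear bit 0
        sta b0+1,y
sendn   tya             ; advance Y to the next bit's stanza
        clc
        adc #FACTOR
        tay
        dex
        bne sendb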

        ; send the bit

COUNT=18
#define WAIT ldy #COUNT:dey:bne *-1
#define BIT0 lda $1742:and #$fe:sta $1742:WAIT
#define BIT1 lda $1742:ora #$01:sta $1742:WAIT

        ; start bit
        BIT0

;---------------------------------
        ; 1
b0 = *+3
        BIT1

b1 = *+3
        ; 2
        BIT0

        ; 4
        BIT0

        ; 8
        BIT0

;---------------------------------

        ; 16
        BIT0

        ; 32
        BIT0

        ; 64
        BIT1

        ; 128
        BIT0
;---------------------------------

        ; stop bit
        BIT1
        rts

The JIT routine at the beginning of send relies on the space between each stanza in the bit sender being constant. Shifting out the bits from the character least significant first, it patches each bit's stanza of code with either an AND or ORA instruction as required, and then falls through to execute the character routine it just constructed. Reset your KIM if necessary with RS, then assemble with xa -o serial.o serial.xa and upload and run with kimup -g 0 0 serial.o (or whatever filename you used), returning to your terminal program to see the results.
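To see what the patching step produces without a KIM on hand, here is a rough Python model of it. It only mimics the logic (shift out a bit, choose ORA #$01 or AND #$FE, apply it to the port value); the opcode values come from the listing above, while the function names and the starting port value of 7 are assumptions for illustration.

```python
ORA_IMM, AND_IMM = 0x09, 0x29   # 6502 opcodes for ORA #imm and AND #imm


def jit_patches(ch):
    """(opcode, operand) pair patched into each data-bit stanza, LSB first."""
    patches = []
    for _ in range(8):
        carry = ch & 1              # what LSR would shift into carry
        ch >>= 1
        patches.append((ORA_IMM, 0x01) if carry else (AND_IMM, 0xFE))
    return patches


def line_levels(ch, port=0x07):
    """Run start bit, patched stanzas and stop bit; record bit 0 each time."""
    levels = []
    port &= 0xFE                    # start bit (BIT0)
    levels.append(port & 1)
    for opcode, operand in jit_patches(ch):
        port = (port | operand) if opcode == ORA_IMM else (port & operand)
        levels.append(port & 1)
    port |= 0x01                    # stop bit (BIT1)
    levels.append(port & 1)
    return levels


print(line_levels(0x41))
```

For $41 this gives the same pattern as the hand-written stanzas in the listing: a low start bit, data bits 1 and 64 high, and a high stop bit.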

This scaffold is good enough to use as is and adaptable to most baud rates by changing the COUNT value accordingly. But the KIM does have an additional 102 free bytes of memory at $1780 provided by the RRIOTs (it's actually 128, but the KIM ROM uses some of it for vectors and parameters), and it would be nice to stash this routine there so that we have the full 1K available for a more sophisticated dictionary in the future. The jitbanged routine isn't nearly that svelte, so if we want to make it really tight we'll have to write the new routine up by hand.
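As an aside before we slim things down: picking a COUNT for another baud rate is just the cycle arithmetic run backwards. This Python sketch assumes a 1 MHz clock and the BIT0/BIT1/WAIT macros exactly as written above, and it rounds down so the sender runs slightly fast, as our 101-cycle bit does. The helper names are hypothetical.

```python
import math

CLOCK_HZ = 1_000_000      # assumed: nominal 1 MHz KIM-1 clock
PORT_CYCLES = 10          # lda $1742 (4) + and/ora # (2) + sta $1742 (4)


def bit_cycles(count):
    """Cycles per bit for a BIT0/BIT1 stanza whose WAIT uses ldy #count."""
    wait = 2 + 5 * (count - 1) + 4   # ldy, taken dey/bne passes, final pass
    return PORT_CYCLES + wait


def count_for_baud(baud, clock_hz=CLOCK_HZ):
    """Largest COUNT that does not run slower than the requested rate."""
    ideal = clock_hz / baud
    count = math.floor((ideal - PORT_CYCLES - 1) / 5)   # bit_cycles = 5*count + 11
    if not 1 <= count <= 255:
        raise ValueError(f"{baud} baud needs COUNT={count}, outside 1..255")
    return count


for baud in (2400, 4800, 9600, 19200):
    count = count_for_baud(baud)
    actual = bit_cycles(count)
    ideal = CLOCK_HZ / baud
    print(f"{baud:>6} baud: COUNT={count:3d}, {actual} cycles/bit, "
          f"{100 * (ideal - actual) / ideal:.2f}% fast")
```

Very slow rates fall outside what a single 8-bit counter can cover, which is why the function raises an error instead of returning a value that won't fit in Y.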

Achieving the savings will require rolling the bit sender section back into a loop and then squeezing out a few more bytes by turning the delays into an actual common subroutine. The revised delay subroutine is the same basic idea, but the routine will now call it instead of inlining it.

delay   ldy #14
delayy  dey
        bne delayy
        rts

Each call to the delay routine is now a minimum of 12 cycles (six for the jsr, six for the rts), but the rest of the timing is the same. If we enter at the top with a count of 14, this routine will spend 6 + 2 + (13 * (2 + 3)) + (2 + 2) + 6 = 83 cycles, or we can enter at delayy with a different value. Let's walk through the rest of the routine that sends a byte passed in the accumulator. ch is a zero page location.

send    sta ch
        ldx #8

        ; start bit
        lda $1742       ; (4)
        and #$fe        ; (2)
        sta $1742       ; (4) bit live

        ldy #15
        jsr delayy      ; 88

send8   ; cribbed from kim-1 outch routine $1ea0
        and #$fe        ; 2
        lsr ch          ; 5
        adc #0          ; 2
        sta $1742       ; 4 - bit live

We put the timings for the start bit in parentheses because until the receive line goes low, everything is being charged to the previous byte's stop bit. Once we pull the line low to generate the start bit, the period between the start of the start bit and the start of the first data bit is 88 + (2 + 5 + 2 + 4) = 101 cycles. Between data bits, however, the computation is different:

        jsr delay       ; 83

        dex             ; 2 
        bne send8       ; if taken 3 if not taken 2

In all but the final bit, the time between data bits will be 83 + (2 + 3) + (2 + 5 + 2 + 4) = also 101 cycles. In the final bit, we fall through:

        ; stop bit
sends   lda $1742       ; 4
        ora #$01        ; 2
        nop             ; 2
        nop             ; 2
        sta $1742       ; 4 - bit live

The time between the final data bit and the start of the stop bit becomes instead 83 + (2 + 2) + (4 + 2 + 2 + 2 + 4) = still 101 cycles. While the nops take up precious bytes, without them we'd end up at 97 cycles, about seven percent short of the ideal 104.17 and thus a bit beyond the typical five percent variance most receivers will tolerate.
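Since all of this hinges on the three intervals coming out equal, here is a Python tally of the cycle counts so far. The per-instruction cycle figures are the standard 6502 ones used in the comments above; the dictionary and function names are just my scaffolding, and the model assumes no branch crosses a page boundary.

```python
CYCLES = {
    "lda_abs": 4, "sta_abs": 4, "and_imm": 2, "ora_imm": 2, "adc_imm": 2,
    "lsr_zp": 5, "ldy_imm": 2, "dex": 2, "dey": 2, "nop": 2,
    "bne_taken": 3, "bne_not_taken": 2, "jsr": 6, "rts": 6,
}
C = CYCLES


def dey_loop(count):
    """dey/bne loop: (count - 1) taken branches, then one that falls through."""
    return (count - 1) * (C["dey"] + C["bne_taken"]) + C["dey"] + C["bne_not_taken"]


def call_delay():
    """jsr delay, entering at the top where ldy #14 lives."""
    return C["jsr"] + C["ldy_imm"] + dey_loop(14) + C["rts"]


def call_delayy(count):
    """ldy #count in the caller followed by jsr delayy."""
    return C["ldy_imm"] + C["jsr"] + dey_loop(count) + C["rts"]


shift_out = C["and_imm"] + C["lsr_zp"] + C["adc_imm"] + C["sta_abs"]
start_to_bit0 = call_delayy(15) + shift_out
bit_to_bit = call_delay() + C["dex"] + C["bne_taken"] + shift_out
last_to_stop = (call_delay() + C["dex"] + C["bne_not_taken"] + C["lda_abs"]
                + C["ora_imm"] + 2 * C["nop"] + C["sta_abs"])
without_nops = last_to_stop - 2 * C["nop"]

ideal = 1_000_000 / 9600
for name, value in [("start -> bit 0", start_to_bit0), ("bit -> bit", bit_to_bit),
                    ("last bit -> stop", last_to_stop), ("...without nops", without_nops)]:
    print(f"{name:18s}{value:4d} cycles, {100 * (ideal - value) / ideal:.1f}% fast")
```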

Finally,

        ldy #16
        jmp delayy

This last pause after the line goes high for the stop bit is just 2 + 3 + (15 * (2 + 3)) + (2 + 2) + 6 = 90 cycles (the jmp costs three cycles rather than the six of a jsr), which wouldn't seem enough until we add on those 10 cycles in parentheses setting up the next start bit, plus the loop that's actually calling this send-a-byte routine to send a full string (another six for the jsr along with the overhead of the string loop). This loop sends the phrase until it hits a byte with the high bit set, after which it additionally sends a comma and a line feed (a comma and space would also suffice) to make the Express speak the given utterance immediately. Incorporating that loop, the entire string sender routine ends up 101 bytes long and just fits.
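For a feel of what actually goes over the wire, here is a Python sketch of the string loop's behaviour. One thing I'm assuming rather than quoting from the source: the final byte is transmitted with its high bit masked off, so the marker never reaches the Express. The function name is made up for the example.

```python
def bytes_on_wire(memory, start):
    """Return what the string sender would transmit starting at memory[start]."""
    out = bytearray()
    i = start
    while True:
        b = memory[i]
        out.append(b & 0x7F)    # assumption: marker bit is stripped before sending
        if b & 0x80:            # high bit set marks the final byte of the phrase
            break
        i += 1
    out += b",\n"               # comma and line feed make the Express speak now
    return bytes(out)


phrase = bytearray(b"hello")
phrase[-1] |= 0x80              # mark the last character as end of string
print(bytes_on_wire(phrase, 0))
```

The nice property of this format is that it costs no extra terminator byte per phrase, which matters when every byte of the dictionary counts.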

As for the other two pieces, the dictionary is a table of words and phrases with the high bit of their final bytes set accordingly as an end-of-string marker, and the "talker" calls the KIM's ROM routines to read a key and uses that value as an index into the dictionary. Since the KIM's hex keypad tends to bounce a bit and the DECtalk will faithfully speak everything given to it, even errors, we use the interval timer to wait briefly as a crude means of debouncing the keys. In its entirety, this is the talker:

        * = $0000

strsend = $1780
zp      = $f9
key     = $ee
dict    = $0300

        cld
top     jsr $1f6a       ; KIM ROM key scan, returns $15 if no key
        cmp #$15        ; wait for key
        beq *-5
        sta key

        ; use keypad as index into dictionary
        asl key
        ldx key
        lda dict,x
        sta zp
        lda dict+1,x
        sta zp+1
        jsr strsend

        ; keypad tends to bounce, wait a bit between keys
        ldy #3
wait    lda #128
        sta $1707       ; start interval timer
        lda $1707       ; bit 7 sets when the timer expires
        bpl *-3         ; loop back to the lda until it does
        dey
        bne wait

        jmp top

With 21 keys there are thus 21 selectable dictionary entries, indexed by the table of pointers at the beginning of the dictionary to each individual word or phrase. It is absolutely possible to expand that number further with key chords or combinations, but we would need to read the keys manually using the RRIOT instead of using the convenience routines. I also leave this exercise to the ravenously interested reader.
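If you want to generate your own dictionary, the layout the talker expects can be sketched in Python like this. It assumes what the talker code implies: a table of two-byte little-endian pointers at the start of the dictionary, indexed by key value times two, followed by the phrases with the high bit set on each final byte. The base address matches dict in the listing; the builder and lookup functions are hypothetical helpers, not part of the actual source.

```python
DICT_BASE = 0x0300   # same value as dict in the talker listing


def build_dictionary(phrases, base=DICT_BASE):
    """Pointer table (2 bytes per entry, low byte first) followed by phrases."""
    table_size = 2 * len(phrases)
    table, body = bytearray(), bytearray()
    for phrase in phrases:
        if not phrase:
            raise ValueError("each entry needs at least one character")
        addr = base + table_size + len(body)
        table += addr.to_bytes(2, "little")
        data = bytearray(phrase.encode("ascii"))
        data[-1] |= 0x80             # end-of-string marker on the final byte
        body += data
    return bytes(table + body)


def lookup(image, key):
    """Mimic asl key / ldx key / lda dict,x / lda dict+1,x."""
    x = (key << 1) & 0xFF
    return image[x] | (image[x + 1] << 8)


image = build_dictionary(["yes", "no"])
for key in range(2):
    print(f"key {key}: phrase at ${lookup(image, key):04x}")
```

With all 21 keys populated, the pointer table alone takes 42 bytes, so the phrases themselves start at dict + 42.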

The source for the three modules is also on Github. With kimup installed, a simple make will build the modules with xa65, upload them in a bulk single transfer to the connected KIM-1, and start the "talker" ready to send a string (or do make test to then start minicom so you can see the strings locally).

The last step is getting the DECtalk Express connected, and naturally you'll need to keep the KIM-1 powered on while switching the serial connection to the KIM-1 from your workstation to the Express. Bob's I/O board conveniently presents the serial lines as a null modem so any straight-thru cable or USB-serial dongle will connect it to your computer, but the Express's converter is expecting a straight-thru connection from a PC COM port. Fortunately, a little DE-9(male) to DE-9(male) mini-null modem will flip the transmit and receive lines back around, and here we are:

The 9600 baud string sender can be easily made more flexible to send arbitrary data, or you could call the byte sender directly instead with your own routine. It's also possible to adjust the delays to make this routine transmit even faster: the slowest of the bit-to-bit intervals is within the data bit loop at a worst case of 18 cycles, so solving for baud, we top out at 55555.56. This is a little shy of 57600bps but 38400bps should be absolutely achievable. Receiving would be a little trickier, but that might be something we'll attack in the future. One possible solution is to bitbang some additional GPIO lines for hardware flow control.
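The ceiling falls out of a one-line division, but a short Python sketch makes it easy to see how much padding each standard rate would leave. It assumes the 1 MHz clock and the 18-cycle data bit loop with the delay call removed entirely; the names are mine.

```python
CLOCK_HZ = 1_000_000
# and #$fe (2) + lsr ch (5) + adc #0 (2) + sta $1742 (4) + dex (2) + bne taken (3)
LOOP_OVERHEAD = 2 + 5 + 2 + 4 + 2 + 3


def max_baud(clock_hz=CLOCK_HZ):
    """Fastest rate if the data bit loop runs with no delay at all."""
    return clock_hz / LOOP_OVERHEAD


def padding_needed(baud, clock_hz=CLOCK_HZ):
    """Cycles of delay to add per bit; negative means the rate is out of reach."""
    return clock_hz / baud - LOOP_OVERHEAD


print(f"ceiling: {max_baud():.2f} baud")
for rate in (9600, 19200, 38400, 57600):
    pad = padding_needed(rate)
    verdict = f"pad about {pad:.1f} cycles per bit" if pad >= 0 else "not reachable"
    print(f"{rate:>6}: {verdict}")
```

Note that 38400 leaves only about eight cycles of padding per bit, less than the 12 a jsr/rts pair costs by itself, so at that speed the padding would presumably have to go inline rather than through the delay subroutine.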

At any rate, I'll let the KIM-1 have the last word:

So let's finish the story. Of the original MITalk licensees, Speech Plus (the makers of Dr Hawking's CallText 5010) was bought out in 1990 by Centigram Communications Corp., which acquired its MITalk license and other speech synthesis and voice response technologies for their telephone and fax messaging products. Centigram in turn was bought in 2000 by ADC Communications, a Minnesota-based data and telecommunications firm, and ADCC was then bought by Tyco Electronics in 2010 (now TE Connectivity). Neither Centigram nor TE developed any other end-user devices.

DECtalk, however, managed to live on. Although DEC continued to sell the DECtalk hardware products, in 1994 it introduced a software DECtalk implementation for the first time, returning the product to its roots. DECtalk 4.2 (to avoid version conflicts with DECtalk-PC's driver, which was 4.1) ran on Windows NT and Digital UNIX on Alpha and supported stereo output with all ten voices. A demonstration application spoke from a text file, but the full SDK could render to WAVe files, to the speaker or to a memory buffer. DEC subsequently ported it to x86 Windows NT and Windows 95 and added support for the Windows Speech API.

Perhaps because the SDK was licensed so widely, DECtalk and the software DECtalk 4.x survived DEC CEO Robert Palmer's unsuccessful waves of selloffs and layoffs in the mid-1990s for several more versions, even adding text-to-speech support for Spanish, and it passed to Compaq in the 1998 "merger" relatively intact. Compaq even did some development on the product and issued DECtalk 4.5.1 for Windows NT, 95 and 98 in 1999, but eventually started selling off properties as its own financial troubles mounted and DECtalk was one of the first to go that same year.

Fremont, California-based SMART Modular Technologies first acquired the line and continued to sell DECtalk devices — my NOS DECtalk Express was a SMART unit — though it doesn't appear they manufactured more of them. Instead, SMART continued to license the SDK to third parties until they sold it later in 2000 to embedded systems developer Force Computers, who ported it to ARM (initially StrongARM/XScale, in a lovely little bit of irony) and added support for Red Hat Linux and Windows CE with version 4.6, and sold remaining device stock. In 2001, Force issued version 4.6.1 with additional languages and support for Windows Speech API 5.0, but then sold DECtalk again in December, this time to Utah-based Fonix Corp. to "expand its place in automotive, mobile and wireless TTS applications, as well as open new opportunities in speech product areas." Fonix made their own upgrades to DECtalk, including releasing versions 5.0 and 5.0.1, but the speech market was shrinking and despite spinning off the Fonix Speech unit as SpeechFX in 2011, neither company survived. Fonix went out of business in 2014 and SpeechFX closed their doors in 2020.

That might have been the end of DECtalk were it not for blind developer Jake Gross, who managed to telephone one of the DECtalk developers, Edward Bruckert. Bruckert had been at DEC and later worked for Fonix, and still had a source code copy of Fonix's 5.0 beta release from 2004 with all necessary files. He provided it to Gross, who uploaded it to his own website, and it is now the basis of the Github source code version of DECtalk which builds and runs perfectly on my POWER9 Raptor Talos II in Fedora 38. Bruckert died in 2017, but thanks to what he passed on, DECtalk can still run on your modern computer today.

I've uploaded to Github both KIMup, the KIM-1 uploader utility, and the source of our KIM-1 to DECtalk Express serial sender (which of course will work with an O.G. DTC01 DECtalk too). Meanwhile, what else can we do with the KIM-1's improving communication skills? I think we need to get a one megahertz, one kilobyte 8-bit single board computer into Gopherspace now. Shouldn't everything be on the Internet? Stay tuned for a future article in which we answer that question!

3 comments:

  1. I feel honored to be mentioned in this article, as DECtalk would have died without me stepping in to get source code preserved. Another former developer has also provided more source code archives to assist in the DECtalk revival project. Once we started to revert the voice back to the classic DEC sound, we got stuck when trying to revert the intonation algorithm. I managed to get in contact with Stacey Schnee, who was mentioned in this article. https://www.inverse.com/input/features/tropetrainer-thomas-buchler-torah-software
    After introducing myself and explaining the situation, she provided 9 more zip files of source code without me even asking if she had anything backed up. These recently surfaced archives contain the source code for various versions of the software, ranging from the SMART Modular Technologies era to the mid to late Fonix era. We even have the source code to recreate the last version heard on NOAA Weather radio! https://www.youtube.com/watch?v=kHamS_seaQc

    1. Hey, you get all the credit! It wouldn't have happened if it weren't for you.

      What would be really fun is if we could get the original MITalk version, since that would be closest to Hawking's before the KlattTalk branch. Not sure how feasible that would be.

    2. There are actually Prose 2000 ROMs available on the Internet Archive, so someone could theoretically reverse engineer them and re-implement the algorithms in a language such as C. The file that contains these ROMs is at https://archive.org/download/mame-merged/mame-merged/prose2k.zip
