Hm ... you're using Python to bootstrap. It would be much harder to assume no languages whatsoever and go from there (some musings I've done on this type of project: http://boston.conman.org/2009/11/05.1)
In MS-DOS, you don't need an octal-to-binary conversion program, assuming you're content writing your next program in decimal instead of binary, because you can type any 8-bit byte with Alt and the numeric keypad. You can also write useful 8086 machine code using only printable ASCII, but that's probably making things hard on yourself unnecessarily.
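To make the Alt-keypad point concrete, here is a minimal sketch (in Python, only for illustration on a modern machine) of the conversion you would otherwise do by hand: turning an octal listing into the decimal codes to type with Alt held down. The function name and the four-byte example are my own, not anything from the original question.

```python
def octal_listing_to_alt_codes(listing):
    """Convert whitespace-separated octal bytes to decimal Alt-keypad codes.

    Hypothetical helper for illustration: each result is the digit string
    you would type on the numeric keypad while holding Alt.
    """
    codes = []
    for token in listing.split():
        value = int(token, 8)          # parse one octal byte
        if not 0 <= value <= 255:
            raise ValueError("not an 8-bit byte: " + token)
        codes.append(str(value))
    return codes


# B4 02 CD 21 (mov ah, 2 / int 21h) written in octal:
print(octal_listing_to_alt_codes("264 002 315 041"))
# prints ['180', '2', '205', '33']
```

Note that the second byte here is 2, not 0. A zero byte is exactly the kind of thing that may not survive being typed this way.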
Did 1982 MS-DOS have I/O redirection? You're right that that would help quite a bit.
I think one of my first programs in that environment would be some kind of editor. Maybe a hex or octal editor rather than a text editor, but an editor nonetheless. Typing programs more than a few dozen bytes with no ability to see or fix typos gets old fast.
I don't know how much harder the job really gets if you bootstrap directly from machine code instead of Python. I don't think it's that much.
I remember typing in simple assembly listings from books & magazines for utilities to use in batch files (generally saved as headerless .com files). Unfortunately, I didn't try hard enough at the time to find out how they worked, which probably delayed my programming abilities by a couple of years.
The question says you have just COMMAND.COM, MSDOS.SYS, and IO.SYS.
One day Nirva Toomish entered the ACM Programming Competition. But the languages allowed were Pascal and something else, neither of which he knew. But the machine had DEBUG.COM on it, so he wrote his programs in DEBUG.COM's assembler, and solved several (all?) of the problems.
The problem came when the proctors asked to see the source code. He used DEBUG.COM to disassemble his programs, but they were not satisfied with that.
So if you really had to solve this problem on 1982 MS-DOS, you'd need to open an output file with a file control block and INT 21 with AH=0F. (And if you really can't type NUL bytes on the keyboard, I think you are going to have to write some code to zero out most of the FCB at start time, too. Ick.)
This program might solve that problem for MS-DOS 2+, although I haven't tested it yet, and you'd have to use something like JKLMNO instead of ABCDEF for your hex digits past 9:
I think you could probably create that program successfully with COPY CON in not very many tries, but it's not much better than COPY CON itself. I don't think INT 21 function 01 even supports backspace, and in that sense it might actually be worse.
In case anyone's interested, I successfully entered the following version in DOSBOX using COPY CON, and indeed even used it to recreate a copy of itself:
It turns out the 08 meaning "or %al," was getting interpreted by COPY CON as a backspace, so this version adds AX to DX instead. (I sure am glad I didn't have to debug that using TYPE.) This program has advantages and disadvantages compared to COPY CON:
1. It echoes its input so you can see, at least in theory, if you made a mistake.
2. You only need to type two keystrokes instead of one to four per byte.
3. There are no forbidden bytes. At least in Dosbox, COPY CON converts ^@ to the sequence 00 03, interprets 08 as backspace (even if entered on the keypad with Alt), and stops copying when it gets a carriage return, even if the carriage return was entered with Alt.
4. To exit, you must reboot. In Dosbox, the output has already been flushed to the filesystem, but I have my doubts about whether, on MS-DOS, every single INT 21H AH=2 would write a 512-byte floppy disk sector with the newly appended byte. On the other hand, you could probably just mash some key on autorepeat long enough to fill up a couple of sectors, and you'd be good to go. This program, after all, only has to be sufficient to enter the next phase of the bootstrap, one with backspace and explicit termination.
5. There is no backspace, so if you hit any incorrect keystrokes, you must start over. For programs of this size, that's not a major constraint — it's pretty easy to carefully enter 40 or 50 or 100 keystrokes without making any errors — but it becomes more serious as programs get larger.
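For anyone wondering why JKLMNO stands in for ABCDEF: the ASCII codes for '0' to '9' are 0x30 to 0x39 and those for 'J' to 'O' are 0x4A to 0x4F, so the low four bits of each character are already the digit's value and no lookup or comparison is needed. The Python sketch below models only that encoding trick. It is not a transcription of the machine-code program, and the function names are made up for illustration.

```python
DIGITS = "0123456789JKLMNO"   # 'J'..'O' play the role of A..F


def digit_value(ch):
    # Low nibble of the ASCII code is the digit's value for every char in DIGITS.
    return ord(ch) & 0x0F


def decode(keystrokes):
    """Turn pairs of keystrokes into bytes, high nibble first (assumed order)."""
    out = bytearray()
    for i in range(0, len(keystrokes) - 1, 2):
        hi = digit_value(keystrokes[i])
        lo = digit_value(keystrokes[i + 1])
        out.append((hi << 4) | lo)
    return bytes(out)


def encode(data):
    """Inverse of decode: the keystrokes you would type for the given bytes."""
    return "".join(DIGITS[b >> 4] + DIGITS[b & 0x0F] for b in data)


print(encode(bytes([0xB4, 0x02, 0xCD, 0x21])))   # prints K402LM21
```

Since the mask accepts any character at all, a typo silently produces a wrong byte rather than an error, which fits the "no backspace, start over" constraint in point 5.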
Ah. I knew 1.0 didn't support redirection, but I didn't realize that 2.0 came out in 1983 (really? That late? Wow). And it's very cool what you've done so far.
A friend of mine was a master of getting zeros without zeros (usually without increasing the program size) so that he could type them with a "copy con" command.
He just knew the instruction codings and what not to use - e.g., "sub al, al" instead of "mov al, 0". He knew all of those by heart. Both in hex and in decimal.
(Me, I was able to read machine code from hex, but not write it unless it was extremely simple; and the hex<->dec conversion took too long. So I just used debug for those 10-20 instruction things)
I see. After playing with this for a bit today (see http://lists.canonical.org/pipermail/kragen-hacks/2011-April...) it seems like forward calls, immediate zeroes, 16-bit immediate operands with a zero byte, and doing an OR or an ADD with AL as the source operand are among the enemies to avoid.
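A quick way to check a candidate encoding against those enemies is to scan its bytes for the values COPY CON was reported to mangle. This is a rough Python sketch: the forbidden set contains only the three bytes mentioned in this thread (00, 08, 0D) and is probably incomplete for real MS-DOS, and the helper name is hypothetical.

```python
# Bytes this thread reports COPY CON mishandling under DOSBox.
# Assumption: real MS-DOS may reject more bytes than these.
FORBIDDEN = {0x00: "NUL", 0x08: "backspace", 0x0D: "carriage return"}


def forbidden_offsets(code):
    """Return (offset, name) for every byte that cannot be typed safely."""
    return [(i, FORBIDDEN[b]) for i, b in enumerate(code) if b in FORBIDDEN]


MOV_AL_0 = bytes([0xB0, 0x00])    # mov al, 0  (immediate zero byte)
SUB_AL_AL = bytes([0x28, 0xC0])   # sub al, al (same size, no zero byte)

print(forbidden_offsets(MOV_AL_0))    # prints [(1, 'NUL')]
print(forbidden_offsets(SUB_AL_AL))   # prints []
```

Both encodings are two bytes, which matches the remark about getting zeros without increasing the program size.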
It's still kind of magical to me to write a program that does something useful that's only half a line of gibberish when you TYPE it. I just wish there was a better way than PNG to share what it's supposed to look like!
Sounds like a fun exercise for anyone who has just read "Reflections on Trusting Trust." "You can't trust code that you did not totally create yourself." Well what if I did totally create all the code myself?
I guess you'd still be susceptible to microcode, so you'd have to create your own CPU too. Sigh, back to the drawing board. :)
Amazing that 80-column unstyled ASCII text, which is supposed to be the absolute lowest level of abstraction (able to pass through even the dumbest mail servers without being mangled, able to be read by any computer made in the last 40 years: Windows boxes, Mac boxes, Sun boxes, dumb terminals, propped-up toasters running NetBSD), is finally stymied by the ultimate low-end machine: one with a screen that isn't physically large enough for 80-column text.
I think I need to correct stoneknifeforth a bit to make it run on current Linux.