Thursday, March 31, 2011

Who Wrote How Much of the Original Microsoft BASIC Interpreter?

Today's Wall Street Journal account of Paul Allen's book about Bill Gates includes the following passage:

  • In the mid-1970s, when the two college dropouts were based in New Mexico, Mr. Allen says Mr. Gates asked for 60% of their partnership because of his greater contribution to the creation of software for running the BASIC programming language on an early PC, the MITS Altair 8800. Mr. Allen says he had assumed that their partnership was evenly split, but he agreed to Mr. Gates's request.

I had taught Bill Gates in a couple of courses during his years at Harvard. He used to doze through my lectures -- I figured out only later what he must have been doing late at night that made him so drowsy in the morning. I have never met Paul Allen.

Sometime after Bill pulled out of Harvard, a computer program listing turned up behind an old file cabinet. It was printed on 11x17" paper on the old, all-caps line printer that was hooked up to the PDP-10. As is well known, that BASIC interpreter had been written on the PDP-10. It was cross-assembled, that is, assembled on the PDP-10 to produce code for the 8080. To test it, unused PDP-10 op codes had been given the names of the 8080 instructions, and when the PDP-10 tried to execute these instructions a "trap" occurred. In the trap code the programmers had added some emulation routines to make the PDP-10 do what the 8080 would have done if it had been executing the instruction. This made it possible to use the PDP-10 debugging tools to debug the 8080 code. Indeed, legend has it that they never saw an actual 8080 chip the whole time they were debugging the interpreter. Pretty clever bootstrapping exercise.
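
For readers who have not seen the trap trick before, here is a toy sketch of the idea in Python. It is only an illustration under my own assumptions, not a reconstruction of the PDP-10 code: the opcode numbers, the names UnusedOpcodeTrap, host_execute, Emulated8080 and run are all hypothetical, and only two 8080 instructions (MVI A and ADI) are modeled.

```python
class UnusedOpcodeTrap(Exception):
    """Raised when the host meets an opcode it does not implement."""

    def __init__(self, opcode, operand):
        super().__init__(f"unused opcode {opcode:#x}")
        self.opcode = opcode
        self.operand = operand


# Hypothetical opcode numbers, chosen for this example only.
HOST_NOP = 0x00    # an instruction the host really implements
EMU_MVI_A = 0x70   # stands in for the 8080 "MVI A, n" (load immediate into A)
EMU_ADI = 0x71     # stands in for the 8080 "ADI n" (add immediate to A)


def host_execute(opcode, operand):
    """Execute a native host instruction, or trap on anything else."""
    if opcode == HOST_NOP:
        return
    raise UnusedOpcodeTrap(opcode, operand)


class Emulated8080:
    """Just enough 8080 state for the two emulated instructions."""

    def __init__(self):
        self.a = 0          # 8-bit accumulator
        self.carry = False  # carry flag

    def mvi_a(self, value):
        self.a = value & 0xFF

    def adi(self, value):
        total = self.a + (value & 0xFF)
        self.carry = total > 0xFF
        self.a = total & 0xFF


def run(program):
    """Run (opcode, operand) pairs, emulating whatever the host traps on."""
    cpu = Emulated8080()
    handlers = {EMU_MVI_A: cpu.mvi_a, EMU_ADI: cpu.adi}
    for opcode, operand in program:
        try:
            host_execute(opcode, operand)
        except UnusedOpcodeTrap as trap:
            # The "trap code": do what the 8080 would have done.
            handlers[trap.opcode](trap.operand)
    return cpu


cpu = run([(EMU_MVI_A, 200), (HOST_NOP, 0), (EMU_ADI, 100)])
print(cpu.a, cpu.carry)
```

The point of the design is that the host's ordinary tools keep working: the program is stepped through on the host, and only the foreign instructions take the detour through the handlers.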

The original is in the Harvard Archives; a few pages are reproduced in a display on the wall of the ground floor lounge of Maxwell Dworkin, the Harvard computer science building donated by Gates and Ballmer. Here is what the listing says near the top:

TITLE BASIC MCS 8080 GATES/ALLEN/DAVIDOFF

SUBTTL VERSION 1.1 -- MORE FEATURES TO COME


-------------------------------------------
COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
-------------------------------------------


WRITTEN ORIGINALLY ON THE PDP-10 AT HARVARD FROM
FEBRUARY 9 TO APRIL 27


PAUL ALLEN WROTE THE NON-RUNTIME STUFF
BILL GATES WROTE THE RUNTIME STUFF
MONTE DAVIDOFF WROTE THE MATH PACKAGE

At the risk of violating the copyright, I am going to reproduce the first few lines right here.


Monte Davidoff was another undergraduate. I have never talked to him about his work for Gates and Allen, which resulted in his being acknowledged by them but not included on the copyright notice.

I leave it to the interested reader to speculate about whether the order of names has any significance. I can say that Gates was at Harvard and was the one who actually had a PDP-10 account; if Allen ever used the machine I was not aware of it.

I came to the office this morning planning to go through the listing and try to distinguish the three kinds of code, so as to infer something about who contributed how much. Before I even go there, though, I should say that "lines of code" is a very poor metric for percentage contribution. (Cf. the classic The Mythical Man-Month, by another Harvard great, Fred Brooks.) For example, rewriting a program you have already written ten times is less of a contribution than writing something much shorter, but new, from scratch.

In Davidoff's case the code-lines contribution is easy to estimate, because the math routines are in a separate file. It is about 31 pages long, and includes all the expected things, right down to an ARCTAN function written in 8080 assembly code.

The main file is about 65 pages long. I may yet go through it page by page and try to do some classification, but when I tried this morning I realized that this is a BASIC interpreter, a program that takes in statements on the fly, parses them, and executes them. There are some data definitions and storage tables, which are surely non-executable. But it's not clear to me what else the "non-runtime stuff" might comprise. Perhaps the lexical and syntax analysis are considered non-runtime; there is a fair amount of that, though not enough to make me think Allen has a strong case for being aggrieved. 

In any event, I find Allen's attitude, as reported in the article, pretty irritating. It is just unseemly for one billionaire member of this team to be revisiting at this point that old issue of whether he should have gotten 50% or 60% of the equity based on how much he wrote of a program that was less than 100 pages long (at 60 lines per page) and of which a third was written by neither of them!
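
The "a third" and "less than 100 pages" figures follow from the page counts given earlier. As a rough check, taking those approximate counts (about 31 pages of math routines, about 65 pages in the main file, 60 lines per page) at face value, the arithmetic can be sketched as below. The variable names are mine, and the line total is only an upper bound, since it assumes every page is full.

```python
# Approximate page counts quoted in the post.
MATH_PAGES = 31        # Davidoff's math package, in its own file
MAIN_PAGES = 65        # the main interpreter file
LINES_PER_PAGE = 60    # stated lines per page

total_pages = MATH_PAGES + MAIN_PAGES
math_share = MATH_PAGES / total_pages

# Upper bound only: assumes every page is completely full.
max_lines = total_pages * LINES_PER_PAGE

print(f"total pages: {total_pages}")
print(f"math package share: {math_share:.1%}")
print(f"at most {max_lines} lines")
```

By page count the math package comes to a little under a third of the whole, which says nothing about how the remaining two thirds divide between Gates and Allen.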
