VM Repl

What special features does a language interpreter need that an ahead-of-time compiler does not?

Programming languages fall into two main categories: compiled and interpreted. The split usually tracks interactivity, too. Compiled languages are usually not interactive, while interpreted languages usually are, often through a read-eval-print loop (repl). Sometimes this repl is the de facto language implementation, in the same way that some compiled languages have only one compiler.

There are exceptions, but for the most part this is a useful generalization. The differences do blur in places, though. For example, interpreted languages often have constructs that consist of multiple parts and don't make sense until all of the parts exist.

Here's a concrete example in Python:

if condition:
  print(True)
else:
  print(False)

Python interpreters let users type this if/else statement one line at a time, which means the interpreter sometimes has to defer execution. When you type the statement above, the interpreter switches to a continuation mode (shown by the ... prompt) and runs nothing, not even the condition, until the statement is complete. The CPython repl treats a multi-line statement as complete when the user presses Enter on an empty line, that is, presses Enter twice after the last line:

>>> if 1/0:                                 # no error yet
...   print('control does not reach here')  # still fine
...                                         # type Enter once to get here
>>>                                         # type Enter twice, error shows up here

By contrast, if the user types 1/0 without an if in front, the error appears right away:

>>> 1/0                                     # type Enter; instant error message
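The same completeness check is available from Python itself. The standard library's codeop.compile_command returns None while the buffered input is incomplete and a code object once it is complete; the standard code module builds its interactive console on this check. The sketch below is an approximation of the repl's behavior, not necessarily the code path the built-in prompt uses, and the variable names are just for illustration:

```python
import codeop

# What the repl has buffered after each Enter, joined with newlines.
first_line = "if 1/0:"
two_lines = "if 1/0:\n  print('control does not reach here')"

print(codeop.compile_command(first_line))  # None: more input needed
print(codeop.compile_command(two_lines))   # None: still inside the block

# The empty line adds a trailing newline, which completes the statement.
code = codeop.compile_command(two_lines + "\n")
try:
    exec(code)  # 1/0 is evaluated only now, at execution time
except ZeroDivisionError as err:
    print("raised at execution time:", err)

# A single complete line compiles at once, so the repl can run it immediately.
print(codeop.compile_command("1/0") is not None)  # True
```

Note that compiling never evaluates 1/0; the error only appears once the complete statement is executed.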

Deferring execution in the repl

In the original Nand2Tetris VM language, there is global, "top-level" code, and there are function definitions. In theory, these are separate execution contexts. In practice, the only thing that happens in the global scope is an implicit call to Sys.init. From there, all execution occurs inside functions.

However, when adapting this language for use in a repl, I decided to let the user execute top-level statements in between function definitions. This makes the repl more useful, because there is no need for an implicit call to Sys.init, and you can start executing code right away. A goto statement, on the other hand, does not make sense in a repl context, so I removed it. I also added an end keyword to mark the end of a function definition [1].

From a usability perspective, the interpreter should let the user define a function without immediately executing it. For example, as long as you don't call A before defining B, this should work:

function A 0
  call B 0          // B hasn't been defined yet
  return
end

push constant 999   // some top-level code

function B 0
  push constant 32
  return
end

call A 0            // B is now defined, so we can call A

For an ahead-of-time compiled language, the steps to execute a program are:

  1. tokenize and parse the source code
  2. translate function definitions to assembly code
  3. later, execute the assembly code, which will begin with a jump to Sys.init
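To make step 2 concrete, here is a minimal Python sketch that translates just two commands, function and push constant, into Hack assembly using the standard Nand2Tetris VM-to-Hack mapping. The names translate and push_constant are hypothetical, and call, return and the bootstrap code are left out because their translations are much longer. The point is that translation only produces text: a function body becomes instructions without anything being executed.

```python
# Hypothetical sketch of step 2 for two VM commands only, following the
# standard Nand2Tetris mapping to Hack assembly. Nothing runs here; the
# translator only emits assembly text.


def push_constant(n):
    # *SP = n; SP = SP + 1
    return [f"@{n}", "D=A", "@SP", "A=M", "M=D", "@SP", "M=M+1"]


def translate(line):
    words = line.split("//", 1)[0].split()  # drop comments, split into tokens
    if not words:
        return []
    if words[0] == "function":
        name, nlocals = words[1], int(words[2])
        asm = [f"({name})"]  # entry label, the target of later calls
        for _ in range(nlocals):
            asm += push_constant(0)  # each local variable starts at 0
        return asm
    if words[:2] == ["push", "constant"]:
        return push_constant(int(words[2]))
    raise NotImplementedError(f"not covered by this sketch: {line.strip()}")


for line in ["function B 0", "  push constant 32"]:
    print("\n".join(translate(line)))
```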

This interpreted version takes a different route:

  1. tokenize and parse the source code
  2. when interpreting a function definition, save the code somewhere, but don't execute it yet
  3. later, when interpreting a call instruction, jump to the appropriate function and begin execution
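The interpreted route can be sketched as a small Python class. This is a minimal sketch under several assumptions, not the author's implementation: the class VMRepl and its methods are hypothetical, only a handful of commands are supported (push from the constant, argument and local segments, pop to argument or local, add, sub, call, return), arithmetic ignores the Hack platform's 16-bit word size, and VM calls reuse Python's own call stack instead of building VM stack frames.

```python
# Minimal, hypothetical sketch of the repl's interpreted route.
# Assumes a small command subset and uses Python's call stack for VM calls.


class VMRepl:
    def __init__(self):
        self.stack = []      # the VM working stack
        self.functions = {}  # name -> (number of locals, list of parsed commands)
        self.pending = None  # (name, nlocals, body) while a definition is being typed

    def feed(self, line):
        """Process one line of input, as a repl would after each Enter."""
        words = line.split("//", 1)[0].split()  # drop comments, split into tokens
        if not words:
            return
        if self.pending is not None:  # inside "function ... end"
            if words == ["end"]:
                name, nlocals, body = self.pending
                self.functions[name] = (nlocals, body)  # step 2: save, don't run
                self.pending = None
            else:
                self.pending[2].append(words)
        elif words[0] == "function":
            self.pending = (words[1], int(words[2]), [])
        else:
            self.execute(words, args=[], local=[])  # top-level code runs at once

    def call(self, name, nargs):
        # Step 3: the function is looked up only when it is called,
        # so a caller may be defined before its callee.
        nlocals, body = self.functions[name]
        base = len(self.stack) - nargs
        args = self.stack[base:]  # copy the arguments off the stack
        del self.stack[base:]
        local = [0] * nlocals     # locals start at 0
        for words in body:
            if words == ["return"]:
                result = self.stack.pop()
                del self.stack[base:]      # discard the callee's leftovers
                self.stack.append(result)  # the caller sees only the return value
                return
            self.execute(words, args, local)
        raise RuntimeError(f"{name} ended without a return")

    def execute(self, words, args, local):
        op = words[0]
        if op == "push":
            segment, index = words[1], int(words[2])
            if segment == "constant":
                self.stack.append(index)
            elif segment == "argument":
                self.stack.append(args[index])
            elif segment == "local":
                self.stack.append(local[index])
            else:
                raise ValueError(f"unsupported segment: {segment}")
        elif op == "pop" and words[1] in ("argument", "local"):
            target = args if words[1] == "argument" else local
            target[int(words[2])] = self.stack.pop()
        elif op in ("add", "sub"):
            y, x = self.stack.pop(), self.stack.pop()
            self.stack.append(x + y if op == "add" else x - y)
        elif op == "call":
            self.call(words[1], int(words[2]))
        else:
            raise ValueError(f"unsupported command: {' '.join(words)}")


program = """\
function A 0
  call B 0          // B hasn't been defined yet
  return
end

push constant 999   // some top-level code

function B 0
  push constant 32
  return
end

call A 0            // B is now defined, so we can call A
"""

repl = VMRepl()
for line in program.splitlines():
    repl.feed(line)
print(repl.stack)  # [999, 32]
```

Defining a function only records its body, so the call B 0 inside A is not resolved until A actually runs. Calling A before B is defined fails with a KeyError, which matches the caveat about call order earlier in this section.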

Footnotes:

[1] The end keyword marks the end of a function definition and returns to the global execution context between functions. As I was writing this post, another way of ending functions occurred to me: I could copy how Python works. The interpreter could treat an extra blank line as a "function end" token. This would mean that code written for the compiler and code written for the interpreter could look the same but have different meanings. My preference is that code that behaves differently should also look different, so for now the end keyword stays.