First Emacs Syntax Highlighter

Writing a syntax highlighter for Emacs, starting from zero knowledge.

Syntax Highlighting

I prefer to have syntax highlighting enabled when writing and editing both code and prose. Typical syntax highlighting gives each syntactic category of a language (keywords, strings, numbers, comments, and so on) its own color. It's easiest to see with an example that contrasts highlighted code with plain code:

Highlighted:

print(23, "hello") # comment

Not highlighted:

print(23, "hello") # comment

Highlighting is especially useful when you're first learning a language, because it helps you spot keywords and punctuation used in unfamiliar ways.

The Task

In a way, programming is formalized problem solving.

To solve a big or complex problem, common sense says to first break it down into smaller parts and then solve those parts. By solving the small, simple parts one at a time, you eventually solve the big, complex whole.

As programmers, we use programming languages to reach informally specified goals. We get so used to this routine of decomposing problems that we become prone to rabbit holes and yak shaving: chains of side tasks that actually take us farther from our goal.

In this case, my primary task is to rewrite my Nand2Tetris assembler (currently written in Python 3) in K. My main motivation for the rewrite is to learn more about the K language. A secondary motivation is the joy of code golf: I think it's fun to try to optimize a program for the smallest possible source code size.

k-mode

So here I am, about to rewrite an assembler in K. I start by creating an empty buffer called assembler.k and begin adding code. But there is no syntax highlighting! Emacs does not know about the K language by default. I find k-mode on GitHub, which seems promising: it targets the free K6 implementation that I am using (ngn/k).

This mode seems to do a lot, but one thing it does not do is …syntax highlighting.

First Attempt

I try writing a k-mode just for syntax highlighting, but as a newb at Emacs Lisp, I make mistakes.

First, I try copy-pasting an example from a mode for BQN (another array language). After changing some of the values to match K's syntax, it seems to work, which is encouraging. But then I notice that if I change, for example, one of the colors in k-mode.el, I don't see a corresponding change in assembler.k.

After asking for help, I eventually realize that defining the syntax table with defvar means "define at most once": defvar only assigns its value when the variable does not already have one. So reloading the mode with eval-buffer has no effect after the first time.

One solution is to use defconst instead, which assigns its value every time it is evaluated, including when eval-buffer runs. Perhaps this is not the most appropriate solution, but it is the first one I found (described in detail here). As I learn more Elisp, perhaps I will discover a better approach, but for now this gets the job done. This already seems like a rabbit hole, but Emacs is a living program, so I want to take advantage of its capabilities.
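A minimal demonstration of the difference, using hypothetical variable names (demo--setting and demo--constant are illustrative only, not part of k-mode). Evaluating a buffer containing these forms with eval-buffer should leave the defvar at its first value and the defconst at its last:

```elisp
;; Hypothetical variables, only for illustrating defvar vs defconst.
(defvar demo--setting 'first)
(defvar demo--setting 'second)     ; ignored: demo--setting already has a value

(defconst demo--constant 'first)
(defconst demo--constant 'second)  ; always reassigns

(message "defvar: %s, defconst: %s" demo--setting demo--constant)
;; Echo area: defvar: first, defconst: second
```

If you would rather keep defvar, evaluating a single defvar form with C-M-x (eval-defun) also resets the variable to the value in the form, because eval-defun treats defvar specially.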

Sidebar - reloading in doom-emacs

At this point I have two files open and I switch between them to see the effects of my changes.

  • k-mode.el
  • assembler.k

After editing k-mode.el, I want to re-evaluate the entire mode. In doom-emacs, the key sequence SPC m e b (eval-buffer) does the job.

Then I switch to the assembler.k buffer, but to actually load the new mode I need to refresh the buffer. In doom, that's SPC b r (revert-buffer), which re-reads the file and re-runs its major mode.
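Both steps could be bundled into one command. The sketch below is a hypothetical helper (my/k-mode-reload is not part of Doom or k-mode), assuming k-mode.el is open in a buffer named "k-mode.el": it evaluates that buffer, then calls k-mode again in every buffer already using it, which re-applies the mode's syntax table and font-lock settings.

```elisp
;; Hypothetical helper, not part of Doom or k-mode.
(defun my/k-mode-reload ()
  "Evaluate the buffer named \"k-mode.el\", then restart `k-mode' where it is used."
  (interactive)
  ;; Re-evaluate every top-level form in k-mode.el (assumed to be open).
  (with-current-buffer "k-mode.el"
    (eval-buffer))
  ;; Re-running the mode function picks up the new syntax table and
  ;; font-lock settings without reverting the file.
  (dolist (buf (buffer-list))
    (with-current-buffer buf
      (when (derived-mode-p 'k-mode)
        (k-mode)))))
```

Because it calls the mode function instead of reverting, unsaved edits in assembler.k stay in place.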

Now I have this in my k-mode.el file:

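;;; k-mode.el --- Syntax highlighting for K  -*- lexical-binding: t; -*-

;; Value for `font-lock-defaults': (KEYWORDS KEYWORDS-ONLY CASE-FOLD SYNTAX-ALIST).
;; Each keyword is (REGEXP . FACE).  KEYWORDS-ONLY is nil, so strings and
;; comments are also highlighted from the syntax table below.
(defconst k--token-syntax-types
  '((("[0-9]+" . font-lock-constant-face)  ; runs of digits
     (" /.*$" . font-lock-comment-face))   ; a space, "/", then the rest of the line
    nil nil nil)
  "Font-lock settings for `k-mode'.")

;; make-syntax-table inherits from the standard syntax table,
;; where the double quote already delimits strings.
(defconst k--syntax-table
  (let ((table (make-syntax-table)))
    ;; "/" starts a comment and a newline ends it.  This also treats
    ;; the "/" in "+/x" as a comment start, which is wrong for K.
    (modify-syntax-entry ?/  "<" table)
    (modify-syntax-entry ?\n ">" table)
    table)
  "Syntax table for `k-mode'.")

;;;###autoload
(defgroup k nil
  "Major mode for editing K code."
  :prefix "k-"
  :group 'languages)

;;;###autoload
(define-derived-mode k-mode prog-mode "k"
  "Major mode for editing K source code."
  :syntax-table k--syntax-table
  :group 'k
  (setq-local font-lock-defaults k--token-syntax-types)
  (setq-local comment-start "/"))

;; Use k-mode for *.k files and for scripts whose interpreter is "k".
;;;###autoload
(add-to-list 'auto-mode-alist '("\\.k\\'" . k-mode))
;;;###autoload
(add-to-list 'interpreter-mode-alist '("k" . k-mode))

(provide 'k-mode)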

And given this assembler.k file:

/ basics
42 4.5 -4 10 /vector of numbers
1.2e3 /scientific notation
"x" /char
"abc" /string
(2;3;4) /list
a:42 /assign
{x+y} /lambda
f:{x+y}; f[2;3] /assign lambda and call it

/ tricky spacing
+ /x        /+ followed by commented out x
+/x         /sum x

The mode generates this syntax highlighting:

/ basics
42 4.5 -4 10 /vector of numbers
1.2e3 /scientific notation
"x" /char
"abc" /string
(2;3;4) /list
a:42 /assign
{x+y} /lambda
f:{x+y}; f[2;3] /assign lambda and call it

/ tricky spacing
+ /x        /+ followed by commented out x
+/x         /sum x
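Since colors do not always come through in plain text, it can help to ask Emacs directly which face each character received. This is a hypothetical helper (my/faces-in is an illustrative name, not an existing function) that fontifies a string in a temporary buffer using a given major mode and returns each character paired with its face:

```elisp
;; Hypothetical helper: which face did font-lock give each character?
(defun my/faces-in (mode text)
  "Fontify TEXT with major MODE in a temporary buffer.
Return a list of (CHAR . FACE) pairs, one per character of TEXT."
  (with-temp-buffer
    (insert text)
    (funcall mode)          ; set up the syntax table and font-lock defaults
    (font-lock-ensure)      ; fontify now, even though the buffer is never displayed
    (let (pairs)
      (dotimes (i (buffer-size))
        (let ((pos (1+ i)))
          (push (cons (string (char-after pos))
                      (get-text-property pos 'face))
                pairs)))
      (nreverse pairs))))
```

For example, evaluating (my/faces-in #'k-mode "+/x /sum x") shows whether the slash in +/x, which is the sum operator rather than a comment, ends up with a comment face. Interactively, M-x describe-char on a single character reports the same information for that position.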

Next Steps

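Some of the highlighting in the example so far comes from defaults rather than from rules I wrote: strings are colored because make-syntax-table inherits from Emacs's standard syntax table, where the double quote delimits strings. I want to lean on these defaults and on the parent mode (prog-mode) as much as possible, so I will keep the existing colors for strings and numbers, but extend the number pattern to recognize literals such as 2e3. Then I will add support for the null literals 0n and 0N.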
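As a possible starting point, here is a sketch of a number pattern built with rx, assuming K numeric literals look like the ones in assembler.k (42, 4.5, 1.2e3, 2e3) plus the null literals 0n and 0N. k--number-regexp is a hypothetical name, and the pattern deliberately leaves out forms I have not checked against ngn/k, such as negative exponents or the leading minus in -4:

```elisp
;; Sketch only: k--number-regexp is a hypothetical name.
(defconst k--number-regexp
  (rx word-start
      (or (seq "0" (any "nN"))            ; null literals 0n and 0N
          (seq (+ digit)                  ; integer part: 42, 10
               (opt "." (* digit))        ; optional fraction: 4.5
               (opt "e" (+ digit))))      ; optional exponent: 2e3, 1.2e3
      word-end)
  "Regexp for the K numeric literals used in assembler.k.")

;; Use it in place of "[0-9]+" in the font-lock keyword list.
(defconst k--token-syntax-types
  `(((,k--number-regexp . font-lock-constant-face)
     (" /.*$" . font-lock-comment-face))
    nil nil nil)
  "Font-lock settings for `k-mode'.")
```

The word-start and word-end anchors keep the pattern from matching the digit inside a name like a1, and the backquote splices the regexp string into the otherwise unchanged keyword list.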

Hopefully, getting familiar with highlighting constant literal values will be good practice for highlighting functions, operators, and some of the trickier aspects of ngn/k, such as how f /y is f followed by a commented-out y, while f/y is y reduced by the binary function f.
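One possible approach to the f /y versus f/y problem, sketched under the assumption that an ngn/k comment starts with a / at the beginning of a line or after a space or tab: make / ordinary punctuation in the syntax table, then use syntax-propertize-rules to give only the whitespace-preceded slashes comment-start syntax. The names are hypothetical, and this is not what k-mode currently does:

```elisp
;; Sketch only: hypothetical replacement definitions, assuming ngn/k
;; comments start with "/" at line start or after a space or tab.
(defconst k--syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ "." table)   ; by default "/" is punctuation, as in +/x
    (modify-syntax-entry ?\n ">" table)  ; a newline still ends a comment
    table)
  "Syntax table for `k-mode'.")

(defconst k--syntax-propertize-function
  (syntax-propertize-rules
   ;; A "/" at line start or after whitespace gets comment-start syntax.
   ("\\(?:^\\|[ \t]\\)\\(/\\)" (1 "<")))
  "Mark only whitespace-preceded slashes as comment starters.")
```

Adding (setq-local syntax-propertize-function k--syntax-propertize-function) to the body of define-derived-mode would wire this in. After that, the " /.*$" font-lock keyword is likely redundant, since font-lock's syntactic pass should already color those comments.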