What does “Code is Data” even mean? 🤔
In short: when language syntax and its abstract syntax tree (AST) looks the same.
What does it give us? If syntax is the same as its AST it means that somehow code is defined in terms of itself. Since AST is a structured representation of code as data, which is used by compilers to manipulate programs and generate different code, then the program written in such language should be able to manipulate itself.
Let’s look at this snippet of code written in JavaScript:
const x = 1; if (isZero(x)) { log('zero'); } else { log(x); }
In order to transform this code (for example ES2015 into ES5) we would need a source-to-source compiler, such as Babel, which would parse program text into its AST, transform it into another AST and then generate a new code. It means that to transform JavaScript code we need a representation of it which can be manipulated by the language, in this case code text is being transformed into JSON.
Here’s AST of the above program in JSON format:
{ "type": "Program", "body": [ { "type": "VariableDeclaration", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "x" }, "init": { "type": "Literal", "value": 1, "raw": "1" } } ], "kind": "const" }, { "type": "IfStatement", "test": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "isZero" }, "arguments": [ { "type": "Identifier", "name": "x" } ] }, "consequent": { "type": "BlockStatement", "body": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "log" }, "arguments": [ { "type": "Literal", "value": "zero", "raw": "'zero'" } ] } } ] }, "alternate": { "type": "BlockStatement", "body": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "log" }, "arguments": [ { "type": "Identifier", "name": "x" } ] } } ] } } ] }
But how would look like a language syntax where AST is code and code is AST? Most certainly we don’t want to write programs using the above representation because it is overly verbose. However this JSON representation can give us a hint on where to start looking for such language syntax. We know already that AST is a structured representation of code, which means that the code should be also structured. If we look at above AST it is clear that there’s nodes which correspond to certain syntax tokens.
For example the following is AST of isZero(x) expression, where both isZero and x are of type Identifier and form a single CallExpression:
{ "type": "CallExpression", "callee": { "type": "Identifier", "name": "isZero" }, "arguments": [ { "type": "Identifier", "name": "x" } ] }
Let’s encode those AST nodes back into their textual representation but preserve AST structure (imagine we have Babel plugin for this). We’ll use Array literals instead of objects to represent structure and drop unnecessary syntax:
[isZero x]
The original program would look like this in our made up language:
[const x 1] [if [isZero x] [log 'zero'] [log x]]
Looks like we ended up having properties of a language described at the beginning: the code is AST and AST is the code (where in this case AST is encoded into keywords using Babel).
Now if we encode the entire JavaScript like this we could write a program that can transform itself 💡. Let’s write a program that transforms ES2015-ish variable declaration into its ES5 analogue.
// input code [const x 1] // expected output code [var x 1] // transformation [function transform [[declKeyword varName varVal]] [return [var varName varVal]]] // running transformation [transform [const x 1]] // => [var x 1]
Having a language like this doesn’t require to have a specialized compiler to transform code, you just write transformation in the language itself. Lisp is a family of such languages and this property of a language is known as homoiconicity. Here’s an example of code transformation written in ClojureScript:
;; input code (def x 1) ;; expected output code (def x 2) ;; transformation (defmacro transform [[decl-keyword var-name var-val]] `(def ~var-name ~(+ var-val 1))) ;; running transformation (transform '(def x 1)) ;; => (def x 2)
In Lisp languages a code that manipulates code is called macro. A macro is a function that is executed at compile time against other code. Macros are heavily used in Lisps to build DSLs and introduce new language constructs (pattern matching, async/await, etc.), which means that you can invent new syntax in your programs. In fact most Lisps itself are built on macros, the baseline of a language is usually less than 30 forms and everything else is a macro or a function. Here's some examples and their macro definitions:
(defmacro when [test body] `(if ~test ~body)) (defmacro if-not [test body] `(if (not ~test) ~body)) (when true (log "IT'S TRUE!")) (if-not false (log "IT'S TRUE!"))
Now it should be clear what does “Code is Data” mean.