The JSX Transform
A couple of years ago, I praised Facebook's frontend library, React. I talked about how I hijacked React's JSX Transform to write a static site generator (SSG), the one used to create this blog.1
Since then, I realized React is not required to define a static website through JSX templating. Modern transpilers let us quickly implement our own JSX libraries from scratch. In return for minimal effort, I was able to write an SSG with the following features:
- HTML and XML document generation
- Raw HTML injection
- Native HTML attribute support in JSX
I will help you write the same SSG. You should read the first blog post before continuing -- I frequently reference previous work.
Desugaring JSX
As you know, browsers and NodeJS cannot understand JSX, so it must be transpiled to plain JavaScript before use.
Transpilers like Babel and esbuild parse JSX expressions and transform them into calls of a factory function. The path to the factory function is a transpiler directive. You can specify the path in transpiler configuration files like babel.config.js or jsconfig.json.
jsconfig.json should look like
{
  "compilerOptions": {
    "jsxFactory": "React.createElement"
  }
}
with React as the JSX UI library.2
Recall that the JSX factory must take the form
(name: Name, attributes: Attributes, ...children: Element[]) => Element;
where
- name is a JSX element name or a JavaScript identifier,
- attributes is an object keyed by attribute names, or null, and
type Attributes = { [key: string]: any } | null;
- children represents transformed child elements.
Before we write the factory, you should play in a JSX transpiler REPL. Hopefully, you observe five things:
- name is a string for an element with a lowercase name and a JavaScript identifier otherwise.3
For example, Babel transforms the component
const PaginationLink = ({ slug }) => <a href={`/posts/${slug}.html`}>Next</a>;
to
const PaginationLink = ({ slug }) =>
  React.createElement("a", { href: `/posts/${slug}.html` }, "Next");
but
<PaginationLink slug="extending-react" />
becomes
React.createElement(PaginationLink, { slug: "extending-react" });
- children can be a nested array
Observe that
const Languages = ({ children }) => (
  <ul>
    {children.map((language) => (
      <li key={language}>{language}</li>
    ))}
  </ul>
);
becomes
const Languages = ({ children }) =>
  React.createElement(
    "ul",
    null,
    children.map((language) =>
      React.createElement("li", { key: language }, language),
    ),
  );
not
const Languages = ({ children }) =>
  React.createElement(
    "ul",
    null,
    ...children.map((language) =>
      React.createElement("li", { key: language }, language),
    ),
  );
- attributes can be null
const Languages = ({ children }) => (
  <ul>
    {children.map((language) => (
      <li key={language}>{language}</li>
    ))}
  </ul>
);
becomes
const Languages = ({ children }) =>
  React.createElement(
    "ul",
    null,
    children.map((language) =>
      React.createElement("li", { key: language }, language),
    ),
  );
not
const Languages = ({ children }) =>
  React.createElement(
    "ul",
    {},
    children.map((language) =>
      React.createElement("li", { key: language }, language),
    ),
  );
Unfortunately, we must respect the precedent set by Babel and React when implementing the JSX factory.
Also, note the presence of a key attribute not used by li. The key attribute informs React which children have changed during a re-render.4 We will not need to supply keys to our elements since SSGs only render each element once.
- Children can come through attributes
Observe that the expression
<Languages children={["Python", "Java"]}></Languages>
which is very much legal, becomes
React.createElement(Languages, {
  children: ["Python", "Java"],
});
not
React.createElement(Languages, null, "Python", "Java");
React's createElement copies children from props -- React's name for attributes -- only when nothing appears between the tags; otherwise the nested children win:
children = children.length > 0 ? children : props.children;
This behavior makes it possible to pass children to React dynamically. But the expression
<Languages>{["Python", "Java"]}</Languages>
is more natural to write than
<Languages children={["Python", "Java"]} />
And if you write
<Languages children={["Python", "Java"]}>{["Python", "JavaScript"]}</Languages>
you will find React silently discards the children attribute in favor of the content between the Languages tags.
In my view, it is far better for the JSX factory to reject children passed through a named attribute with an exception.
- name can refer to a fragment
JSX expressions can be elements or fragments. A fragment wraps a series of child expressions in a tag with no name.
To see why fragments are handy, take a look at the component Tags, which formats tag strings for display:
const Tags = ({ children }) => (
  <div>
    {children
      .map((tag) => <Tag key={tag}>{tag}</Tag>)
      .reduce((tags, tag) => [...tags, " # ", tag], [])}
  </div>
);
Tags wraps its body in a div because a JSX expression must have a single root, needlessly inflating the DOM.
Instead, wrap the body in React.Fragment:
const Tags = ({ children }) => (
  <React.Fragment>
    {children
      .map((tag) => <Tag key={tag}>{tag}</Tag>)
      .reduce((tags, tag) => [...tags, " # ", tag], [])}
  </React.Fragment>
);
Even better: tell the transpiler to expand an empty tag to React.Fragment.
const Tags = ({ children }) => (
  <>
    {children
      .map((tag) => <Tag key={tag}>{tag}</Tag>)
      .reduce((tags, tag) => [...tags, " # ", tag], [])}
  </>
);
becomes
const Tags = ({ children }) =>
  React.createElement(
    React.Fragment,
    null,
    children
      .map((tag) => React.createElement(Tag, { key: tag }, tag))
      .reduce((tags, tag) => [...tags, " # ", tag], []),
  );
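The observations above can also be reproduced without a REPL by writing out the calls a transpiler would emit and pointing them at a factory that only records its arguments. A minimal sketch; record and the empty Languages component are hypothetical names used for illustration:

```javascript
// Hypothetical factory that records exactly what the transpiler passes in.
const record = (name, attributes, ...children) => ({
  name,
  attributes,
  children,
});

// What a transpiler configured with `record` as its factory would emit for
//   <ul>{["Python", "Java"].map((l) => <li>{l}</li>)}</ul>
const list = record(
  "ul",
  null,
  ["Python", "Java"].map((language) => record("li", null, language)),
);

console.log(list.attributes); // null, not {}
console.log(Array.isArray(list.children[0])); // true: the mapped array arrives nested
console.log(list.children[0].length); // 2

// Stand-in component; its body is irrelevant to what the factory receives.
const Languages = () => null;

// <Languages children={["Python"]} /> places children among the attributes.
const viaAttribute = record(Languages, { children: ["Python"] });

console.log(viaAttribute.name === Languages); // true: an identifier, not a string
console.log(viaAttribute.children.length); // 0
```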
Putting pen to paper
Let's implement the JSX factory, elem, armed with our knowledge of the JSX Transform.
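For elem to receive anything, the transpiler has to be told to call it instead of React.createElement. A sketch of how that could look for Babel's classic runtime, assuming @babel/plugin-transform-react-jsx is installed and that elem and Fragment are in scope in every file containing JSX. The variable name config is only for illustration:

```javascript
// Sketch of a Babel configuration that targets a custom JSX factory.
const config = {
  plugins: [
    [
      "@babel/plugin-transform-react-jsx",
      {
        runtime: "classic", // the classic transform this guide describes
        pragma: "elem", // called in place of React.createElement
        pragmaFrag: "Fragment", // used in place of React.Fragment for <>...</>
      },
    ],
  ],
};

// In babel.config.js this object would be the module's export:
// module.exports = config;
```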
elem should begin with input sanitization. It must
- normalize attributes,
- reject children passed through named attributes, and
- flatten children:
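function elem(name, attributes, ...children) {
  // The transpiler passes null, not {}, when an element has no attributes.
  attributes = attributes === null ? {} : attributes;
  if ("children" in attributes) {
    // IllegalArgumentError is assumed to be a custom Error subclass defined elsewhere.
    throw new IllegalArgumentError(
      "JSX children may not be passed through a named attribute",
    );
  }
  // Children produced by map arrive as a nested array.
  children = children.flat();
  return null; // TODO
}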
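One detail worth checking by hand: flat() removes a single level of nesting, which covers the map example but not a map nested inside another map. A minimal sketch of the sanitization steps in isolation, assuming you want to accept children nested to any depth. The sanitize helper and this definition of IllegalArgumentError are hypothetical and not part of the listing above:

```javascript
// IllegalArgumentError is not built into JavaScript; this is one way it
// could be defined (an assumption made for this sketch).
class IllegalArgumentError extends Error {
  constructor(message) {
    super(message);
    this.name = "IllegalArgumentError";
  }
}

// Hypothetical helper that mirrors the sanitization steps of elem and
// returns the normalized inputs so they can be inspected.
function sanitize(attributes, children) {
  const normalized = attributes === null ? {} : attributes;
  if ("children" in normalized) {
    throw new IllegalArgumentError(
      "JSX children may not be passed through a named attribute",
    );
  }
  // flat(Infinity) removes every level of nesting, not just the first.
  return { attributes: normalized, children: children.flat(Infinity) };
}

const sanitized = sanitize(null, [["Python", ["Java"]], "Scala"]);
console.log(sanitized.attributes); // {}
console.log(sanitized.children); // [ 'Python', 'Java', 'Scala' ]
```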
Then elem should dispatch on name. It must
- wrap children in Fragment if name refers to Fragment, or
- create an Element if name is a string, or
- call name otherwise:
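function elem(name, attributes, ...children) {
  // Input sanitization from the previous listing goes here.
  if (name === Fragment) {
    return new Fragment(children);
  }
  // isString is assumed to be a helper equivalent to typeof name === "string".
  if (isString(name)) {
    return new Element(
      name,
      new Attributes(Object.entries(attributes)),
      children,
    );
  }
  // Otherwise name is a function component.
  return name({
    ...attributes,
    children,
  });
}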
Fragment and Element both extend Expression:
class Expression {
  constructor(children) {
    this.children = children;
  }
}
class Element extends Expression {
  constructor(name, attributes = new Attributes(), children = []) {
    super(children);
    this.name = name;
    this.attributes = attributes;
  }
}
class Fragment extends Expression {}
class Attributes extends Map {}
Thus Expression represents a tree of XML nodes.
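Putting the two halves of elem together with these classes gives something that can be run directly. The sketch below writes out by hand the calls a transpiler would emit, so it needs no build step. It makes two substitutions that are not from the listings above: TypeError stands in for IllegalArgumentError, and a typeof check stands in for isString.

```javascript
class Expression {
  constructor(children) {
    this.children = children;
  }
}
class Attributes extends Map {}
class Element extends Expression {
  constructor(name, attributes = new Attributes(), children = []) {
    super(children);
    this.name = name;
    this.attributes = attributes;
  }
}
class Fragment extends Expression {}

function elem(name, attributes, ...children) {
  // Sanitize: normalize attributes, reject a children attribute, flatten.
  attributes = attributes === null ? {} : attributes;
  if ("children" in attributes) {
    throw new TypeError(
      "JSX children may not be passed through a named attribute",
    );
  }
  children = children.flat();

  // Dispatch on name: fragment, element name, or function component.
  if (name === Fragment) {
    return new Fragment(children);
  }
  if (typeof name === "string") {
    return new Element(name, new Attributes(Object.entries(attributes)), children);
  }
  return name({ ...attributes, children });
}

// const PaginationLink = ({ slug }) => <a href={`/posts/${slug}.html`}>Next</a>;
const PaginationLink = ({ slug }) =>
  elem("a", { href: `/posts/${slug}.html` }, "Next");

// <><PaginationLink slug="extending-react" /></>
const tree = elem(
  Fragment,
  null,
  elem(PaginationLink, { slug: "extending-react" }),
);

const link = tree.children[0];
console.log(tree instanceof Fragment); // true
console.log(link.name); // a
console.log(link.attributes.get("href")); // /posts/extending-react.html
console.log(link.children); // [ 'Next' ]
```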
HTML, XML, and friends
The code to serialize an Expression writes itself:
class Element extends Expression {
  toString() {
    const name = escape(this.name);
    const attributes = Array.from(this.attributes)
      .map(
        ([name, value]) =>
          `${escape(String(name))}="${escape(String(value))}" `,
      )
      .join("")
      .trim();
    const { children } = this;
    if (children.length && attributes.length) {
      return `<${name} ${attributes}>${children.join("")}</${name}>`;
    }
    if (children.length) {
      return `<${name}>${children.join("")}</${name}>`;
    }
    if (attributes.length) {
      return `<${name} ${attributes} />`;
    }
    return `<${name} />`;
  }
}
class Fragment extends Expression {
  toString() {
    return this.children.join("");
  }
}
Observe that toString escapes name and attributes, but leaves content alone. Arbitrary content is tricky to escape. This blog injects HTML by design, anyway.5
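To see that split in action, here is a small runnable sketch. It is not the listing above verbatim: toString is condensed (it should produce the same output for ordinary attribute names), and the regular-expression escape is a compact stand-in for the escape function defined later in this post.

```javascript
// Compact stand-in for the escape function defined later in this post.
const ENTITIES = { "&": "&amp;", '"': "&quot;", "<": "&lt;", ">": "&gt;" };
const escape = (string) => string.replace(/[&"<>]/g, (char) => ENTITIES[char]);

class Expression {
  constructor(children = []) {
    this.children = children;
  }
}

class Element extends Expression {
  constructor(name, attributes = new Map(), children = []) {
    super(children);
    this.name = name;
    this.attributes = attributes;
  }

  // Condensed serializer: names and attribute values are escaped,
  // children are joined as-is.
  toString() {
    const name = escape(this.name);
    const attributes = Array.from(this.attributes)
      .map(([key, value]) => ` ${escape(String(key))}="${escape(String(value))}"`)
      .join("");
    return this.children.length
      ? `<${name}${attributes}>${this.children.join("")}</${name}>`
      : `<${name}${attributes} />`;
  }
}

class Fragment extends Expression {
  toString() {
    return this.children.join("");
  }
}

const anchor = new Element(
  "a",
  new Map([
    ["href", "/search?q=jsx&lang=en"],
    ["title", 'The "classic" transform'],
  ]),
  ["<em>Read more</em>"], // raw HTML passes through untouched
);

console.log(String(anchor));
// <a href="/search?q=jsx&amp;lang=en" title="The &quot;classic&quot; transform"><em>Read more</em></a>
console.log(String(new Fragment(["Line one", new Element("br")])));
// Line one<br />
```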
Thankfully, escaping XML names and attributes is straightforward.6 XML has five characters with predefined escapes: the escape character "&", "'", '"', "<" and ">". Depending on where they appear, you only need to escape a subset of these five characters.
For example, if you delimit attribute values with '"', then '"' and "&" are the only characters you need to escape within HTML attribute values; XML additionally forbids a literal "<" there.
However, being too intelligent leads to bugs and ugly code. I think it is safer to escape four of them wherever they appear, leaving out only "'", which is harmless because toString always delimits attribute values with '"':
function escape(string) {
  const map = new Map([
    ["&", "&amp;"],
    ['"', "&quot;"],
    ["<", "&lt;"],
    [">", "&gt;"],
  ]);
  const buffer = [];
  for (const char of string) {
    if (map.has(char)) {
      buffer.push(map.get(char));
    } else {
      buffer.push(char);
    }
  }
  return buffer.join("");
}
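Two properties of this approach are easy to check by hand. Walking the string one character at a time means an "&" produced by an earlier replacement is never escaped again within a single call, but calling escape twice on the same value does double-encode it. A short sketch, using a more compact formulation of the same table (my rewrite, intended to behave identically):

```javascript
const ENTITIES = new Map([
  ["&", "&amp;"],
  ['"', "&quot;"],
  ["<", "&lt;"],
  [">", "&gt;"],
]);

// Same character-by-character idea as above, written with Array.from.
function escape(string) {
  return Array.from(string, (char) => ENTITIES.get(char) ?? char).join("");
}

console.log(escape('a < b && c > "d"'));
// a &lt; b &amp;&amp; c &gt; &quot;d&quot;

console.log(escape("it's")); // it's (the apostrophe is left alone)

// Not idempotent: escape exactly once, at serialization time.
console.log(escape(escape("&"))); // &amp;amp;
```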
Testing, Testing
We aren't done yet. Any developer worth their salt will want to unit test the components they write. The SSG should provide facilities to query Expression trees. I figure two methods on Expression are sufficient:
- find returns expressions satisfying a predicate, and
- contains looks for a single expression.
find
find collects matching expressions by carrying out a depth-first traversal of Expression:
class Expression {
  find(predicate) {
    const stack = [this];
    const expressions = [];
    while (stack.length !== 0) {
      const expression = stack.pop();
      if (predicate(expression)) {
        expressions.push(expression);
      }
      // Text children are plain strings and have no children of their own.
      const children =
        expression instanceof Expression ? expression.children : [];
      for (const child of children) {
        stack.push(child);
      }
    }
    return expressions;
  }
}
Here's some test code written with find:
test("Sidebar renders links", () => {
  const sidebar = <Sidebar />;
  const links = sidebar
    .find((e) => e.name === "a")
    .map((e) => e.attributes.get("href"));
  expect(links).toHaveLength(4);
  expect(links).toContain("/posts");
  expect(links).toContain("/about.html");
  expect(links).toContain("https://github.com/bfdes");
  expect(links).toContain("/feed.rss");
});
It verifies the sidebar of this blog displays four links.
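One subtlety: because find pops the most recently pushed child first, siblings are visited right to left, so matches come back in reverse sibling order. That is harmless for toContain assertions. If document order matters, a small variation is to push children in reverse. A sketch of that variant on a hand-built tree (the nav tree here is made up for illustration):

```javascript
class Expression {
  constructor(children = []) {
    this.children = children;
  }

  find(predicate) {
    const stack = [this];
    const expressions = [];
    while (stack.length !== 0) {
      const expression = stack.pop();
      if (predicate(expression)) {
        expressions.push(expression);
      }
      const children =
        expression instanceof Expression ? expression.children : [];
      // Push right to left so the leftmost child is popped first.
      for (let i = children.length - 1; i >= 0; i--) {
        stack.push(children[i]);
      }
    }
    return expressions;
  }
}

class Element extends Expression {
  constructor(name, attributes = new Map(), children = []) {
    super(children);
    this.name = name;
    this.attributes = attributes;
  }
}

const nav = new Element("nav", new Map(), [
  new Element("a", new Map([["href", "/posts"]]), ["Posts"]),
  new Element("p", new Map(), [
    new Element("a", new Map([["href", "/about.html"]]), ["About"]),
  ]),
  new Element("a", new Map([["href", "/feed.rss"]]), ["Feed"]),
]);

const hrefs = nav
  .find((e) => e.name === "a")
  .map((e) => e.attributes.get("href"));

console.log(hrefs); // [ '/posts', '/about.html', '/feed.rss' ]
```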
contains
contains can be implemented with find. Suppose Expression has an equals method that respects value semantics.7 Then we can write
class Expression {
  contains(expression) {
    // Text children are plain strings, which have no equals method.
    return (
      this.find((e) =>
        e instanceof Expression ? e.equals(expression) : e === expression,
      ).length > 0
    );
  }
}
and go home.
While this works, it isn't good enough. The whole point of using contains over find is to terminate early.
This second attempt results in a more efficient search, and it doesn't even cost many lines of code:
class Expression {
  contains(expression) {
    return (
      this.equals(expression) ||
      this.children.some((child) =>
        child instanceof Expression
          ? child.contains(expression)
          : child === expression,
      )
    );
  }
}
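Both versions of contains lean on equals, which this post leaves as an exercise. For anyone who wants to run the code, here is one possible answer, offered as a sketch rather than the implementation behind this blog. It assumes two expressions are equal when they have the same class, the same name and attributes (for elements), and pairwise-equal children:

```javascript
class Expression {
  constructor(children = []) {
    this.children = children;
  }

  // Structural equality: same class and pairwise-equal children.
  equals(other) {
    return (
      other instanceof Expression &&
      other.constructor === this.constructor &&
      other.children.length === this.children.length &&
      this.children.every((child, i) =>
        child instanceof Expression
          ? child.equals(other.children[i])
          : child === other.children[i],
      )
    );
  }

  contains(expression) {
    return (
      this.equals(expression) ||
      this.children.some((child) =>
        child instanceof Expression
          ? child.contains(expression)
          : child === expression,
      )
    );
  }
}

class Element extends Expression {
  constructor(name, attributes = new Map(), children = []) {
    super(children);
    this.name = name;
    this.attributes = attributes;
  }

  // Elements must also agree on name and on every attribute.
  equals(other) {
    return (
      super.equals(other) &&
      other.name === this.name &&
      other.attributes.size === this.attributes.size &&
      Array.from(this.attributes).every(
        ([key, value]) =>
          other.attributes.has(key) && other.attributes.get(key) === value,
      )
    );
  }
}

const meta = new Element("div", new Map([["class", "meta"]]), [
  new Element("time", new Map(), ["12 November 2019"]),
  " | ",
  new Element("span", new Map(), ["1 word"]),
]);

console.log(meta.contains("12 November 2019")); // true
console.log(meta.contains(new Element("span", new Map(), ["1 word"]))); // true
console.log(meta.contains(new Element("span", new Map(), ["2 words"]))); // false
```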
Here's some test code written with contains:
test("Meta renders publication date", () => {
  const meta = (
    <Meta
      created={new Date("2019-11-12")}
      tags={["Python", "Java"]}
      wordCount={1}
    />
  );
  expect(meta.contains("12 November 2019")).toBe(true);
});
It verifies this blog displays publication dates in Gregorian format.
OK, now we are done. We wrote a JSX UI library in less time than it takes to learn React. But remember: this library is limited to static site generation!
Footnotes
1. I am acutely aware of where I feature on this diagram.
2. This guide refers to the classic JSX Transform, not the one introduced in React 17.
3. In the case where React is the JSX library, identifiers should resolve to function or class components. Our library will only support stateless function components.
4. Imagine a Languages element is passed ["Python", "Java"] in a first render and ["Java", "Python"] in a re-render. Without keys, React will recreate all DOM nodes corresponding to the Languages element; with stable and unique keys, React will recognize that children have changed places.
5. The library that processes blog post markup returns raw HTML. Accepting user-generated raw HTML will likely lead to XSS attacks, but injecting HTML from a trusted source at build time is fine.
6. According to StackOverflow users 😬.
7. equals is tedious to implement. I leave it as an exercise for the reader 😛.