Human Data Forms (HDF)
Specification & Grammar
This document defines the Human Data Forms (HDF) file format: a Lisp-inspired,
human-oriented data language intended as a replacement for JSON, YAML, and TOML
for configuration, inventory, and infrastructure-style data.
1. Design Goals
- Human-first readability and editability
- Deterministic, single-pass parsing
- Minimal punctuation and quoting
- No indentation semantics
- Explicit structure (Lisp-style forms)
- Strong round-tripping guarantees
- First-class, structured comments
2. Lexical Structure
2.1 Encoding
- UTF-8
- Newlines: LF or CRLF (normalized to LF)
- NUL (U+0000) is forbidden
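The encoding rules above could be applied as a pre-processing step before any tokenizing. The following is a minimal sketch in Python, not a normative implementation; the function name normalize_input is hypothetical and not part of HDF.

```python
def normalize_input(data: bytes) -> str:
    """Decode UTF-8, reject NUL, and normalize CRLF line endings to LF."""
    # bytes.decode raises UnicodeDecodeError (a ValueError subclass) on invalid UTF-8.
    text = data.decode("utf-8")
    if "\x00" in text:
        raise ValueError("NUL (U+0000) is forbidden in HDF input")
    # Only CRLF pairs are rewritten; a lone CR is left as ordinary whitespace.
    return text.replace("\r\n", "\n")
```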
2.2 Whitespace
whitespace ::= " " | "\t" | "\n" | "\r"
ws ::= whitespace*
ws1 ::= whitespace+
Whitespace separates tokens and has no semantic meaning.
2.3 Line Comments (Optional Sugar)
comment ::= ";" any_char_except_newline*
Line comments are treated as whitespace and have no semantic representation.
The canonical comment mechanism is the rem form (see §9).
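Since line comments behave like whitespace, a tokenizer can skip both in one step at each token boundary. A possible sketch follows; skip_trivia is a hypothetical helper name, and the function assumes it is only called between tokens (never inside a string).

```python
WHITESPACE = " \t\n\r"

def skip_trivia(text: str, pos: int) -> int:
    """Return the index of the next character that is neither whitespace
    nor part of a ';' line comment, starting the search at pos."""
    while pos < len(text):
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch == ";":
            # Consume the comment up to, but not including, the newline.
            while pos < len(text) and text[pos] != "\n":
                pos += 1
        else:
            break
    return pos
```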
3. Top-Level Grammar
document ::= ws element* ws
4. Structural Forms
4.1 Lists (Forms)
list ::= "(" ws element* ws ")"
Lists are ordered and may contain values or other lists.
4.2 Elements
element ::= value | list
5. Values
value ::= raw_string
| quoted_string
| number
| boolean
| null
| keyword
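Parsing order is significant: the alternatives are tried in the order listed, so
true, false, and null are recognized as literals before the keyword rule can match them.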
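The effect of the ordering on bare (non-string) tokens can be illustrated with a short Python sketch. It assumes the token has already been split off at whitespace or a delimiter, which this specification does not spell out; classify_atom and the two regular expressions are illustrative transcriptions of the grammar, not part of HDF.

```python
import re

# Transcribed from the number and keyword productions of this document.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+(?:[eE][+-]?[0-9]+)?|[eE][+-]?[0-9]+)?")
KEYWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_./:\-]*")

def classify_atom(token: str) -> str:
    """Classify a bare token by trying the value alternatives in order."""
    if NUMBER.fullmatch(token):
        return "number"
    if token in ("true", "false"):
        return "boolean"
    if token == "null":
        return "null"
    if KEYWORD.fullmatch(token):
        return "keyword"
    raise ValueError(f"not a valid bare value: {token!r}")
```

Under this sketch, only an exact match is treated as a literal; a longer token such as trueish falls through to the keyword rule.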
6. Keywords (Unquoted Strings / Atoms)
Keywords are the default textual value.
keyword ::= keyword_initial keyword_subsequent*
keyword_initial ::= letter | "_"
keyword_subsequent ::= letter
| digit
| "_"
| "."
| "/"
| ":"
| "-"
letter ::= "A"…"Z" | "a"…"z"
digit ::= "0"…"9"
Rules
- Case-sensitive
- No escaping
- Terminated by whitespace, delimiters, comments, or EOF
- Must not contain whitespace or control characters
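The keyword grammar maps directly onto a regular expression. The sketch below scans the longest keyword starting at a given position; scan_keyword is a hypothetical name, and the sketch does not verify that the character following the keyword is a legal terminator.

```python
import re

# keyword_initial followed by zero or more keyword_subsequent characters.
KEYWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_./:\-]*")

def scan_keyword(text: str, pos: int = 0):
    """Return (keyword, next_pos) for the keyword starting at text[pos],
    or None if no keyword starts there."""
    m = KEYWORD.match(text, pos)
    if m is None:
        return None
    return m.group(0), m.end()
```

Because keyword_initial admits only letters and "_", a token that begins with a digit or with "/" is not a keyword and yields None here.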
7. Quoted Strings
Quoted strings support escape processing.
quoted_string ::= '"' quoted_char* '"'
quoted_char ::= escape_sequence
| any_char_except_quote_backslash_newline
Escapes
escape_sequence ::= "\" escape_code
escape_code ::= '"'
| "\"
| "n"
| "r"
| "t"
| "0"
| "x" hex hex
| "u" hex hex hex hex
| "U" hex hex hex hex hex hex hex hex
hex ::= digit | "A"…"F" | "a"…"f"
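- Literal newlines are not allowed inside the quotes; use the \n escape or a raw string (see §8)
- Escape sequences are replaced by the characters they denote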
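Escape processing for the body of a quoted string (the text between the quotes) might look like the following Python sketch. The name unescape is hypothetical, and the sketch assumes that \x, \u, and \U all denote a Unicode code point given in hexadecimal, which the grammar above does not state explicitly.

```python
SIMPLE = {'"': '"', "\\": "\\", "n": "\n", "r": "\r", "t": "\t", "0": "\0"}
HEX_LEN = {"x": 2, "u": 4, "U": 8}  # number of hex digits after each code
HEX_DIGITS = set("0123456789abcdefABCDEF")

def unescape(body: str) -> str:
    """Replace escape sequences in a quoted-string body with their characters."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        code = body[i + 1]
        if code in SIMPLE:
            out.append(SIMPLE[code])
            i += 2
        elif code in HEX_LEN:
            n = HEX_LEN[code]
            digits = body[i + 2:i + 2 + n]
            if len(digits) != n or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"malformed \\{code} escape")
            # Assumption: the hex digits name a code point; chr() raises
            # ValueError for values above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + n
        else:
            raise ValueError(f"unknown escape \\{code}")
    return "".join(out)
```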
8. Raw Strings (Unified, Multiline)
Raw strings are the only multiline-capable string form. They support two variants:
double-bracket (Lua-style) and single-bracket (Lisp-style).
Syntax
raw_string ::= "[" raw_equals "[" raw_content "]" raw_equals "]"
| "[" raw_equals raw_content raw_equals "]"
raw_equals ::= "="*
Semantics
- Content is taken verbatim
- No escape processing
- Newlines preserved
- The closing delimiter must use the same number of "=" signs as the opening delimiter
Examples
(rem "Double bracket")
[[line one
line two]]
(rem "Single bracket")
[line one
line two]
(rem "With equal signs to allow nested brackets")
[=[ This [ ] is allowed because of the equal sign ]=]
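The delimiter matching for the double-bracket variant can be sketched in Python as below. The helper name read_raw_string is hypothetical, and the single-bracket variant is deliberately left out of this sketch.

```python
def read_raw_string(text: str, pos: int = 0) -> tuple:
    """Read a double-bracket raw string starting at text[pos].
    Returns (content, next_pos). Only the "[" "="* "[" form is handled."""
    if not text.startswith("[", pos):
        raise ValueError("raw string must start with '['")
    i = pos + 1
    while i < len(text) and text[i] == "=":
        i += 1
    level = i - pos - 1  # number of '=' signs in the opening delimiter
    if i >= len(text) or text[i] != "[":
        raise ValueError("not a double-bracket raw string")
    start = i + 1
    # The closing delimiter carries the same number of '=' signs.
    closer = "]" + "=" * level + "]"
    end = text.find(closer, start)
    if end == -1:
        raise ValueError("unterminated raw string")
    # Content is returned verbatim: no escape processing, newlines kept.
    return text[start:end], end + len(closer)
```

With one "=" in the delimiter, a bare "]]" inside the content no longer terminates the string, which is what makes nested brackets possible.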
9. Comment Forms (rem)
9.1 Principle
Comments are represented as syntactically valid data forms, not lexical trivia.
A comment is any list whose first element is the keyword:
rem
9.2 Syntax
comment_form ::= "(" "rem" element* ")"
The rem form accepts zero or more elements.
Valid examples:
(rem)
(rem note)
(rem "single-line comment")
9.3 Semantics (Normative)
- A rem form evaluates to nothing
- It MUST be ignored by consumers of the data
- It MUST NOT affect surrounding structure
- It MAY be preserved by tooling
- Its contents are opaque to the core language
Interpretation of the contents is entirely delegated to tools.
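A consumer that wants the data without comments could drop rem forms after parsing. The sketch below assumes a hypothetical in-memory representation in which lists are Python lists and unquoted keywords are instances of a Keyword marker class, so that the quoted string "rem" is not mistaken for the keyword rem; neither name is defined by this specification.

```python
class Keyword(str):
    """Hypothetical marker type: an unquoted keyword, as opposed to a quoted string."""

def is_rem(node) -> bool:
    """True if node is a list whose first element is the keyword rem."""
    return (isinstance(node, list) and len(node) > 0
            and isinstance(node[0], Keyword) and node[0] == "rem")

def strip_rem(node):
    """Return a copy of node with every nested rem form removed.
    Intended to be called on the document, i.e. the list of top-level elements."""
    if not isinstance(node, list):
        return node
    return [strip_rem(child) for child in node if not is_rem(child)]
```

Removing the forms entirely, rather than replacing them with a placeholder, is one way to satisfy the rule that a rem form must not affect surrounding structure.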
9.4 Multiline Comments
Multiline comments are expressed using raw strings inside rem:
(rem [=[
This is a multiline comment.
It may contain arbitrary text.
]=])
No separate multiline comment syntax exists.
9.5 Structured Comments
Because rem accepts arbitrary forms, it naturally supports structured metadata:
(rem
author alice
since "2026-01-01"
note "autogenerated"
)
The HDF specification assigns no meaning to such structure.
9.6 Documentation Comments (Convention)
A common convention is that a rem form immediately preceding another form
documents that form:
(rem "Main nginx vhost")
(vhost
name example.com
root "/var/www/example"
)
This association is conventional and not enforced by the grammar.
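A tool that chooses to honor this convention might pair each form with the rem form directly before it. The following Python sketch is one possible approach; Keyword, is_rem, and attach_docs are hypothetical names, and the representation (Python lists, with a Keyword marker class for unquoted keywords) is an assumption of the sketch.

```python
class Keyword(str):
    """Hypothetical marker type: an unquoted keyword, as opposed to a quoted string."""

def is_rem(node) -> bool:
    return (isinstance(node, list) and len(node) > 0
            and isinstance(node[0], Keyword) and node[0] == "rem")

def attach_docs(elements: list) -> list:
    """Pair each non-rem list with the contents of the rem form directly
    before it, or with None if no rem form immediately precedes it."""
    pairs = []
    pending = None
    for el in elements:
        if is_rem(el):
            pending = el[1:]          # contents of the rem form, without "rem"
        elif isinstance(el, list):
            pairs.append((pending, el))
            pending = None
        else:
            pending = None            # a bare value breaks the adjacency
    return pairs
```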
10. Literals
10.1 Boolean
boolean ::= "true" | "false"
10.2 Null
null ::= "null"
11. Numbers
number ::= integer | float
integer ::= "-"? digit+
float ::= "-"? digit+ "." digit+ exponent?
| "-"? digit+ exponent
exponent ::= ("e" | "E") ("+" | "-")? digit+
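The number grammar can be checked and converted with two regular expressions. This is a sketch only; parse_number is a hypothetical name, and mapping HDF integers and floats onto Python int and float is an assumption, since this specification does not define numeric ranges or precision.

```python
import re

INTEGER = re.compile(r"-?[0-9]+")
# Either digits "." digits with an optional exponent, or digits with a mandatory exponent.
FLOAT = re.compile(r"-?[0-9]+(?:\.[0-9]+(?:[eE][+-]?[0-9]+)?|[eE][+-]?[0-9]+)")

def parse_number(token: str):
    """Convert a token matching the number grammar to int or float."""
    if INTEGER.fullmatch(token):
        return int(token)
    if FLOAT.fullmatch(token):
        return float(token)
    raise ValueError(f"not a number: {token!r}")
```

Forms such as ".5", "1.", and "+1" are rejected because the grammar requires digits on both sides of the decimal point and allows only a leading "-" sign.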
12. Errors
Invalid constructs include:
- Unterminated lists or strings
- Mismatched raw string delimiters
- Control characters in keywords
- NUL anywhere in input
13. Minimal Examples
(rem "Cluster definition")
(cluster
(rem "Primary node")
(node pve-01 "10.0.0.1")
(rem)
(node pve-02 "10.0.0.2")
)
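To show how little machinery the list structure needs, here is a minimal reader sketch in Python for a subset of HDF: lists, line comments, bare tokens, and quoted strings without escape sequences. It is an illustration rather than a conforming parser: raw strings are not handled, bare tokens are kept as text without being classified, and quoted and unquoted values both come back as plain Python strings. The function name parse is hypothetical.

```python
WHITESPACE = " \t\n\r"

def parse(text: str) -> list:
    """Parse a subset of HDF into nested Python lists of strings."""
    pos = 0
    stack = [[]]  # stack[0] is the document; deeper entries are open lists
    while pos < len(text):
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch == ";":
            # Line comment: skip to end of line.
            while pos < len(text) and text[pos] != "\n":
                pos += 1
        elif ch == "(":
            stack.append([])
            pos += 1
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unexpected ')' at offset {pos}")
            finished = stack.pop()
            stack[-1].append(finished)
            pos += 1
        elif ch == '"':
            # Simplification: no escape processing, so the next quote ends the string.
            end = text.find('"', pos + 1)
            if end == -1:
                raise ValueError("unterminated string")
            stack[-1].append(text[pos + 1:end])
            pos = end + 1
        else:
            # Bare token: runs until whitespace, a parenthesis, a quote, or ';'.
            start = pos
            while pos < len(text) and text[pos] not in WHITESPACE + '();"':
                pos += 1
            stack[-1].append(text[start:pos])
    if len(stack) != 1:
        raise ValueError("unterminated list")
    return stack[0]
```

An explicit stack is used instead of recursion so that deeply nested input cannot exhaust the call stack, and so that the parse remains a single left-to-right pass.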
14. Intentional Omissions
- Map / key–value semantics
- Schema and validation
- Canonical formatting
- Import / include system
- Evaluation model