Comparison: HDF vs. JSON and YAML
This document compares the Human Data Form (HDF) and its associated toolchain (hq, he, lutx) against the industry-standard formats JSON and YAML.
Overview Table
| Feature | JSON | YAML | HDF |
|---------|------|------|-----|
| Syntax Style | Braces & Commas | Indentation-based | Lisp-inspired (Parentheses) |
| Comments | No (non-standard) | Yes (#) | Yes (rem forms & ;) |
| Readability | Low (too much punctuation) | High (clean) | High (explicit structure) |
| Editability | Medium | High (but error-prone) | High (robust structure) |
| Parsing | Trivial / Fast | Complex / Slow | Simple / Deterministic |
| Standardization | ISO Standard | De-facto Standard | New Specification |
1. Syntax Readability
JSON
JSON is designed primarily for machine-to-machine communication. Its syntax is heavy on punctuation: double-quoted keys, commas, and curly braces. This makes it difficult for humans to write and maintain by hand, especially for large configurations.
YAML
YAML is the gold standard for readability due to its minimal use of punctuation. However, its reliance on significant whitespace (indentation) is a frequent source of frustration and bugs ("YAML hell"). Deeply nested structures are hard to track visually.
HDF
HDF uses a Lisp-inspired syntax (key value). It strikes a balance:
- No Indentation Rules: Like JSON, HDF doesn't care about spaces, making it safer to copy-paste.
- Minimal Punctuation: It uses parentheses to define structure, eliminating the need for commas or quotes around simple keys (Keywords).
- Explicit Boundaries: The closing ) makes it clear where a block ends, avoiding the ambiguity of YAML's indentation (see the sketch below).
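To make the comparison concrete, here is a small, hedged sketch of the same block in YAML and in HDF. The YAML is standard; the HDF rendering simply nests (key value) pairs inside an enclosing list, which is an illustrative convention rather than a quoted rule, and the indentation is purely cosmetic since HDF ignores whitespace.

```yaml
server:
  host: example.com
  port: 8080
```

```
; the same data as nested lists; only the parentheses carry structure
(server
  (host example.com)
  (port 8080))
```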
2. Data Type Support
All three formats support core primitives: Strings, Numbers, Booleans, and Null.
- JSON: Very strict. Strings must be double-quoted. No support for multi-line strings without manual escaping.
- YAML: Extremely flexible. Supports many types (too many, sometimes leading to surprising coercions such as the "Norway problem", where an unquoted NO is parsed as the boolean false). Excellent multi-line support (|, >).
- HDF: Streamlined.
  - Keywords: Unquoted strings for keys and simple values.
  - Raw Strings: Support for [[...]] or [...] allows for multiline blocks (SQL, shell scripts) without escaping (see the sketch after this list).
  - Uniformity: Lists are the only structural type, which simplifies processing logic.
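As an example of the difference in multi-line handling, the sketch below carries a small SQL query first in YAML (block scalar |) and then in an HDF raw string. The key name query is illustrative, and the assumption that [[...]] preserves the enclosed text verbatim is flagged in the comment.

```yaml
query: |
  SELECT id, name
  FROM users;
```

```
; raw-string sketch: assumes [[...]] keeps the enclosed text verbatim, no escaping
(query [[
SELECT id, name
FROM users;
]])
```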
3. Schema Validation
- JSON: Has the mature JSON Schema standard, which is widely implemented across all languages.
- YAML: Often validated using JSON Schema or custom tool-specific logic (e.g., Kubernetes, OpenAPI).
- HDF: Uses the HDF Interpreter Language (HIL). HIL is a safe, embeddable Lisp designed specifically to validate HDF data against schemas that are themselves written in HDF. This allows for self-documenting data structures.
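To illustrate the idea of schemas written in HDF itself, here is a purely hypothetical sketch: the schema, field, type, and required names below are invented for this example and are not part of any quoted HIL vocabulary.

```
; hypothetical schema sketch; 'field', 'type', and 'required' are illustrative names only
(schema
  (field (name id) (type number) (required true))
  (field (name name) (type keyword)))
```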
4. Parsing Performance
- JSON: Designed for speed. Most parsers are highly optimized and can handle gigabytes per second.
- YAML: One of the most complex formats to parse correctly. Its grammar demands significant lookahead and context tracking, and parsers are typically much slower than JSON parsers on large files.
- HDF: Designed for deterministic, single-pass parsing. It avoids the lookahead complexity of YAML while maintaining a simpler tokenization model than JSON. It is intended to be nearly as fast as JSON to parse.
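To give a feel for why single-pass parsing stays simple, below is a minimal tokenizer sketch in Lua (the toolchain's implementation language). It is illustrative only, not the real hq/he parser: it assumes ; starts a line comment, and it omits raw strings and rem forms entirely.

```lua
-- Minimal single-pass HDF tokenizer sketch (illustrative; not the real parser).
-- Handles parentheses, ';' line comments, and whitespace-separated atoms only;
-- raw strings ([[...]]) and rem forms are deliberately omitted.
local function tokenize(input)
  local tokens, i, n = {}, 1, #input
  while i <= n do
    local c = input:sub(i, i)
    if c:match("%s") then
      i = i + 1                              -- skip whitespace
    elseif c == ";" then
      i = (input:find("\n", i) or n) + 1     -- skip comment to end of line
    elseif c == "(" or c == ")" then
      tokens[#tokens + 1] = c                -- structural delimiter
      i = i + 1
    else
      local j = i
      while j <= n and not input:sub(j, j):match("[%s%(%);]") do
        j = j + 1                            -- consume one atom (keyword, number)
      end
      tokens[#tokens + 1] = input:sub(i, j - 1)
      i = j
    end
  end
  return tokens
end

-- Prints: ( id 101 ) ( name Alice )
for _, t in ipairs(tokenize("(id 101)(name Alice) ; trailing comment")) do
  io.write(t, " ")
end
```

Each character is consumed exactly once, with no backtracking, which is the property the bullet above refers to.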
5. Ecosystem Maturity
- JSON/YAML: Ubiquitous. Every language, editor, and cloud platform has built-in support.
- HDF: A growing ecosystem. While it lacks the massive library support of the others, it provides a unified toolchain:
- hq: For querying (jq equivalent).
- he: For editing and merging (yq equivalent).
- lutx: For advanced templating and code generation.
6. Use Case Suitability
JSON
- Best for: API responses, browser data exchange, machine-to-machine messaging.
- Worst for: Human-edited configuration files, documentation.
YAML
- Best for: CI/CD pipelines (GitHub Actions, GitLab), simple localized configs.
- Worst for: Very large nested structures, cross-platform data where indentation might be mangled.
HDF
- Best for: Infrastructure-as-Code, complex application configuration, inventory management, and generating secondary configs from a primary dataset.
- Why?: It provides the readability of YAML with the structural safety of JSON, plus a built-in query and transformation engine.
7. Data Density
Data density refers to the ratio of actual payload to structural overhead (punctuation, delimiters, whitespace). While JSON is often considered compact, HDF's support for unquoted keywords and space-delimited lists makes it even denser for most configuration tasks.
7.1 Comparison Example: Simple Record
JSON (25 characters):
{"id":101,"name":"Alice"}
HDF (20 characters):
```
(id 101)(name Alice)
```
HDF Savings: ~20%
7.2 Comparison Example: List of Items
JSON (27 characters):
["apple","banana","cherry"]
HDF (21 characters):
```
(apple banana cherry)
```
HDF Savings: ~22%
7.3 Why HDF is Denser
- Keyword Support: In JSON, every key must be a quoted string. In HDF, keys (Keywords) are unquoted.
- No Commas/Colons: HDF uses whitespace to separate elements, eliminating the overhead of : and ,.
- Uniform Delimiters: HDF uses () for both objects and arrays, whereas JSON uses {} and [].
- Implicit Types: While both support primitives, HDF's "Keyword by default" approach means simple textual values don't require quotes unless they contain spaces or special characters.
8. Comparison with Modern Alternatives
While JSON and YAML are the industry standards, several modern formats share HDF's goal of prioritizing human readability and structural robustness.
8.1 KDL (The Node-Based Alternative)
KDL is perhaps the closest spiritual relative to HDF.
- Similarities: Uses spaces for separation, explicit delimiters (braces) for hierarchy, and avoids significant whitespace (indentation).
- Difference: KDL is node-oriented (node arg prop=val). HDF is symbolic-oriented ((key value)). HDF is more consistent (everything is a list), whereas KDL distinguishes between arguments and properties.
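As a hedged side-by-side sketch (the KDL line uses KDL's node arg prop=val form; the HDF line renders the same information as (key value) pairs, with the name key chosen purely for illustration):

```kdl
// KDL: a node with one argument and two properties
server "web" host="example.com" port=8080
```

```
; HDF: the same information as a uniform list of (key value) pairs
(server (name web) (host example.com) (port 8080))
```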
8.2 EDN (extensible data notation)
EDN is the data format of the Clojure ecosystem.
- Similarities: Both are Lisp-inspired and use symbolic representation.
- Difference: EDN is much more complex, supporting tagged literals, namespaces, and sets. HDF is a "Minimalist Lisp," focused on being a lean alternative to YAML for configuration rather than a full programming language data model.
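For instance, the record from section 7.1 extended with a set and a tagged literal looks like this in EDN, versus a hedged HDF rendering that approximates the set with a plain list (nothing quoted above gives HDF native set or tagged-literal types):

```clojure
;; EDN: map with keyword keys, a set, and a built-in tagged literal
{:id 101 :name "Alice" :roles #{:admin :ops} :created #inst "2024-01-01T00:00:00Z"}
```

```
; HDF: lists only; the roles set becomes an ordinary list,
; and tagged literals have no direct equivalent
(id 101)(name Alice)(roles admin ops)
```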
8.3 HCL (HashiCorp Configuration Language)
Used by Terraform and Vault, HCL is highly successful in infrastructure.
- Similarities: Very readable and designed for DevOps.
- Difference: HCL has significantly more "syntax noise" (mixes braces, colons, and equals signs). HDF provides a cleaner, more uniform syntax that is easier to generate and parse programmatically.
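A small hedged sketch of the same block in HCL and in HDF (rendering the HCL block label as a (name web) pair is an illustrative choice, not a quoted rule):

```hcl
# HCL: block type, label, braces, and '=' assignments
server "web" {
  host = "example.com"
  port = 8080
}
```

```
; HDF: parentheses only, no '=' or braces
(server (name web) (host example.com) (port 8080))
```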
8.4 Summary: Why HDF?
HDF occupies a unique niche: it provides the cleanliness of KDL and the symbolic power of EDN, but with a simpler grammar than HCL and better scriptability than any of them thanks to the integrated HDFPGL query engine and HIL interpreter.
9. Long-term Stability & Future Proofness
A critical factor for configuration and archival data is "Digital Preservation"—the ability to read and process the data 20 or 50 years from now.
9.1 The ANSI C Foundation
The HDF toolchain is implemented in Lua. Lua is famously written in pure ANSI C.
- The C Guarantee: C is the foundational language of modern computing. An ANSI C compiler is available for virtually every platform, and that is unlikely to change for the foreseeable future.
- No Dependency Rot: Unlike tools written in Python, Go, or Rust, which require constant updates to their compilers and runtime libraries, a Lua-based tool like lutx or hq is virtually immune to "bit rot."
9.2 Ecosystem Comparison
| Ecosystem | Future-Proofness | Dependency Weight |
|-----------|------------------|-------------------|
| Python | Medium | Heavy (VirtualEnvs, Pip rot, 2->3 breakage) |
| Go / Rust | High | Medium (Requires specific compiler versions) |
| HDF (Lua) | Extreme | Minimal (Single ANSI C binary, zero external libs) |
9.3 Summary
By choosing HDF, you are choosing a data stack that relies on the most stable technologies in computer science history. HDF data and its processing tools are designed to remain accessible and functional for decades without requiring a team of engineers to keep the environment running.