Comparison: HDF vs. JSON and YAML
This document compares the Human Data Form (HDF) and its associated toolchain (hq, he, lutx) against the industry-standard formats JSON and YAML.
Overview Table
| Feature | JSON | YAML | HDF |
|---------|------|------|-----|
| Syntax Style | Braces & Commas | Indentation-based | Lisp-inspired (Parentheses) |
| Comments | No (non-standard) | Yes (#) | Yes (rem forms & ;) |
| Readability | Low (too much punctuation) | High (clean) | High (explicit structure) |
| Editability | Medium | High (but error-prone) | High (robust structure) |
| Parsing | Trivial / Fast | Complex / Slow | Simple / Deterministic |
| Standardization | ISO Standard | De-facto Standard | New Specification |
1. Syntax Readability
JSON
JSON is designed primarily for machine-to-machine communication. Its syntax is heavy on punctuation: double-quoted keys, commas, and curly braces. This makes it difficult for humans to write and maintain by hand, especially for large configurations.
YAML
YAML is the gold standard for readability due to its minimal use of punctuation. However, its reliance on significant whitespace (indentation) is a frequent source of frustration and bugs ("YAML hell"). Deeply nested structures are hard to track visually.
HDF
HDF uses a Lisp-inspired syntax (key value). It strikes a balance:
- No Indentation Rules: Like JSON, HDF doesn't care about spaces, making it safer to copy-paste.
- Minimal Punctuation: It uses parentheses to define structure, eliminating the need for commas or quotes around simple keys (Keywords).
- Explicit Boundaries: The closing ) makes it clear where a block ends, avoiding the ambiguity of YAML's indentation (see the sketch below).
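To make the comparison concrete, here is a small, hedged sketch of the same block in YAML and in HDF. The YAML is standard; the HDF rendering simply nests (key value) pairs inside an enclosing list, which is an illustrative convention rather than a quoted rule, and the indentation is purely cosmetic since HDF ignores whitespace.

```yaml
server:
  host: example.com
  port: 8080
```

```
; the same data as nested lists; only the parentheses carry structure
(server
  (host example.com)
  (port 8080))
```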
2. Data Type Support
All three formats support core primitives: Strings, Numbers, Booleans, and Null.
- JSON: Very strict. Strings must be double-quoted. No support for multi-line strings without manual escaping.
- YAML: Extremely flexible. Supports many types (too many, sometimes leading to surprising coercions such as the "Norway problem", where an unquoted NO is parsed as the boolean false). Excellent multi-line support (|, >).
- HDF: Streamlined.
  - Keywords: Unquoted strings for keys and simple values.
  - Raw Strings: Support for [[...]] or [...] allows for multiline blocks (SQL, shell scripts) without escaping (see the sketch after this list).
  - Uniformity: Lists are the only structural type, which simplifies processing logic.
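As an example of the difference in multi-line handling, the sketch below carries a small SQL query first in YAML (block scalar |) and then in an HDF raw string. The key name query is illustrative, and the assumption that [[...]] preserves the enclosed text verbatim is flagged in the comment.

```yaml
query: |
  SELECT id, name
  FROM users;
```

```
; raw-string sketch: assumes [[...]] keeps the enclosed text verbatim, no escaping
(query [[
SELECT id, name
FROM users;
]])
```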
3. Schema Validation
- JSON: Has the mature JSON Schema standard, which is widely implemented across all languages.
- YAML: Often validated using JSON Schema or custom tool-specific logic (e.g., Kubernetes, OpenAPI).
- HDF: Uses the HDF Interpreter Language (HIL). HIL is a safe, embeddable Lisp designed specifically to validate HDF data against schemas that are themselves written in HDF. This allows for self-documenting data structures.
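To illustrate the idea of schemas written in HDF itself, here is a purely hypothetical sketch: the schema, field, type, and required names below are invented for this example and are not part of any quoted HIL vocabulary.

```
; hypothetical schema sketch; 'field', 'type', and 'required' are illustrative names only
(schema
  (field (name id) (type number) (required true))
  (field (name name) (type keyword)))
```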
4. Parsing Performance
- JSON: Designed for speed. Most parsers are highly optimized and can handle gigabytes per second.
- YAML: One of the most complex formats to parse correctly. Its grammar demands significant lookahead and context tracking, and parsers are typically much slower than JSON parsers on large files.
- HDF: Designed for deterministic, single-pass parsing. It avoids the lookahead complexity of YAML while maintaining a simpler tokenization model than JSON. It is intended to be nearly as fast as JSON to parse.
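To give a feel for why single-pass parsing stays simple, below is a minimal tokenizer sketch in Lua (the toolchain's implementation language). It is illustrative only, not the real hq/he parser: it assumes ; starts a line comment, and it omits raw strings and rem forms entirely.

```lua
-- Minimal single-pass HDF tokenizer sketch (illustrative; not the real parser).
-- Handles parentheses, ';' line comments, and whitespace-separated atoms only;
-- raw strings ([[...]]) and rem forms are deliberately omitted.
local function tokenize(input)
  local tokens, i, n = {}, 1, #input
  while i <= n do
    local c = input:sub(i, i)
    if c:match("%s") then
      i = i + 1                              -- skip whitespace
    elseif c == ";" then
      i = (input:find("\n", i) or n) + 1     -- skip comment to end of line
    elseif c == "(" or c == ")" then
      tokens[#tokens + 1] = c                -- structural delimiter
      i = i + 1
    else
      local j = i
      while j <= n and not input:sub(j, j):match("[%s%(%);]") do
        j = j + 1                            -- consume one atom (keyword, number)
      end
      tokens[#tokens + 1] = input:sub(i, j - 1)
      i = j
    end
  end
  return tokens
end

-- Prints: ( id 101 ) ( name Alice )
for _, t in ipairs(tokenize("(id 101)(name Alice) ; trailing comment")) do
  io.write(t, " ")
end
```

Each character is consumed exactly once, with no backtracking, which is the property the bullet above refers to.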
5. Ecosystem Maturity
- JSON/YAML: Ubiquitous. Every language, editor, and cloud platform has built-in support.
- HDF: A growing ecosystem. While it lacks the massive library support of the others, it provides a unified toolchain:
- hq: For querying (jq equivalent).
- he: For editing and merging (yq equivalent).
- lutx: For advanced templating and code generation.
6. Use Case Suitability
JSON
- Best for: API responses, browser data exchange, machine-to-machine messaging.
- Worst for: Human-edited configuration files, documentation.
YAML
- Best for: CI/CD pipelines (GitHub Actions, GitLab), simple localized configs.
- Worst for: Very large nested structures, cross-platform data where indentation might be mangled.
HDF
- Best for: Infrastructure-as-Code, complex application configuration, inventory management, and generating secondary configs from a primary dataset.
- Why?: It provides the readability of YAML with the structural safety of JSON, plus a built-in query and transformation engine.
7. Data Density
Data density refers to the ratio of actual payload to structural overhead (punctuation, delimiters, whitespace). While JSON is often considered compact, HDF's support for unquoted keywords and space-delimited lists makes it even denser for most configuration tasks.
7.1 Comparison Example: Simple Record
JSON (25 characters):
{"id":101,"name":"Alice"}
HDF (20 characters):
```
(id 101)(name Alice)
```
HDF Savings: ~20%
7.2 Comparison Example: List of Items
JSON (27 characters):
["apple","banana","cherry"]
HDF (21 characters):
```
(apple banana cherry)
```
HDF Savings: ~22%
7.3 Why HDF is Denser
- Keyword Support: In JSON, every key must be a quoted string. In HDF, keys (Keywords) are unquoted.
- No Commas/Colons: HDF uses whitespace to separate elements, eliminating the overhead of : and ,.
- Uniform Delimiters: HDF uses () for both objects and arrays, whereas JSON uses {} and [].
- Implicit Types: While both support primitives, HDF's "Keyword by default" approach means simple textual values don't require quotes unless they contain spaces or special characters.
8. Comparison with Modern Alternatives
While JSON and YAML are the industry standards, several modern formats share HDF's goal of prioritizing human readability and structural robustness.
8.1 KDL (The Node-Based Alternative)
KDL is perhaps the closest spiritual relative to HDF.
- Similarities: Uses spaces for separation, explicit delimiters (braces) for hierarchy, and avoids significant whitespace (indentation).
- Difference: KDL is node-oriented (node arg prop=val). HDF is symbolic-oriented ((key value)). HDF is more consistent (everything is a list), whereas KDL distinguishes between arguments and properties.
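As a hedged side-by-side sketch (the KDL line uses KDL's node arg prop=val form; the HDF line renders the same information as (key value) pairs, with the name key chosen purely for illustration):

```kdl
// KDL: a node with one argument and two properties
server "web" host="example.com" port=8080
```

```
; HDF: the same information as a uniform list of (key value) pairs
(server (name web) (host example.com) (port 8080))
```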
8.2 EDN (extensible data notation)
EDN is the data format of the Clojure ecosystem.
- Similarities: Both are Lisp-inspired and use symbolic representation.
- Difference: EDN is much more complex, supporting tagged literals, namespaces, and sets. HDF is a "Minimalist Lisp," focused on being a lean alternative to YAML for configuration rather than a full programming language data model.
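For instance, the record from section 7.1 extended with a set and a tagged literal looks like this in EDN, versus a hedged HDF rendering that approximates the set with a plain list (nothing quoted above gives HDF native set or tagged-literal types):

```clojure
;; EDN: map with keyword keys, a set, and a built-in tagged literal
{:id 101 :name "Alice" :roles #{:admin :ops} :created #inst "2024-01-01T00:00:00Z"}
```

```
; HDF: lists only; the roles set becomes an ordinary list,
; and tagged literals have no direct equivalent
(id 101)(name Alice)(roles admin ops)
```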
8.3 HCL (HashiCorp Configuration Language)
Used by Terraform and Vault, HCL is highly successful in infrastructure.
- Similarities: Very readable and designed for DevOps.
- Difference: HCL has significantly more "syntax noise" (mixes braces, colons, and equals signs). HDF provides a cleaner, more uniform syntax that is easier to generate and parse programmatically.
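A small hedged sketch of the same block in HCL and in HDF (rendering the HCL block label as a (name web) pair is an illustrative choice, not a quoted rule):

```hcl
# HCL: block type, label, braces, and '=' assignments
server "web" {
  host = "example.com"
  port = 8080
}
```

```
; HDF: parentheses only, no '=' or braces
(server (name web) (host example.com) (port 8080))
```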
8.4 Summary: Why HDF?
HDF occupies a unique niche: it provides the cleanliness of KDL and the symbolic power of EDN, but with a simpler grammar than HCL and better scriptability than any of them thanks to the integrated HDFPGL query engine and HIL interpreter.
9. Long-term Stability & Future Proofness
A critical factor for configuration and archival data is "Digital Preservation"—the ability to read and process the data 20 or 50 years from now.
9.1 The ANSI C Foundation
The HDF toolchain is implemented in Lua. Lua is famously written in pure ANSI C.
- The C Guarantee: C is the foundational language of modern computing. An ANSI C compiler is available for virtually every platform, and that is unlikely to change for the foreseeable future.
- No Dependency Rot: Unlike tools written in Python, Go, or Rust, which require constant updates to their compilers and runtime libraries, a Lua-based tool like lutx or hq is virtually immune to "bit rot."
9.2 Ecosystem Comparison
| Ecosystem | Future-Proofness | Dependency Weight |
|-----------|------------------|-------------------|
| Python | Medium | Heavy (VirtualEnvs, Pip rot, 2->3 breakage) |
| Go / Rust | High | Medium (Requires specific compiler versions) |
| HDF (Lua) | Extreme | Minimal (Single ANSI C binary, zero external libs) |
9.3 Summary
By choosing HDF, you are choosing a data stack that relies on the most stable technologies in computer science history. HDF data and its processing tools are designed to remain accessible and functional for decades without requiring a team of engineers to keep the environment running.