Tree data structure for small data sets

| updated

Goal

To store a smallish set of data with an arbitrary level of nesting, such as a config or metadata block written in YAML

Requirements

We don’t need to modify or prune any parts of the structure.

But we need to be able to construct it iteratively by parsing data from a byte stream (string or file).

For safety, it would be good to be able to limit the size of structure, in order to handle malicious/buggy input.

Once it is constructed, we want to be able to iterate through the whole thing, or extract arbitrary parts from it.

Model

An append-only tree data structure, i.e. nodes that can contain other nodes

Nodes can be:

We want to be able to model both lists (arrays) and dicts (maps).

A simple dict is a series of key-value pairs, where the values are all primitive (int/string, not list/dict). The keys are always strings. It is represented as a branch node that has only leaf nodes as children. The leaf nodes each have a name, representing the key in the dict.

A simple list contains only primitive values as elements. It is also represented as a branch node that has only leaf nodes as children, but the leaf nodes need not have names, i.e. their names are ignored.

A complex list is a series of values, some or all of which are lists/dicts. (It need not be homogeneous: some elements may be strings while others are int/list/dict.) It is represented as a branch node with children; the children may be branch or leaf nodes.

A complex dict is a series of key-value pairs, where some or all of the values are lists/dicts. (Also need not be homogeneous: the values may be of different types. The keys are still just strings, though.) It is represented as a branch node with children, each of which has a name; the children may be branch or leaf nodes.

Types

In a language without sum types (tagged unions), such as Go, we cannot use different types for the root, branch, and leaf nodes and use a sum type for the function signatures etc. It might be possible with generics though?

Instead, we can just have a single type, with fields which may be empty (zero values):

We could also add a “type” field or method (values are members of an enum), if it makes it easier to switch behaviour based on node type.

Implementation