package data-encoding


Type-safe serialization and deserialization of data structures.

Data Encoding

Overview

This module provides type-safe serialization and deserialization of data structures. Backends are provided for an ad hoc binary format, JSON, and BSON.

This works by writing type descriptors by hand, using the provided combinators. These combinators can fine-tune the binary representation to be compact and efficient, but also provide proper field names and meta information. As a result, an API that uses those descriptors can be automatically introspected and documented.

Here is an example encoding for type (int * string).

let enc = obj2 (req "code" uint16) (req "message" string)

In JSON, this encoding maps values of type int * string to JSON objects with a field code whose value is a number and a field message whose value is a string.

In binary, this encoding maps to two raw bytes for the int followed by the size of the string in bytes, and finally the raw contents of the string. This binary format is mostly tagless, meaning that serialized data cannot be interpreted without the encoding that was used for serialization.
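As a rough sketch of the binary layout just described, the snippet below serializes one value and reads it back. It assumes that the Binary submodule exposes to_bytes_exn and of_bytes_exn (its signature is not shown on this page), and the comment about a 4-byte length field reflects the library's usual default rather than anything stated here.

```ocaml
open Data_encoding

(* The encoding from the overview: an int in [0, 65535] and a string. *)
let enc = obj2 (req "code" uint16) (req "message" string)

let () =
  (* Assumed API: Binary.to_bytes_exn and Binary.of_bytes_exn. *)
  let buf = Binary.to_bytes_exn enc (404, "Not found") in
  (* Layout: 2 bytes for the uint16, then the length field (assumed to be
     4 bytes), then the 9 raw bytes of "Not found". *)
  Printf.printf "serialized size: %d bytes\n" (Stdlib.Bytes.length buf);
  (* Decoding needs the same encoding, since the format is mostly tagless. *)
  assert (Binary.of_bytes_exn enc buf = (404, "Not found"))
```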

Regarding binary serialization, encodings are classified as either:

  • fixed size (booleans, integers, numbers): data is always the same size for that type;
  • dynamically sized (arbitrary strings and bytes): data is of unknown size and requires an explicit length field;
  • variable size (special case of strings, bytes, and arrays): data makes up the remainder of an object of known size, so its size is given by the context and does not have to be serialized.
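The three classes can be inspected with classify, which is documented further down this page. The helper describe below is a hypothetical name introduced for illustration, and Variable.string is assumed to be a member of the Variable submodule, whose contents are not listed here.

```ocaml
open Data_encoding

(* Render the binary size class of an encoding as text. *)
let describe enc =
  match classify enc with
  | `Fixed n -> Printf.sprintf "fixed (%d bytes)" n
  | `Dynamic -> "dynamic"
  | `Variable -> "variable"

let () =
  (* A 16-bit integer always takes the same number of bytes. *)
  print_endline (describe uint16);
  (* The default string encoding carries an explicit length field. *)
  print_endline (describe string);
  (* Assumed: Variable.string has no length field of its own. *)
  print_endline (describe Variable.string)
```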

JSON operations are delegated to json-data-encoding.

Module structure

This Data_encoding module provides multiple submodules:

  • Encoding contains the necessary types and constructors for making the type descriptors.
  • Json, Bson, and Binary contain functions to serialize and deserialize values.
module Encoding : sig ... end
include module type of Encoding with type 'a t = 'a Encoding.t
type 'a t = 'a Encoding.t

The type descriptors for values of type 'a.

type 'a encoding = 'a t

Ground descriptors

val null : unit encoding

Special value null in JSON, nothing in binary.

val empty : unit encoding

Empty object (not included in binary, encoded as empty object in JSON).

val unit : unit encoding

Unit value, omitted in binary. Serialized as an empty object in JSON, accepts any object when deserializing.

val constant : string -> unit encoding

Constant string (data is not included in the binary data).

val int8 : int encoding

Signed 8 bit integer (data is encoded as a byte in binary and an integer in JSON).

val uint8 : int encoding

Unsigned 8 bit integer (data is encoded as a byte in binary and an integer in JSON).

val int16 : int encoding

Signed 16 bit integer (data is encoded as a short in binary and an integer in JSON).

val uint16 : int encoding

Unsigned 16 bit integer (data is encoded as a short in binary and an integer in JSON).

val int31 : int encoding

Signed 31 bit integer, which corresponds to type int on 32-bit OCaml systems (data is encoded as a 32 bit int in binary and an integer in JSON).

val int32 : int32 encoding

Signed 32 bit integer (data is encoded as a 32-bit int in binary and an integer in JSON).

val int64 : int64 encoding

Signed 64 bit integer (data is encoded as a 64-bit int in binary and a decimal string in JSON).

val ranged_int : int -> int -> int encoding

Integer with bounds in a given range. Both bounds are inclusive.

  • raises Invalid_argument

    if the bounds are outside the interval [-2^30, 2^30-1]. These bounds are chosen to be compatible with all versions of OCaml.
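A minimal sketch of ranged_int follows. The names percent and rejected are illustrative only, the out-of-range bound assumes a 64-bit OCaml system, and the round trip in the test relies on Binary.to_bytes_exn and Binary.of_bytes_exn, which are assumed to exist in the Binary submodule.

```ocaml
open Data_encoding

(* A percentage: any int from 0 to 100, both bounds included. *)
let percent = ranged_int 0 100

(* A bound above 2^30-1 should be rejected when the encoding is built.
   The shift below only exceeds that limit on a 64-bit system. *)
let rejected =
  match ranged_int 0 (1 lsl 40) with
  | _ -> false
  | exception Invalid_argument _ -> true
```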

val z : Z.t encoding

Big number. In JSON, data is encoded as a decimal string. In binary, data is encoded as a variable-length sequence of bytes with a running unary size bit: the most significant bit of each byte tells whether this is the last byte in the sequence (0) or whether there is more to read (1). The second most significant bit of the first byte is reserved for the sign (positive if zero). With the size and sign bits ignored, the data is the binary representation of the absolute value of the number in little-endian order.

val n : Z.t encoding

Positive big number, see z.

val float : float encoding

Encoding of floating point number (encoded as a floating point number in JSON and a double in binary).

val ranged_float : float -> float -> float encoding

Float with bounds in a given range. Both bounds are inclusive.

val bool : bool encoding

Encoding of a boolean (data is encoded as a byte in binary and a boolean in JSON).

val string : string encoding

Encoding of a string

  • encoded as a byte sequence in binary prefixed by the length of the string
  • encoded as a string in JSON.
val bytes : Stdlib.Bytes.t encoding

Encoding of arbitrary bytes (encoded via hex in JSON and directly as a sequence of bytes in binary).

Descriptor combinators

val option : 'a encoding -> 'a option encoding

Combinator to make an optional value. In binary, it is represented as a 1-byte tag followed by the data (or nothing). In JSON, it is represented as either the raw value or an empty object.

val result : 'a encoding -> 'b encoding -> ('a, 'b) Stdlib.result encoding

Combinator to make a result value. In binary, it is represented as a 1-byte tag followed by the data of either type. In JSON, it is represented as either unwrapped value, so the caller must ensure that both encodings do not collide.

val array : ?max_length:int -> 'a encoding -> 'a array encoding

Array combinator.

  • encoded as an array in JSON
  • encoded in binary as the concatenation of all the elements, prefixed by its length in bytes

If max_length is passed and the encoding of elements has fixed size, a check_size is automatically added for earlier rejection.

  • raises Invalid_argument

    if the inner encoding is variable.

val list : ?max_length:int -> 'a encoding -> 'a list encoding

List combinator.

  • encoded as an array in JSON
  • encoded in binary as the concatenation of all the elements, prefixed by its length in bytes

If max_length is passed and the encoding of elements has fixed size, a check_size is automatically added for earlier rejection.

  • raises Invalid_argument

    if the inner encoding is variable.

val conv : ('a -> 'b) -> ('b -> 'a) -> ?schema:Json_schema.schema -> 'b encoding -> 'a encoding

Provide a transformer from one encoding to a different one.

Used to simplify nested encodings or to change the generic tuples built by obj1, tup1 and the like into proper records.

A schema may optionally be provided as documentation of the new encoding.
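For instance, conv can turn the pair built by obj2 into a record. This is a sketch: the record type error and the name error_encoding are introduced here for illustration, and the test assumes Binary.to_bytes_exn and Binary.of_bytes_exn are available.

```ocaml
open Data_encoding

(* Hypothetical record that the application wants to work with. *)
type error = { code : int; message : string }

let error_encoding =
  conv
    (* Record to the generic tuple expected by obj2. *)
    (fun { code; message } -> (code, message))
    (* Generic tuple back to the record. *)
    (fun (code, message) -> { code; message })
    (obj2 (req "code" uint16) (req "message" string))
```

The serialized form is the one defined by the underlying obj2 encoding; conv only changes the OCaml type seen by the caller.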

val assoc : 'a encoding -> (string * 'a) list encoding

Association list. An object in JSON, a list of pairs in binary.

Product descriptors

type 'a field

An enriched encoding to represent a component in a structured type, augmenting the encoding with a name and whether it is required or optional. Fields are used to encode OCaml tuples as objects in JSON, and as sequences in binary, using combinator obj1 and the like.

val req : ?title:string -> ?description:string -> string -> 't encoding -> 't field

Required field.

val opt : ?title:string -> ?description:string -> string -> 't encoding -> 't option field

Optional field. Omitted entirely in the JSON encoding if None. Omitted in binary if it is the only optional field in a `Variable encoding; otherwise a 1-byte prefix (0 or 255) tells whether the field is present.

val varopt : ?title:string -> ?description:string -> string -> 't encoding -> 't option field

Optional field of variable length. Only one can be present in a given object.

val dft : ?title:string -> ?description:string -> string -> 't encoding -> 't -> 't field

Required field with a default value. If the default value is passed, the field is omitted in JSON. The value is always serialized in binary.
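The field constructors above might be combined as in the following sketch. The names settings and example are illustrative, and the round trip in the test assumes Binary.to_bytes_exn and Binary.of_bytes_exn exist.

```ocaml
open Data_encoding

(* (name, nickname, retries): nickname may be absent, retries defaults to 3. *)
let settings =
  obj3
    (req "name" string)
    (opt "nickname" string)
    (dft "retries" uint8 3)

(* A value using None for the optional field and the default for retries. *)
let example : string * string option * int = ("alice", None, 3)
```

Going by the descriptions above, the JSON form of example would carry only the name field, while the binary form still serializes retries.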

Constructors for objects with N fields

These are serialized to binary by converting each internal object to binary and placing them in the order of the original object. These are serialized to JSON as a JSON object with the field names. An object may contain only one 'variable' field, typically the last one. If the encodings of more than one field are 'variable', the first ones should be wrapped with dynamic_size.

  • raises Invalid_argument

    if more than one field is a variable one.
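The rule on variable fields can be sketched as follows. The names packet and rejected are illustrative, and Variable.string is assumed to be provided by the Variable submodule, whose signature is not shown on this page.

```ocaml
open Data_encoding

(* Two variable-size fields cannot sit in the same object as-is:
   the first is wrapped with dynamic_size so it carries its own length. *)
let packet =
  obj2
    (req "header" (dynamic_size Variable.string))
    (req "payload" Variable.string)

(* Without the wrapper, building the object is expected to raise. *)
let rejected =
  match obj2 (req "header" Variable.string) (req "payload" Variable.string) with
  | _ -> false
  | exception Invalid_argument _ -> true
```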

val obj1 : 'f1 field -> 'f1 encoding
val obj2 : 'f1 field -> 'f2 field -> ('f1 * 'f2) encoding
val obj3 : 'f1 field -> 'f2 field -> 'f3 field -> ('f1 * 'f2 * 'f3) encoding
val obj4 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> ('f1 * 'f2 * 'f3 * 'f4) encoding
val obj5 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> 'f5 field -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5) encoding
val obj6 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> 'f5 field -> 'f6 field -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6) encoding
val obj7 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> 'f5 field -> 'f6 field -> 'f7 field -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7) encoding
val obj8 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> 'f5 field -> 'f6 field -> 'f7 field -> 'f8 field -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8) encoding
val obj9 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> 'f5 field -> 'f6 field -> 'f7 field -> 'f8 field -> 'f9 field -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8 * 'f9) encoding
val obj10 : 'f1 field -> 'f2 field -> 'f3 field -> 'f4 field -> 'f5 field -> 'f6 field -> 'f7 field -> 'f8 field -> 'f9 field -> 'f10 field -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8 * 'f9 * 'f10) encoding
val merge_objs : 'o1 encoding -> 'o2 encoding -> ('o1 * 'o2) encoding

Create a larger object from the encodings of two smaller ones.

  • raises Invalid_argument

    if the arguments are not both objects, or if both objects contain a variable field.

Constructors for tuples with N fields

These are serialized to binary by converting each internal object to binary and placing them in the order of the original object. These are serialized to JSON as JSON arrays/lists. Like objects, a tuple may contain only one 'variable' field, typically the last one. If the encodings of more than one field are 'variable', the first ones should be wrapped with dynamic_size.

  • raises Invalid_argument

    if more than one field is a variable one.

val tup1 : 'f1 encoding -> 'f1 encoding
val tup2 : 'f1 encoding -> 'f2 encoding -> ('f1 * 'f2) encoding
val tup3 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> ('f1 * 'f2 * 'f3) encoding
val tup4 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> ('f1 * 'f2 * 'f3 * 'f4) encoding
val tup5 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> 'f5 encoding -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5) encoding
val tup6 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> 'f5 encoding -> 'f6 encoding -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6) encoding
val tup7 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> 'f5 encoding -> 'f6 encoding -> 'f7 encoding -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7) encoding
val tup8 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> 'f5 encoding -> 'f6 encoding -> 'f7 encoding -> 'f8 encoding -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8) encoding
val tup9 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> 'f5 encoding -> 'f6 encoding -> 'f7 encoding -> 'f8 encoding -> 'f9 encoding -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8 * 'f9) encoding
val tup10 : 'f1 encoding -> 'f2 encoding -> 'f3 encoding -> 'f4 encoding -> 'f5 encoding -> 'f6 encoding -> 'f7 encoding -> 'f8 encoding -> 'f9 encoding -> 'f10 encoding -> ('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8 * 'f9 * 'f10) encoding
val merge_tups : 'a1 encoding -> 'a2 encoding -> ('a1 * 'a2) encoding

Create a larger tuple encoding from two smaller ones.

  • raises Invalid_argument

    if the arguments are not both tuples, or if both tuples contain a variable field.

Sum descriptors

type 't case

A partial encoding to represent a case in a variant type. Hides the (existentially bound) type of the parameter to the specific case, providing its encoder, and converter functions to and from the union type.

type case_tag =
  | Tag of int
  | Json_only
val case : title:string -> ?description:string -> case_tag -> 'a encoding -> ('t -> 'a option) -> ('a -> 't) -> 't case

Encodes a variant constructor. Takes the encoding for the specific parameters, a recognizer function that will extract the parameters in case the expected case of the variant is being serialized, and a constructor function for deserialization.

The tag must be less than the tag size of the union in which the case is used. Tags should be kept stable across versions to maintain compatibility.

The title gives a name to the case, which is used in the binary documentation. A description can optionally be provided.

val union : ?tag_size:[ `Uint8 | `Uint16 ] -> 't case list -> 't encoding

Create a single encoding from a series of cases.

In JSON, all cases are tried one after the other. The caller must check for collisions.

In binary, a prefix tag is added to discriminate quickly between cases. The default is `Uint8 and you must use a `Uint16 if you are going to have more than 256 cases.

  • raises Invalid_argument

    if it is given the empty list or if there are more cases than can fit in the tag size.
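As an illustration of case and union, the sketch below encodes a small variant type. The type shape and its field names are hypothetical, and the test assumes Binary.to_bytes_exn and Binary.of_bytes_exn are available.

```ocaml
open Data_encoding

type shape = Circle of int | Rect of int * int

let shape_encoding =
  union
    [ (* Tag 0: a circle, carrying its radius. *)
      case ~title:"circle" (Tag 0)
        (obj1 (req "radius" uint16))
        (function Circle r -> Some r | _ -> None)
        (fun r -> Circle r);
      (* Tag 1: a rectangle, carrying its width and height. *)
      case ~title:"rect" (Tag 1)
        (obj2 (req "width" uint16) (req "height" uint16))
        (function Rect (w, h) -> Some (w, h) | _ -> None)
        (fun (w, h) -> Rect (w, h)) ]
```

The two cases use distinct field names so that their JSON forms do not collide, which is the check that the caller is responsible for.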

Predicates over descriptors

val is_obj : 'a encoding -> bool

Is the given encoding serialized as a JSON object?

val is_tup : 'a encoding -> bool

Does the given encoding encode a tuple?

val classify : 'a encoding -> [ `Fixed of int | `Dynamic | `Variable ]

Classify the binary serialization of an encoding as explained in the preamble.

Specialized descriptors

val string_enum : (string * 'a) list -> 'a encoding

Encode an enumeration via an association list:

  • represented as a string in JSON;
  • represented in binary as an integer giving the element's position in the list. The integer size depends on the list size.
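A short sketch of string_enum on a hypothetical color type follows. The test assumes Json.construct and the Binary round-trip functions exist in their respective submodules, which are not detailed on this page.

```ocaml
open Data_encoding

type color = Red | Green | Blue

(* Each constructor is paired with the string used for it in JSON.
   In binary, the position in this list is what gets serialized. *)
let color_encoding =
  string_enum [ ("red", Red); ("green", Green); ("blue", Blue) ]
```

Since the binary form depends on positions, reordering the list would change the serialized data.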
module Fixed : sig ... end

Create encodings that produce data of a fixed length when binary encoded. See the preamble for an explanation.

module Variable : sig ... end

Create encodings that produce data of a variable length when binary encoded. See the preamble for an explanation.

module Bounded : sig ... end
val dynamic_size : ?kind:[ `Uint30 | `Uint16 | `Uint8 ] -> 'a encoding -> 'a encoding

Mark an encoding as being of dynamic size. Forces the size to be stored alongside content when needed. Typically used to combine two variable encodings in the same object or tuple, or to use a variable encoding in an array or a list.

val check_size : int -> 'a encoding -> 'a encoding

check_size size encoding ensures that the binary encoding of a value will not be allowed to exceed size bytes. The reader and the writer fail otherwise. This function does not modify the JSON encoding.

val delayed : (unit -> 'a encoding) -> 'a encoding

Recompute the encoding definition each time it is used. Useful for dynamically updating the encoding of values of an extensible type via a global reference (e.g., exceptions).

val splitted : json:'a encoding -> binary:'a encoding -> 'a encoding

Define different encodings for JSON and binary serialization.

val mu : string -> ?title:string -> ?description:string -> ('a encoding -> 'a encoding) -> 'a encoding

Combinator for recursive encodings.
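One possible use of mu is a recursive variant, as in the sketch below. The type tree and the case titles are hypothetical; the function passed to mu receives the encoding being defined (named self here) and uses it for the recursive positions. The test assumes Binary.to_bytes_exn and Binary.of_bytes_exn are available.

```ocaml
open Data_encoding

type tree = Leaf of int | Node of tree * tree

let tree_encoding =
  mu "tree" (fun self ->
      union
        [ (* A leaf carries a single integer. *)
          case ~title:"leaf" (Tag 0)
            (obj1 (req "leaf" int31))
            (function Leaf n -> Some n | _ -> None)
            (fun n -> Leaf n);
          (* A node refers back to the encoding under construction. *)
          case ~title:"node" (Tag 1)
            (obj2 (req "left" self) (req "right" self))
            (function Node (l, r) -> Some (l, r) | _ -> None)
            (fun (l, r) -> Node (l, r)) ])
```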

Documenting descriptors

val def : string -> ?title:string -> ?description:string -> 't encoding -> 't encoding

Give a name to an encoding and optionally add documentation to an encoding.

type 'a lazy_t

See lazy_encoding below.

val lazy_encoding : 'a encoding -> 'a lazy_t encoding

Combinator to have a part of the binary encoding lazily deserialized. This is transparent on the JSON side.

val force_decode : 'a lazy_t -> 'a option

Force the decoding (memoized for later calls), and return the value if successful.

val force_bytes : 'a lazy_t -> Stdlib.Bytes.t

Obtain the bytes without actually deserializing. Will serialize and memoize the result if the value is not the result of a lazy deserialization.

val make_lazy : 'a encoding -> 'a -> 'a lazy_t

Make a lazy value from an immediate one.
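The lazy functions above fit together roughly as follows. This sketch uses only signatures listed on this page; the names v and lazy_int31 are illustrative.

```ocaml
open Data_encoding

(* Wrap an immediate value; nothing is serialized yet. *)
let v = make_lazy int31 42

(* Encoding to use wherever a lazily decoded int31 is stored. *)
let lazy_int31 = lazy_encoding int31

let () =
  (* force_bytes serializes on demand and memoizes the result. *)
  let raw = force_bytes v in
  Printf.printf "%d bytes\n" (Stdlib.Bytes.length raw);
  (* force_decode returns the value when decoding succeeds. *)
  assert (force_decode v = Some 42)
```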

val apply_lazy : fun_value:('a -> 'b) -> fun_bytes:(Stdlib.Bytes.t -> 'b) -> fun_combine:('b -> 'b -> 'b) -> 'a lazy_t -> 'b

Apply on the structure of a lazy value, and combine the results.

Create a Data_encoding.t value which records knowledge of older versions of a given encoding as long as one can "upgrade" from an older version to the next (if upgrade is impossible one should consider that the encoding is completely different).

See the module Documented_example in "./test/versioned.ml" for a tutorial.

module With_version : sig ... end
module Json : sig ... end
module Bson : sig ... end
module Binary_schema : sig ... end
module Binary : sig ... end
type json = Json.t
val json : json Encoding.t
type json_schema = Json.schema
val json_schema : json_schema Encoding.t
type bson = Bson.t
module Registration : sig ... end