Earley is a parser combinator library implemented using the Earley algorithm. It focuses mainly on efficiency and is intended to be used in conjunction with the pa_ocaml parser and syntax extension mechanism.
Types and exceptions
type 'a grammar
Type of a parser (or grammar) producing a value of type 'a.
As Earley does scannerless parsing, a notion of blank function is used to discard meaningless parts of the input (e.g. comments or spaces). A blank function takes as input a buffer and a position (represented as an int) and returns a pair of a buffer and a position corresponding to the next meaningful character.
WARNING: a blank function must return a normalized pair (b,p), which means 0 <= p < Input.line_num b. You can use Input.normalize to ensure this.
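To make the idea of a blank function concrete, here is a minimal sketch that skips spaces, tabs and line breaks. It works on a plain string and an index rather than on the library's buffer type, so the name skip_blanks and the string-based representation are assumptions made to keep the example self-contained; a real blank function takes and returns a buffer and a position, and must return a normalized pair.

```ocaml
(* Hypothetical, string-based stand-in for a blank function:
   returns the index of the next meaningful character at or after [pos]. *)
let rec skip_blanks (s : string) (pos : int) : int =
  if pos < String.length s then
    match s.[pos] with
    | ' ' | '\t' | '\r' | '\n' -> skip_blanks s (pos + 1)
    | _ -> pos
  else pos

let () =
  (* Two spaces and a tab are skipped before reaching 'x'. *)
  assert (skip_blanks "  \tx = 1" 0 = 3);
  (* Already on a meaningful character: the position is unchanged. *)
  assert (skip_blanks "x = 1" 0 = 0)
```

A blank function that also discards comments would follow the same shape, with extra cases for the comment delimiters.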
The exception Parse_error(buf,pos,msgs) is raised whenever parsing fails. It contains the position pos (and the corresponding buffer buf) of the furthest reached position in the input.
val give_up : unit -> 'a
give_up () can be called by the user to force the parser to reject a possible parsing rule.
val handle_exception : ?error:(unit -> 'b) -> ('a -> 'b) -> 'a -> 'b
handle_exception fn v applies the function fn to v and handles the Parse_error exception. In particular, a parse error message is presented to the user in case of a failure, then error () is called. The default error is fun () -> exit 1.
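The control flow described above can be modelled in a few lines of plain OCaml. This is only a sketch of the pattern, not the library's implementation: Toy_parse_error, handle_toy_exception and parse_digit are hypothetical names, and the toy exception carries only a position, whereas Parse_error also carries a buffer and messages.

```ocaml
(* Hypothetical stand-in for Parse_error, carrying only a position. *)
exception Toy_parse_error of int

(* Model of the described behaviour: run [fn v]; on a parse error,
   report it on stderr and fall back to [error ()]. *)
let handle_toy_exception ?(error = fun () -> exit 1) fn v =
  try fn v with
  | Toy_parse_error pos ->
    prerr_endline ("Parse error at position " ^ string_of_int pos);
    error ()

(* Toy "parser": accepts a one-character string holding a digit. *)
let parse_digit s =
  if String.length s = 1 && s.[0] >= '0' && s.[0] <= '9'
  then Char.code s.[0] - Char.code '0'
  else raise (Toy_parse_error 0)

let () =
  (* Success: the result of [fn v] is returned unchanged. *)
  assert (handle_toy_exception parse_digit "7" = 7);
  (* Failure: a custom [error] supplies the result instead of exiting. *)
  assert (handle_toy_exception ~error:(fun () -> -1) parse_digit "x" = -1)
```

Supplying ~error is useful in tests or interactive tools, where terminating the process on the first parse error is not desirable.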
char ~name c v is a grammar that accepts only the character c, and returns v as a semantic value. An optional name can be given to the grammar for reference in error messages.
val string : ?name:string -> string -> 'a -> 'a grammar
string s v is a grammar that accepts only the string s, and returns v as a semantic value. An optional name can be given to the grammar for reference in error messages.
val keyword : ?name:string -> string -> (char -> bool) -> 'a -> 'a grammar
keyword s forbidden v is similar to string, but the parsing fails if forbidden c returns true, where c is the next available character after s.
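The role of the forbidden predicate is easiest to see on identifiers: a keyword such as let must not match the beginning of letter. The sketch below models that check on plain strings. The names is_ident_char and keyword_matches are hypothetical, and treating the end of the string as "no forbidden character" is an assumption of this model, not a statement about how the library handles the end of input.

```ocaml
(* A typical [forbidden] predicate: characters that may continue an identifier. *)
let is_ident_char (c : char) : bool =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '\'' -> true
  | _ -> false

(* Hypothetical string-based model of the check: does [kw] occur in [input]
   at [pos] without being followed by a forbidden character? *)
let keyword_matches kw forbidden input pos =
  let n = String.length kw in
  let stop = pos + n in
  pos >= 0
  && stop <= String.length input
  && String.sub input pos n = kw
  && not (stop < String.length input && forbidden input.[stop])

let () =
  assert (keyword_matches "let" is_ident_char "let x = 1" 0);
  (* "letter" starts with "let", but 't' is forbidden right after it. *)
  assert (not (keyword_matches "let" is_ident_char "letter" 0))
```

With the library itself, a predicate like is_ident_char would be passed as the second positional argument of keyword, as in keyword "let" is_ident_char v.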
eof v is a grammar that only accepts the end of file and returns v as a semantic value. Note that the end of file can be parsed one or more times (i.e. the input ends with infinitely many end of file symbols).
in_charset cs is a grammar that parses any character of the cs charset, and returns its value. An optional name can be given to the grammar for reference in error messages.
empty_pos v is similar to the above except that the action waits for the positions of a complete sequence built using fsequence or sequence.
For instance, sequence_position g1 g2 f below can be defined as fsequence g1 (fsequence g2 (empty_pos f')), where f' = fun b p b' p' a2 a1 -> f b p b' p' a1 a2 gives the results of g1 and g2 in the expected order.
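The argument swap performed by f' can be checked in isolation with ordinary functions. In this sketch buffers are modelled as strings and the action f simply formats its arguments; both are assumptions for illustration and no grammar is involved.

```ocaml
(* Stand-in for the user action of sequence_position: it receives the start
   position (b, p), the end position (b', p'), then the results of g1 and g2. *)
let f b p b' p' a1 a2 =
  Printf.sprintf "%s:%d-%s:%d %s then %s" b p b' p' a1 a2

(* In the nested fsequence encoding the action sees the result of g2
   before the result of g1; f' swaps them back into the order f expects. *)
let f' = fun b p b' p' a2 a1 -> f b p b' p' a1 a2

let () =
  assert (f' "buf" 0 "buf" 5 "second" "first" = "buf:0-buf:5 first then second")
```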
black_box fn cs accept_empty name is a grammar that uses the function fn to parse the input buffer. fn buf pos should start parsing buf at position pos, and return a triple containing the semantic value, the new buffer and the position of the first unread character. The character set cs must contain at least every character that fn accepts as a first character. The boolean accept_empty must be true if the function accepts the empty string. The name argument is used for reference in error messages. Note that the function fn should use give_up () in case of a parse error.
WARNING: fn must return a triple (x,b,p) where (b,p) is normalized, which means 0 <= p < Input.line_num b. You can use Input.normalize to ensure this.
blank_grammar gr bl produces a blank function using the grammar gr and the blank function bl. It parses as much of the input as possible using the grammar gr with the blank function bl, and returns the reached position.
val change_layout :
  ?old_blank_before:bool -> ?new_blank_after:bool -> 'a grammar -> blank -> 'a grammar
change_layout ~old_blank_before ~new_blank_after gr bl replaces the current blank function with bl, while parsing using the grammar gr. The optional parameter old_blank_before (true by default) forces the application of the old blank function, before starting to parse with gr. Note that the new blank function is always called before the first terminal of gr. Similarly, the optional parameter new_blank_after (true by default) forces a call to the new blank function after the end of the parsing of gr. Note that the old blank function is always called after the last terminal.
declare_grammar name returns a new grammar that can be used in the definition of other grammars, but that cannot be run on input before it has been initialized with set_grammar. The name argument is used for reference to the grammar in error messages.
set_grammar gr grdef sets the definition of grammar gr (previously declared with declare_grammar) to be grdef. Invalid_argument is raised if set_grammar is used on a grammar that was not created with declare_grammar. The behaviour is undefined if a grammar is set twice with set_grammar.
parse_buffer gr bl buf parses the buffer buf using the grammar gr and the blank function bl. The exception Parse_error may be raised in case of error.
val parse_string : ?filename:string -> 'a grammar -> blank -> string -> 'a
parse_string ~filename gr bl str parses the string str using the grammar gr and the blank function bl. An optional filename can be provided for reference to the input in error messages. The exception Parse_error may be raised in case of error.
parse_channel ~filename gr bl ch parses the contents of the input channel ch using the grammar gr and the blank function bl. A filename can be provided for reference to the input in case of an error. parse_channel may raise the Parse_error exception.
partial_parse_buffer gr bl buf pos parses input from the buffer buf starting at position pos, using the grammar gr and the blank function bl. A triple is returned containing the new buffer, the position that was reached during parsing, and the semantic result of the parsing. The optional argument blank_after, true by default, indicates whether the returned position is after the final blank or not. Note that this function should not be used in the definition of a grammar using the black_box function.
debug_lvl is a flag that can be set for Earley to display debug data on stderr. The default value is 0, and bigger numbers activate more and more debugging information.
warn_merge is a flag that is used to choose whether warnings are displayed or not when an ambiguity is encountered while parsing. The default value is true.
keep_all_names is false by default, which allows named grammars to be inlined to optimise parsing. When debugging, it can be set to true (before any grammar is constructed) for more accurate messages.
sequence g1 g2 f is a grammar that first parses using g1, and then parses using g2. The result of the sequence is then obtained by applying f to the results of g1 and g2.
sequence_position g1 g2 f is a grammar that first parses using g1, and then parses using g2. The result of the sequence is then obtained by applying f to the positions (i.e. buffer and index) of the corresponding parsed input, and to the results of g1 and g2.
Remark: sequence g1 g2 f is equivalent to sequence_position g1 g2 (fun _ _ _ _ -> f).
fsequence g1 g2 is a grammar that first parses using g1, and then parses using g2. The result of the sequence is then obtained by applying the result of g2 (a function) to the result of g1.
Remark: fsequence g1 g2 is equivalent to sequence g1 g2 (fun x f -> f x).
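The combining function in this remark is plain reverse application, which can be tried on its own. In the sketch below, combine is a hypothetical name for fun x f -> f x, and the values standing in for the results of g1 and g2 are made up for illustration.

```ocaml
(* The combining function from the remark: x plays the role of the result
   of g1, and f the role of the (function) result of g2. *)
let combine = fun x f -> f x

let () =
  (* Pretend g1 produced 20 and g2 produced the function (fun n -> n + 1). *)
  assert (combine 20 (fun n -> n + 1) = 21);
  (* The result type may differ from the type of x. *)
  assert (combine "abc" String.length = 3)
```

This is why the second grammar of fsequence must produce a function: its result is what consumes the value produced by the first grammar.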
simple_dependent_sequence g1 g2 is a grammar that first parses using g1, which returns a value a, and then continues to parse with g2 a and returns its result.
dependent_sequence g1 g2 is a grammar that first parses using g1, which returns a value (a,b), and then continues to parse with g2 a and returns its result applied to b. Compared to simple_dependent_sequence, this allows memoizing the second grammar.
apply_position f g applies function f to the value returned by the grammar g and the positions at the beginning and at the end of the parsed input.
val position : 'a grammar -> (string * int * int * int * int * 'a) grammar
position g transforms the grammar g to add information about the position of the parsed text.
test c f performs the test f on the input buffer without parsing anything (positions are unchanged). The charset c should contain all characters accepted at the position given to f.
blank_test c f is the same as test, except that f is applied to buf' pos' buf pos, where (buf', pos') is the position before the blank. The charset c should contain all characters accepted at the position (buf,pos). This allows testing for the presence of blanks, or even reading the blank and returning some information.
val grammar_family :
  ?param_to_string:('a -> string) -> string -> ('a -> 'b grammar) * (('a -> 'b grammar) -> unit)
grammar_family ~param_to_string:to_str name returns a pair (gs, set_gs), where gs is a finite family of grammars parametrized by a value of type 'a. A name name is to be provided for the family, and an optional function to_str can be provided to print the parameter and display better error messages.
(* Declare the grammar family *)
let (gr, set_gr) = grammar_family ~param_to_string:to_str name in
(* ... code using grammars of gr to define mutually recursive grammars ... *)
(* ... the grammars in gr cannot be used in "left position" ... *)
(* ... (same restriction as for declare_grammar) ... *)
(* Define the grammar family *)
let _ = set_gr the_grammars
(* ... now the new family can be used ... *)
val grammar_prio :
  ?param_to_string:('b -> string) -> string -> ('b -> 'c grammar)
  * (((('b -> bool) * 'c grammar) list * ('b -> 'c grammar list)) -> unit)
Similar to grammar_family, with an optimization. grammar_prio ~param_to_string:to_str name returns a pair (gs, set_gs), where gs is a finite family of grammars parametrized by a value of type 'b. set_gs requires two components to set the value of the grammar:
the first is a list of grammars, each paired with a function on the parameter; a grammar is only activated if its function returns true for the parameter
the second is a function from the parameter to a list of grammars, used as for grammar_family
val grammar_prio_family :
  ?param_to_string:(('a * 'b) -> string) -> string -> ('a -> 'b -> 'c grammar)
  * (('a -> (('b -> bool) * 'c grammar) list * ('b -> 'c grammar list)) -> unit)