Internet Engineering Task Force P. Cordell
Internet Draft Tech-Know-Ware Ltd
draft-cordell-lumas-00.txt
April 2, 2003
Expires: October 2003
Lumas -
A Language for Universal Message
Abstraction and Specification
STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as work in progress.
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
A number of methods and tools are available for defining the format
of messages used for signalling protocols. However, many of these
methods and tools have been designed for purposes other than message
definition, and have been adopted on the basis that they are readily
available rather than being ideally suited to the task. This often
means that the methods make it difficult to get definitions correct,
or result in unnecessary verbosity both in the definition and on the
wire.
Cordell [Page 1]
Internet Draft Lumas April 2003
Lumas - Language for Universal Message Abstraction and Specification
- has been custom designed for the purpose of message definition. It
is thus easy to specify messages in a compact, extensible format that
is readily machine manipulated to produce a compact encoding on the
wire.
1. Introduction
This document defines the Lumas message definition language, and the
default text encoding method for messages defined in this way.
2. Requirements for Message Definition and Encoding
A good message definition method will have the following properties.
It is these properties that Lumas has been designed to have.
Precise Definitions
It is important to accurately capture type information in a
message definition. Some message definition methods simply
capture the name of a parameter without specifying the type of the
parameter (e.g. integer, boolean etc). Additionally types like
integers need to be constrained to appropriate values.
Lumas provides this precision of definition.
Compact Definitions
The message definition should be as compact as possible, but no
more compact. While helpful to the inexperienced developer,
excessive keywords and other formatting can actually be
detrimental to the understanding of the experienced developer as
it is harder to extract the important material from the less
important framework. Also, excessive verbosity requires increased
scrolling to read a definition which is detrimental to the
comprehension process.
Lumas adopts a compact C-like definition that contains minimal
clutter and thus allows the true message structure to be readily
seen at a glance.
Readily Extensible
The message definition and the resultant on-the-wire encoding need
to support extensibility. As part of this, code should be able to
pass over parameters that it does not understand without becoming
confused.
The Lumas message definition and encoding allows this.
Extensible by Third Parties
It often occurs that a protocol is defined by one body and then
adopted and modified by another body. In other cases a base
protocol may be defined that is then augmented by external
profiles. An effective method of allowing a third-party to
accurately specify a message definition as deltas to an existing
message definition is important in this respect.
Lumas allows third-parties to specify protocol additions that
should not clash with additions made by other third parties.
Machine Parsable
It is desirable that the message definition be machine readable so
that as much as possible of the slog involved in turning a message
definition into running code is automated. This improves time
to market and significantly reduces the potential for adding bugs
into the code.
A Lumas definition is in many respects a generalised form of C
data structure definition. Therefore it is relatively simple to
convert a machine independent Lumas definition into a machine
dependent C definition and provide all the code to convert from
one data representation to another. This process can remove a
vast amount of slog. Additionally, the various compilers involved
in the process can do a large amount of validating to ensure that
the implementation is correct.
Simplicity
While accurate message definition is important, it is perhaps even
more important that the message definition method be intelligible
to people that do not have a great deal of time to become gurus in
yet another language. Therefore the definition method should be
quick and easy to learn. This means that the message definition
language must have minimal complexity. As complexity of
definition and expressiveness are often interrelated, in some
cases it is necessary to restrict expressiveness in the interests
of simplicity. Additionally, consideration should also be given
to the complexity of the required parser, which may favour
simplicity of format over absolute message compactness.
Lumas is based on the 80-20 principle. It is a small language
that can accommodate the majority of situations extremely well.
There will be times where a Lumas representation is sub-optimal in
terms of on-the-wire compactness. However, it is felt that on the
whole, the gains in simplicity that this enables outweigh these
sub-optimalities.
Compact On-the-Wire Encoding
As a general principle, it is desirable that encoded messages be
as compact as possible. This minimises transmission bandwidth,
can make processing the messages more efficient, and prevents
premature fragmentation of datagrams. Compact messages are also
important in the area of mobile devices that have limited memory
and possibly transmission bandwidth. This is particularly the
case if the information is stored as persistent configuration data
rather than being immediately discarded. Also, in many cases,
compact messages are easier for developers experienced in the
protocol to read than some more verbose types, and it is these
developers that should be the primary target for any measure aimed
at easing debugging.
Given that there are limits to how compactly the actual data in a
message can be represented, the compactness of a message is
determined largely by the tagging. Existing protocols often use
no tagging of data to minimise message size. They also allow for
comma separated lists of parameters that have the same meaning
rather than requiring each parameter to be separately tagged.
Additionally descriptive parameter names are essential to a clear
message definition, but tags used in messages are often shorter
than is descriptively useful (e.g.
instead of ,
instead of ). Therefore, it is desirable to be able to
define a descriptive name that can be used in code and a tag name
that can be used on the wire. Lumas accommodates all of these
requirements.
Flexible Implementation
While turnkey solutions are desirable, they are potentially
complex to develop, and thus may incur some cost to use, thus
making them inaccessible to some. Therefore a range of
implementation routes are desirable, from minimal tools with maximum
leg work to maximal tools with minimum leg work.
Lumas has a number of implementation routes in addition to the
compilation route. A Lumas definition can be converted into an
ABNF definition and implemented via that route, or a DOM-like
tree-based parsing method can be used.
Support Easy Application Debugging
Ideally the messages on the wire should be in a form that aids
the debugging process.
By default Lumas uses a text-based line format, and is thus
readily readable by human developers. Additionally it is also
easy to manually generate test messages. With the aid of cb-like
tools, it is possible to format messages so that they are more
readable than the most compact line representation. Additional
tools make it possible to automatically generate test messages and
use them as test vectors to test a parser, or validate that
manually generated test messages actually conform to the message
definition.
Nesting of Protocols
In some systems messages from one protocol are carried within
messages from another protocol (TCP in IP is a simple example, as
is HTML in HTTP). The definition and line encoding should allow
this.
Lumas allows this.
Flexible On-the-Wire Encoding
It is not always possible to anticipate the direction of
development so flexibility in the actual wire representation of
the messages is desirable.
The principal Lumas on-the-wire representation is text based.
However, a Lumas message definition can also be represented using
alternate text formats such as XML, and can also be represented in
binary.
2.1 That's Lumas
Lumas has been specifically designed to meet all of the above
requirements.
3. Lumas Messages Definition
This section describes how Lumas specifies the content of messages.
As the syntax is C-like it is felt that many will immediately
understand the message definition. For this reason a short example
of a message definition is presented before describing the format in
detail. The example is also used to give a rough indication of what
the formal definition describes, and will thus hopefully help with
the understanding of the latter.
3.1 Basic Principles of the Message Definition
Before presenting an example, and a more formal definition, it may be
helpful to describe the basic principles of the message definition
format.
Following the C language format, the basic format of a parameter
definition is:
type name ;
Type specifies things like integers, booleans, ASCII strings, Unicode
strings and so on.
The name is the name of the parameter.
Thus a parameter definition might be:
ascii rfc-name ;
This says that rfc-name is an ASCII string. In addition, a parameter
definition can express constraints on the type, constraints on the
cardinality (how many instances of the type are valid in a message),
and the tag to be used for the value on the wire. For example, an
integer may be limited to the values 0 to 255, and an ASCII string
may be limited to a maximum size. The fuller format of a parameter
has the form:
type name [cardinality] tagging ;
For example:
int <1..30000> referenced-rfcs [0..255] as refers ;
This defines an integer that can have values between 1 and 30000.
The name of the parameter is referenced-rfcs, but is tagged
on-the-wire by 'refers'. The parameter can consist of between 0 and
255 instances of the integer in a valid encoding.
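As an illustration only, the two constraints in the example above
(value range and cardinality) can be checked as sketched below. The
class name IntParam and its field names are hypothetical and are not
part of Lumas; the sketch simply assumes that the values of a
parameter have already been decoded into a list of integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntParam:
    """Hypothetical in-memory form of 'int <min..max> name [lo..hi] as tag'."""
    name: str
    tag: str
    min_value: int
    max_value: int
    min_count: int
    max_count: int

    def validate(self, values):
        """Return True if the decoded values satisfy both constraints."""
        # Cardinality: how many instances may appear in a message.
        if not (self.min_count <= len(values) <= self.max_count):
            return False
        # Type constraint: every instance must lie within the range.
        return all(self.min_value <= v <= self.max_value for v in values)


# int <1..30000> referenced-rfcs [0..255] as refers ;
REFERENCED_RFCS = IntParam("referenced-rfcs", "refers", 1, 30000, 0, 255)
```

With this sketch an empty list is valid (the cardinality includes
zero), whereas a list containing the value 0 is rejected by the range
constraint.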
Two main types of compound parameter are also possible, these being
'struct' and 'union'. Having much the same meaning as they have in
C, a struct specifies a group of parameters, all of which may be used
in a particular instance of the struct. A union similarly specifies
a group of parameters, but in this case only one of the parameters
can be used in any one instance of the union.
An example of a struct is:
struct rfc-links
{
ascii rfc-name;
int <1..30000> referenced-rfcs[0..255] as refers ;
};
A third form of compound type called 'constructed' is also available.
This allows a number of values to be concatenated together into what
looks like a single value. Hence it can be used to define constructs
like the character sequence 'HTTP/1.0', in which the '1' and the '0'
are the major and minor version numbers. This is discussed further
below.
3.2 An Example Message Definition
The following is an example message definition:
module com.tech-know-ware.my-example;
struct my-example
{
int <0..255> participant-id as ?;
Action action as ?;
struct my-addition[0..1] as new.tech-know-ware.com plugin
{
bool tkw-app-capable as ?;
};
};
union Action
{
Join join;
Message message as msg;
void leave;
};
struct Join
{
ascii<0..63> name;
};
struct Message
{
int <0..255> to-delegates[1..127] as to;
ascii<0..255> message as msg;
[ // Version 2 additions
int <0..5> priority;
bool acknowledge as ack;
]
[ // Version 5 additions
ascii<0..16> font-name[0..1] as font;
void bold[0..1];
void italic[0..1];
void underlined[0..1] as ul;
]
};
The above definition is intended to represent a very crude meeting
controller. The first construct (my-example) is the root of all
messages for the protocol. Each message identifies a participant
using an integer in the range 0 to 255, called participant-id. When
encoded on the wire, this parameter will be untagged due to the 'as
?' specification.
Each message then has an action, which is also untagged. The type of
the action parameter is not immediately specified, and instead
references the 'Action' definition.
The Action definition is a union in which only one of the specified
parameters may appear in an instance of the Action construct. This
effectively represents a fork in the semantics of any given message.
The options within Action can indicate that somebody has joined the
meeting, left the meeting, or is sending a message to other
delegates.
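The 'only one parameter' rule for a union can be sketched as follows.
This is an illustration rather than part of the specification: it
assumes, hypothetically, that a decoded Action instance is held as a
Python dict keyed by parameter name, and the function name
selected_member is invented for the example.

```python
# Parameter names of the Action union in the example definition.
ACTION_MEMBERS = ("join", "message", "leave")


def selected_member(instance, members=ACTION_MEMBERS):
    """Return the name of the single union member present in instance.

    Raises ValueError if no member, more than one member, or an
    unknown member is present.
    """
    present = [name for name in members if name in instance]
    if len(present) != 1 or len(instance) != 1:
        raise ValueError("a union instance must hold exactly one member")
    return present[0]
```

For example, selected_member({"leave": None}) yields "leave", while an
instance holding both 'join' and 'leave' is rejected.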
There is no explicit tag for the 'join' and 'leave' options, so these
will be tagged on-the-wire by the parameters' names, 'join' and
'leave' respectively. Conversely, an explicit tag for the 'message'
parameter is specified, and hence the message option will be tagged
by 'msg' on-the-wire.
The join parameter also has a referenced definition. For the
purposes of this example, when a person joins a meeting, all the
other delegates are informed of their name. The name is an ASCII
string that has a minimum length of 0 characters and a maximum length
of 63 characters.
The message option is also a referenced definition. Conceptually, to
send a message, the participant-id is used to identify the sender,
and the to-delegates field contains the participant ids of all the
people to whom the message is being sent. On-the-wire, the
to-delegates parameter will be tagged with 'to'. Between 1 and 127
(inclusive) instances of the to-delegates parameter may appear in a
message.
Also, the message itself is included. The message will consist of
ASCII characters and can be between 0 and 255 characters long.
On-the-wire, the message field will have the tag 'msg'.
The priority and acknowledge fields within the message struct have
been added in a later version of the protocol. This is indicated by
the square brackets in which the parameters are wrapped. Similarly,
font-name, and associated parameters have been added in version 5 of
the protocol (according to the comment). The reader should already
understand enough of the definition language to understand the
meaning of these fields.
Returning to the 'my-example' root, a third-party has added an
extension to the protocol in the form of the 'my-addition' parameter.
It is identified as not being part of the base specification by the
keyword 'plugin'. On-the-wire, the additional parameter will be
identified by the tag 'new.tech-know-ware.com' to differentiate it
from additions that may be made by other third parties.
On-the-wire encoded examples of this message definition are shown in
section 4.2.
3.3 Formal Message Definition Syntax
There are two types of parameter in Lumas, simple types and compound
types. The ABNF definition of these is:
Lumas-parameter = simple-param / compound-param
Simple types represent parameters such as integers, booleans etc.
The ABNF definition of a simple param is:
simple-param = simple-type WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "plugin" ] ";"
where WS represents white space, and OWS represents optional white
space.
The 'simple-type' represents the type of the parameter. It can have
the following forms:
simple-type = "void" / "bool" / "ipv4" / "ipv6" /
"date" / "time" / "oid" /
integer-type / string-type / bytes-type /
embedded-type / const-type / reference
where:
integer-type = "int" OWS "<" int-constraint ">"
string-type = ( "ascii" / "unquoted-ascii" / "unicode" )
[ OWS "<" string-constraint ">" ]
const-type = "const" OWS "<" first-safe-char *( safe-char ) ">"
; See the section 'Notes on Comments' below
bytes-type = "bytes" [ OWS "<" length-constraint ">" ]
embedded-type = "embedded" [ OWS "<" length-constraint ">" ]
reference = [ module-name "::" ] name ; Refers to a type defined
; elsewhere
int-constraint = [ min-int-constraint ".." ]
max-int-constraint
[ OWS use-leading-zero-marker ]
min-int-constraint = ["-"] pos-number
max-int-constraint = ["-"] pos-number
use-leading-zero-marker = "z"
string-constraint = length-constraint [ OWS pattern-constraint ]
length-constraint = [ min-len-constraint ".." ]
max-len-constraint
min-len-constraint = pos-number
max-len-constraint = ( pos-number / "*" )
pos-number = 1*DIGIT / ; Decimal number
"0x" 1*HEX / ; Hex number
1*2DIGIT "b" ; Specifies number of binary bits
In the case of the integer-type, the mandatory constraint specifies
the minimum and maximum permissible values that the integer can take.
If the 'z' character is included in the constraint, then the integer
SHOULD be represented with leading zeros on the wire. (This is
primarily applicable to constructed types.)
The pos-number construct used to specify the integer value constraint
has a form that can specify the number of binary bits. The number of
bits specified does not include any sign bits. Hence an unsigned 32
bit number can be represented as 0..32b, whereas a signed 32 bit
number can be represented as -31b..31b (although this will actually
exclude the most negative value of a signed 32 bit number).
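The three forms of pos-number can be turned into integer bounds as
sketched below. The function name parse_bound is hypothetical. The
sketch reads a bit count 'Nb' as the largest value that fits in N
bits, which is an interpretation chosen to agree with the 0..32b and
-31b..31b examples above.

```python
def parse_bound(text):
    """Convert a min/max int-constraint bound such as '255', '0x1b',
    '32b' or '-31b' into a Python integer."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if digits.startswith("0x"):
        # Hex number, e.g. 0x1b
        value = int(digits[2:], 16)
    elif digits.endswith("b"):
        # Number of binary bits: largest value that fits in that many bits
        value = (1 << int(digits[:-1])) - 1
    else:
        # Decimal number
        value = int(digits)
    return -value if negative else value
```

Under this reading, -31b..31b gives the range -2147483647 to
2147483647, which omits -2147483648 as noted in the text.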
In the case of string-type, the optional constraint specifies the
minimum and maximum number of characters that are allowed to appear
in a valid encoding and optionally a valid pattern of characters.
The format of the pattern constraint is designed to simplify regular
expression evaluation by preventing the need for the trial and error
type processing of general regular expressions. Thus, in accordance
with Lumas' 80/20 principle, valid patterns MUST NOT require the
regular expression code to do backtracking.
The pattern-constraint has the following form:
pattern-constraint = *( constraint-char quantifier )
[ '.' quantifier ]
; '.' matches any character
constraint-char = char | character-class
char = single-char | special-char
single-char = %x20-%xff | escaped-char
escaped-char = '\\' ; Matches \
| '\[' ; Matches [
| '\?' ; Matches ?
| '\*' ; Matches *
| '\+' ; Matches +
| '\{' ; Matches {
| '\.' ; Matches .
special-char = '\r' ; Matches the return character
| '\n' ; Matches the new line character
| '\t' ; Matches the tab character
| '\f' ; Matches the form feed character
| '\s' ; Matches white space [ \t\r\n\f]
| '\d' ; Matches any digit [0-9]
| '\w' ; Matches any word character [a-zA-Z_0-9]
| '\S' ; Matches anything not matched by \s
| '\D' ; Matches anything not matched by \d
| '\W' ; Matches anything not matched by \w
character-class = matching-character-class | inverse-character-class
matching-character-class = '[' *(class-char | class-range) ']'
inverse-character-class = '[^' *(class-char | class-range) ']'
class-char = class-single-char | special-char
class-single-char = %x20-%xff | class-escaped-char
class-escaped-char =
'\-' ; Matches -
| '\]' ; Matches ]
class-range = class-single-char '-' class-single-char
quantifier = '' | '?' | '*' | '+' | '{' 1*DIGIT [ ',' [ 1*DIGIT ] ] '}'
Only a 'greedy' match is allowed.
Note there are no grouping or alternation constructs. This is to
remove the need for backtracking and is suitable for 80% (or more) of
applications. (More complex patterns can be defined in comments and
left to the application to validate.)
Example patterns include /\d{4} \d{4} \d{4} \d{4}/ for a credit card
number, or /\d{4}-\d{2}-\d{2}T\d+:\d+:\d+Z/ for a date & time
matching the form 2003-03-03T12:45:32Z.
For more information on regular expressions, see [PERL].
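The two example patterns use only constructs that Python's re module
also accepts, so they can be tried out as sketched below. This is an
illustration only: Python's engine is a general backtracking matcher,
not the restricted matcher that Lumas intends, and requiring the
pattern to match the whole value is an assumption of the sketch. The
re.ASCII flag keeps \d limited to [0-9], as in the definition above.

```python
import re

# The example patterns from the text, compiled with ASCII-only classes.
CREDIT_CARD = re.compile(r"\d{4} \d{4} \d{4} \d{4}", re.ASCII)
DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d+:\d+:\d+Z", re.ASCII)


def matches(pattern, value):
    """Return True if the whole value matches the pattern constraint."""
    return pattern.fullmatch(value) is not None
```

Here matches(DATE_TIME, "2003-03-03T12:45:32Z") is true, whereas a
credit card number written with hyphens instead of spaces does not
match CREDIT_CARD.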
In the case of bytes-type, the optional constraint specifies the
minimum and maximum number of bytes that are allowed to appear in a
valid encoding.
In the constraint syntax, a maximum value '*' means infinite or
unbounded.
The various types have the following meaning:
void
A parameter that has no value. This is most useful in unions,
and can also be used to represent boolean events wherein the
absence of the parameter indicates false, and the presence of
the parameter indicates true. It is more useful than you might
at first think!
bool
Can be true or false
int
An integer value
ipv4
Represents an IPv4 address, but not the port.
ipv6
Represents an IPv6 address, but not the port.
date
Date according to the Gregorian calendar, with year, month and
date. Other calendar types may be constructed from primitive
types if required.
time
Represents the time in hours, minutes and seconds. By default
the time is adjusted to UTC, unless the time can be guaranteed
to have only local significance.
oid
This is an ASN.1 style Object Identifier. This is primarily
included to enable identification of security protocols.
ascii
A string made up of ASCII characters, limited at most to values
0 to 127.
unquoted-ascii
An ascii string usually has quote marks around it. This type
does not have quotes around it. Consequently it can not have
any white space, or include any special characters (such as
"=", "{", and "}") that would confuse the parser.
unicode
A string made up of Unicode characters.
const
This type allows a constant value to be inserted into the
encoded message. It will typically be untagged. One thing it
might be used for is identifying the protocol of the message
definition. For example:
const protocol as ?;
bytes
An array of bytes. Also useful for carriage of opaque data.
embedded
The value is an embedded Lumas message. This allows layering
of message definitions.
Referring back to the simple-param definition, the name is the name
of the parameter. If there is no explicitly defined tag, then this
is also used as the parameter's tag on-the-wire. It has the format:
name = ALPHA *( ALPHA / DIGIT / "-" / "_" )
The cardinality of a parameter specifies how many times a particular
parameter can appear in a message. The format mirrors a C-like array
specification, but uses UML style ranges rather than singular values
as are required in C. If the cardinality field is absent, then one
and only one instance of the parameter must occur in a valid message.
The format of the cardinality specification is:
cardinality = "[" [ min-occurrences ".." ] max-occurrences "]" /
"?" / ; Short hand for [0..1]
"*" / ; Short hand for [0..*]
"+" ; Short hand for [1..*]
min-occurrences = 1*DIGIT
max-occurrences = 1*DIGIT / "*"
Once again, the '*' in max-occurrences represents infinite or
unbounded. Example cardinalities are as follows:
[0..1] ; Zero or one time
[0..*] ; Zero or more times
[*] ; Same as above, zero or more times
[1..*] ; One or more times
[5] ; Exactly five times
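A cardinality specification can be reduced to a (minimum, maximum)
pair as sketched below. The function name parse_cardinality is
hypothetical, and using None to stand for an unbounded maximum is a
choice made for this sketch only.

```python
def parse_cardinality(spec=None):
    """Return (min, max) occurrences; max is None when unbounded."""
    if not spec:
        # Absent cardinality: one and only one instance.
        return (1, 1)
    shorthand = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    if spec in shorthand:
        return shorthand[spec]
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("not a cardinality: " + spec)
    body = spec[1:-1].strip()
    low_text, separator, high_text = body.partition("..")
    if not separator:
        # Single value: [*] is zero or more, [5] is exactly five.
        if body == "*":
            return (0, None)
        count = int(body)
        return (count, count)
    high_text = high_text.strip()
    high = None if high_text == "*" else int(high_text)
    return (int(low_text), high)
```

Applied to the examples above, "[0..1]" gives (0, 1), "[*]" gives
(0, None) and "[5]" gives (5, 5).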
An explicit tag can be any sequence of characters that do not have
special significance to the parser. If the tag definition begins
with a "?", the "?" is discarded. Thus to specify that ? be used as
the tag on-the-wire, specify explicit-tag to be ??.
explicit-tag = tag ; tag defined in common definitions
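The tagging rules described so far (parameter name as the default
tag, 'as ?' for an untagged parameter, and '??' for a literal '?'
tag) can be summarised in a small helper. The function name wire_tag
and the use of None to mean 'untagged' are assumptions of this
sketch.

```python
def wire_tag(name, explicit_tag=None):
    """Return the on-the-wire tag for a parameter, or None if untagged."""
    if explicit_tag is None:
        # No 'as' clause: the parameter name doubles as the tag.
        return name
    if explicit_tag == "?":
        # 'as ?' marks the parameter as untagged.
        return None
    if explicit_tag.startswith("?"):
        # A leading '?' is discarded, so '??' yields the tag '?'.
        return explicit_tag[1:]
    return explicit_tag
```

For the example definition in section 3.2, wire_tag("join") returns
"join", wire_tag("message", "msg") returns "msg", and
wire_tag("participant-id", "?") returns None.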
Marking an item as plugin indicates to the developer and the tools
that this parameter is (probably) not part of the original message
definition. For example, it might be a proprietary extension. It
also indicates that the parameter may not be present in all received
messages, and impacts on the way the binary encoding operates.
The compound types are struct, union and constructed. For a struct,
subject to the various parameters' cardinality specifications, any, all
or none of the parameters that a struct groups together may appear in
a valid encoding of the construct. In the case of a union, only one
of the parameters may be encoded in a valid instance of the
construct. The constructed form is effectively a compact encoding of
a struct, but is subject to a number of constraints.
The format of the compound types is similar to the simple types.
They have the form:
compound-param = struct-param / union-param / constructed-param
struct-param = "struct" WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
OWS "{" struct-body "}" OWS ";"
union-param = "union" WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
OWS "{" union-body "}" OWS ";"
constructed-param = "constructed" WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "plugin" ]
OWS "{" constructed-body "}" OWS ";"
In a struct and union the pluggable keyword indicates that the
construct is a location that the message designers have formally
declared as extendible using the 'plug' mechanism that is described
further below. Lumas compilers are encouraged to emit warnings when
extra material is plugged into locations that are not marked as
pluggable, but should not consider it an error. Constructed types
are not pluggable.
The format of the struct body is:
struct-body = *( untagged-lumas-parameter )
*( lumas-parameter )
*( struct-extension )
The struct body starts with all the untagged parameters. Untagged
parameters may have a cardinality other than one. Note that, if the
cardinality of an untagged parameter allows it to be absent, then
when encoded on the wire, if the untagged parameter is absent, then
all subsequent parameters, including tagged parameters must also be
absent. Thus great care is recommended when defining a message
syntax that allows for an untagged parameter to be absent.
Following the untagged parameters, the tagged parameters are
included. When the message definition is subsequently extended,
another instance of the extension parameters construct is added for
each version in which the construct is extended. (Note that all new
parameters must always be added onto the end of an existing
construct, and the order of parameters must never be rearranged from
one version to the next.)
All of these have a similar format to the types already defined,
except that in some cases they may be untagged, or only allow a unary
cardinality. To make the ABNF definition accurate it is therefore
necessary to repeat the above basic definitions with the appropriate
tagging and cardinality specifications.
As mentioned, the struct body may start with
untagged-Lumas-parameters. These are untagged, and must have a
cardinality of 1. Their definition is:
untagged-Lumas-parameter = untagged-simple-param /
untagged-compound-param
untagged-simple-param = simple-type WS name [ OWS cardinality ] WS
"as" WS "?" OWS ";"
untagged-compound-param = untagged-struct-param /
untagged-union-param /
untagged-constructed-param
untagged-struct-param =
"struct" WS name [ OWS cardinality ]
WS "as" WS "?"
[ WS "pluggable" ]
OWS "{" struct-body "}" OWS ";"
untagged-union-param = "union" WS name [ OWS cardinality ]
WS "as" WS "?"
[ WS "pluggable" ]
OWS "{" union-body "}" OWS ";"
untagged-constructed-param =
"constructed" WS name [ OWS cardinality ]
WS "as" WS "?"
OWS "{" constructed-body "}" OWS ";"
Note that the plugin keyword is not applicable to untagged items.
The second part of a struct definition is the set of items that are tagged.
These can have any desired cardinality. These have the basic
parameter definition that was initially presented, i.e.
Lumas-parameter.
The third and final part of a struct body is the extension fields.
These are parameters that are added in subsequent versions of the
protocol specification. They are marked out separately because a
parser must always consider absence of these parameters to be a valid
encoding so that it can receive messages from entities that are
working with an earlier version of the protocol. To do this would
dictate that all extension parameters would have to have a
cardinality specification that included zero. This is tedious,
potentially error prone, and loses some expressiveness. Instead,
extension parameters are wrapped inside square brackets to indicate
that they are extensions. It is then clear to any tools and
developers that these parameters may be absent if a message is
received from a host running an earlier version of the message
definition. The format of the struct extension is:
struct-extension = "[" 1*( Lumas-parameter ) "]"
The definition of a union-body is as follows:
union-body = [ integer-type WS name WS "as" WS "?" OWS ";" ]
*( singular-Lumas-parameter )
*( union-extension )
A union-body may have a single untagged integer parameter. All other
parameters must be tagged and have a cardinality of one and only one.
A union is extended in much the same way as a struct.
The untagged integer parameter allows integers to be defined that
have wild-carding options. For example, a union might be defined as:
union select
{
int<0..65535> numbered as ?;
void any as *;
};
Examples of the encoded form might be:
select = 12
select = *
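Decoding the value part of the 'select' union above can be sketched
as follows. The function name decode_select and the (member, value)
tuple it returns are hypothetical; the sketch assumes the tag and the
'=' have already been stripped by the caller.

```python
def decode_select(value_text):
    """Decode the value of the 'select' union into (member, value)."""
    if value_text == "*":
        # The wild-card option: void any as *
        return ("any", None)
    # Otherwise the untagged integer: int<0..65535> numbered as ?
    number = int(value_text)  # raises ValueError if not an integer
    if not 0 <= number <= 65535:
        raise ValueError("numbered is out of range")
    return ("numbered", number)
```

The two encoded examples above decode to ("numbered", 12) and
("any", None) respectively.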
The parameters within a union are only allowed unary cardinality to
avoid ambiguity in the line encoding. If multiple instances of a
parameter must be included as an option in a union, it is necessary
to wrap the parameters within a struct, using something similar to:
struct X { X x[1..*] as ?; };
As mentioned, most of the parameters within a union are tagged and
have a cardinality of one. Their definition is:
singular-Lumas-parameter = singular-simple-param /
singular-compound-param
singular-simple-param = simple-type WS name
[ WS "as" WS explicit-tag ]
[ WS "plugin" ] OWS ";"
singular-compound-param = singular-struct-param /
singular-union-param /
singular-constructed-param
singular-struct-param = "struct" WS name [ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
OWS "{" struct-body "}" OWS ";"
singular-union-param = "union" WS name [ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
OWS "{" union-body "}" OWS ";"
singular-constructed-param = "constructed" WS name
[ WS "as" WS explicit-tag ]
[ WS "plugin" ]
OWS "{" constructed-body "}" OWS ";"
The union extension operates in a similar fashion to that of the
struct, but references singular-Lumas-parameters. Its definition is:
union-extension = "[" 1*( singular-Lumas-parameter ) "]"
The constructed compound type provides a simple mechanism for
defining new constructed types similar to that used for date and
time. All the members of a constructed type are encoded on the wire
using their untagged form and concatenated together with no
intervening white space. The result of the encoding MUST meet all
the constraints of an unquoted-ascii value.
Additionally, each unquoted-ascii parameter MUST have a fixed number
of characters, and the first character of the unquoted-ascii and
const parameter MUST NOT be a digit.
The form of the constructed body is:
constructed-body = *( constructed-simple-type WS name ";" )
constructed-simple-type = integer-type / const-type /
"unquoted-ascii" OWS "<" 1*DIGIT ">"
In many respects the constructed type simply makes the encoded form
look prettier, and anything that can be encoded with the constructed
type can also be represented with the struct type. The constructed
type should also not be used for defining patterns of ascii or
Unicode characters. Note also that a constructed type is not
pluggable and hence can not be extended. It is therefore recommended
that the constructed type be used sparingly.
An example of a constructed type is:
constructed protocol as ?
{
const <HTTP/> const1;
int<0..99> major-version;
const <.> const2;
int<0..99> minor-version;
};
Which might be encoded as: HTTP/1.1
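The concatenation rule can be illustrated with a minimal Python
sketch. The function name encode_constructed and the list-of-values
input are assumptions made for illustration and are not part of
Lumas; the first const is assumed to carry the value HTTP/ so that
the result matches the encoding shown above.

```python
def encode_constructed(members):
    """Concatenate member values in order with no intervening white space."""
    encoded = "".join(str(member) for member in members)
    # The result must be usable as an unquoted-ascii value, so this sketch
    # rejects white space and the characters = } ) , that safe-char excludes.
    for ch in encoded:
        if ch.isspace() or ch in "=}),":
            raise ValueError("not a valid unquoted-ascii value: %r" % encoded)
    return encoded


print(encode_constructed(["HTTP/", 1, ".", 1]))  # HTTP/1.1
```

A fuller implementation would also check the fixed-length and
first-character rules stated for the individual members.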
Constructed types also allow you to define numbers that contain
decimal points. An example of such is:
union currency as ?
{
void dollars as $;
void pounds as £;
void francs as FFr;
};
constructed amount as ?
{
int<-31b..31b> main-denomination;
const <.> const2;
int<0..99z> sub-denomination;
};
Which might be encoded as: $ 100.05
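The following Python sketch shows one way the amount part of this
example could be produced. It assumes that the 'z' marker pads the
value with leading zeros to the number of digits in the maximum
value; that interpretation, and the names encode_int and
encode_amount, are assumptions made for illustration.

```python
def encode_int(value, max_value, leading_zeros=False):
    """Encode an integer, optionally zero-padded to the width of max_value."""
    text = str(value)
    if leading_zeros:
        # Assumption: 'z' pads to the digit count of the maximum value.
        text = text.zfill(len(str(max_value)))
    return text


def encode_amount(main, sub):
    """Encode main-denomination, the const '.', and sub-denomination."""
    return encode_int(main, 2 ** 31) + "." + encode_int(sub, 99, leading_zeros=True)


print(encode_amount(100, 5))  # 100.05
```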
It was mentioned previously that unions and structs could reference
types that are defined elsewhere. The format of a referenced type
can now be defined. Referenced types have a cardinality of one, and
are untagged. This is because the cardinality and tagging of the
type are defined in the item that does the referencing, rather than
where the referenced type is defined. (If a referenced type needs a
cardinality other than one, it is recommended that the technique for
giving a parameter within a union a non-unary cardinality be used.)
The definitions of the referenced types are:
referenced-Lumas-parameter = referenced-simple-param /
referenced-compound-param
referenced-simple-param = simple-type WS name ";"
referenced-compound-param = referenced-struct-param /
referenced-union-param /
referenced-constructed-param
referenced-struct-param = "struct" WS name [ WS "pluggable" ]
OWS "{" struct-body "}" OWS ";"
referenced-union-param = "union" WS name [ WS "pluggable" ]
OWS "{" union-body "}" OWS ";"
referenced-constructed-param = "constructed" WS name
OWS "{" constructed-body "}" OWS ";"
A protocol may be extended by a third party without modifying the
original definition. This may be due to a proprietary extension, or
an externally defined profile of the base protocol. The
specification for this type of extension is:
third-party-extension = "plug" WS
( tp-struct-extension /
tp-union-extension )
"into" WS name *( "::" name )
*( COMMA name *( "::" name ) ) OWS ";"
tp-struct-extension = Lumas-parameter
tp-union-extension = singular-Lumas-parameter
This specifies a parameter that is to be plugged into an existing
construct. For example, if the following were defined:
plug
ascii cookie as cookie.tkwlumas.com
into my-example.my-addition;
The resultant definition would be treated as if it were:
struct my-example
{
int <0..255> participant-id as ?;
Action action as ?;
struct my-addition[0..1] as tech-know-ware.com plugin
{
bool tkw-app-capable as ?;
ascii cookie as cookie.tkwlumas.com plugin;
};
};
The name field indicates the name of the construct that the item is
to be plugged into.
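As a rough illustration of the effect of a third party extension,
the Python sketch below models a message definition as nested
dictionaries and appends the plugged parameter to the named
construct. The dictionary layout and the functions find and plug are
hypothetical and are not defined by Lumas.

```python
def find(construct, path):
    """Walk a path of names, starting at the root construct."""
    if construct["name"] != path[0]:
        raise KeyError(path[0])
    for name in path[1:]:
        for member in construct.get("members", []):
            if member["name"] == name:
                construct = member
                break
        else:
            raise KeyError(name)
    return construct


def plug(definition, path, parameter):
    """Append parameter, marked as a plugin, to the construct at path."""
    target = find(definition, path)
    target.setdefault("members", []).append(dict(parameter, plugin=True))
    return definition


definition = {"name": "my-example", "members": [
    {"name": "participant-id"},
    {"name": "action"},
    {"name": "my-addition", "members": [{"name": "tkw-app-capable"}]},
]}
plug(definition, ["my-example", "my-addition"],
     {"type": "ascii", "name": "cookie", "tag": "cookie.tkwlumas.com"})
```

The original definition text is left untouched; only the in-memory
model seen by the compiler changes, which is the point of this kind
of extension.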
A single protocol may be defined in a number of message definition
files. This might be for the purpose of accessing predefined
libraries, or specifying a definition that the current definition
extends. A message definition therefore begins with a set of
optional directives expressing this information. They have the form:
Lumas-directives = OWS
[ "module" WS module-name OWS ";" OWS ]
[ "extends" WS module-name [ WS "as" WS name ] OWS ";" OWS ]
*( "import" WS module-name [ WS "as" WS name ] OWS ";" OWS )
module-name = name *( "." name )
The 'module' directive specifies the name of the module.
The 'extends' directive is used in a definition that contains a third
party extension. The module-name in the extends specification
indicates the message definition that is being extended.
The 'import' statement indicates a library message definition that
contains referenced types that are referenced within the message
definition.
The module-name follows the hierarchical format used in Java. It is
based on a domain name that is created from the name of the protocol,
combined with the domain name of the entity that defined it. For
example, if a protocol called the Simple Conference Protocol (SCP)
were defined by Tech-Know-Ware Ltd with a domain name of
tech-know-ware.com, the module name might be:
com.tech-know-ware.scp
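The derivation of a module name can be sketched in Python as
follows. The function name module_name is an assumption, as is the
choice to lower-case the protocol name.

```python
def module_name(domain, protocol):
    """Reverse the domain labels and append the protocol name."""
    labels = domain.lower().split(".")
    return ".".join(list(reversed(labels)) + [protocol.lower()])


print(module_name("tech-know-ware.com", "SCP"))  # com.tech-know-ware.scp
```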
Lumas defines a number of pseudo top level domains for its own
purposes. These are currently as follows:
+ietf A pseudo top level domain for the Internet Engineering Task
Force.
+iso A pseudo top level domain for the International Standards
Organisation. The sub-domains of this domain follow the
structure of ISO defined Object Identifiers. (All spaces must
be removed and numbers in brackets should be ignored when
parsing this domain. E.g. iso(1) member-body(2) us(840)
rsadsi(113549) digestAlgorithm(2) 5 is represented as
+iso(1).member-body(2).us(840).rsadsi(113549).digestAlgorithm(2).5
and looked up as +iso.member-body.us.rsadsi.digestAlgorithm.5)
+itu A pseudo top level domain for the International
Telecommunications Union. The sub-domains of this domain
follow the structure of ITU defined Object Identifiers.
Processing of such identifiers follows that defined for
processing ISO Object Identifiers.
+lumas A pseudo top level domain for defining Lumas extensions and
libraries.
+uuid A pseudo top level domain that uses Universally Unique
Identifiers for identification. An example is:
+uuid.4d36e96c-e325-11ce-bfc1-08002be10318
National standards bodies such as ANSI and BSI are defined under
their national top-level domain.
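The processing described for the +iso and +itu domains can be
sketched in Python as below. The function names represented_form and
lookup_form are assumptions made for illustration.

```python
import re


def represented_form(oid_text):
    """Remove spaces: 'iso(1) member-body(2) 5' -> '+iso(1).member-body(2).5'."""
    return "+" + ".".join(oid_text.split())


def lookup_form(represented):
    """Ignore every number in brackets, e.g. '(840)', before lookup."""
    return re.sub(r"\(\d+\)", "", represented)


oid = "iso(1) member-body(2) us(840) rsadsi(113549) digestAlgorithm(2) 5"
print(represented_form(oid))
print(lookup_form(represented_form(oid)))
```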
The 'name' part of the import statement is used as an alias of the
'module-name', so that items within 'module-name' can be referenced
in the abbreviated form of:
name::item
For example, if a parameter definition called 'id' is contained in
the module 'com.tech-know-ware.scp', and the following import
statement is specified:
import com.tech-know-ware.scp as scp;
Then 'id' can be referenced using:
scp::id
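A minimal Python sketch of alias resolution follows. The functions
parse_import and resolve are hypothetical helpers; a real compiler
would use the full directive grammar rather than splitting on white
space.

```python
def parse_import(statement):
    """Return (alias, module-name) for a single import directive."""
    words = statement.strip().rstrip(";").split()
    if len(words) == 2 and words[0] == "import":
        return words[1], words[1]          # no alias given
    if len(words) == 4 and words[0] == "import" and words[2] == "as":
        return words[3], words[1]
    raise ValueError("not an import directive: %r" % statement)


def resolve(reference, aliases):
    """Expand an alias prefix such as 'scp::id' to its module name."""
    prefix, separator, item = reference.rpartition("::")
    if not separator:
        return reference                   # unqualified name
    return aliases.get(prefix, prefix) + "::" + item


aliases = dict([parse_import("import com.tech-know-ware.scp as scp;")])
print(resolve("scp::id", aliases))  # com.tech-know-ware.scp::id
```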
Finally, we are in a position to describe a complete Lumas message
definition. This is:
Lumas-definition = Lumas-directives
1* ( referenced-Lumas-parameter /
third-party-extension )
The first parameter defined within the message definition is the root
of the message definition tree, and is thus the outer-most construct
of an encoded message.
3.4 Complete ABNF
This section presents the complete ABNF of a message definition
without narrative. Some definitions are common with the on-the-wire
ABNF and are presented in a separate section.
Lumas-definition = Lumas-directives
1* ( referenced-Lumas-parameter /
third-party-extension )
Lumas-directives = OWS
[ "module" WS module-name OWS ";" OWS ]
[ "extends" WS module-name [ WS "as" WS name ] OWS ";" OWS ]
*( "import" WS module-name [ WS "as" WS name ] OWS ";" OWS )
module-name = name *( "." name )
referenced-Lumas-parameter = referenced-simple-param /
referenced-compound-param
referenced-simple-param = simple-type WS name ";"
simple-type = "void" / "bool" / "ipv4" / "ipv6" /
"date" / "time" / "oid" /
integer-type / string-type / bytes-type /
embedded-type / const-type / reference
integer-type = "int" OWS "<" int-constraint ">"
string-type = ( "ascii" / "unquoted-ascii" / "unicode" )
[ OWS "<" string-constraint ">" ]
bytes-type = "bytes" [ OWS "<" length-constraint ">" ]
const-type = "const" OWS "<" first-safe-char *( safe-char ) ">"
; See the section 'Notes on Comments' below
embedded-type = "embedded" [ OWS "<" length-constraint ">" ]
reference = [ module-name "::" ] name ; Refers to a type
; defined elsewhere
int-constraint = [ min-int-constraint ".." ]
max-int-constraint
[ OWS use-leading-zero-marker ]
min-int-constraint = ["-"] pos-number
max-int-constraint = ["-"] pos-number
use-leading-zero-marker = "z"
string-constraint = length-constraint [ OWS pattern-constraint ]
length-constraint = [ min-len-constraint ".." ]
max-len-constraint
min-len-constraint = pos-number
max-len-constraint = ( pos-number / "*" )
pos-number = 1*DIGIT / ; Decimal number
"0x" 1*HEXDIG / ; Hex number
1*2DIGIT "b" ; Specifies number of binary bits
pattern-constraint = *( constraint-char quantifier )
[ '.' quantifier ]
; '.' matches any character
constraint-char = char | character-class
char = single-char | special-char
single-char = %x20-%xff | escaped-char
escaped-char = '\\' ; Matches \
| '\[' ; Matches [
| '\?' ; Matches ?
| '\*' ; Matches *
| '\+' ; Matches +
| '\{' ; Matches {
| '\.' ; Matches .
special-char = '\r' ; Matches the return character
| '\n' ; Matches the new line character
| '\t' ; Matches the tab character
| '\f' ; Matches the form feed character
| '\s' ; Matches white space [ \t\r\n\f]
| '\d' ; Matches any digit [0-9]
| '\w' ; Matches any word character [a-zA-Z_0-9]
| '\S' ; Matches anything not matched by \s
| '\D' ; Matches anything not matched by \d
| '\W' ; Matches anything not matched by \w
character-class = matching-character-class |
inverse-character-class
matching-character-class = '[' *(class-char | class-range) ']'
inverse-character-class = '[^' *(class-char | class-range) ']'
class-char = class-single-char | special-char
class-single-char = %x20-%xff | class-escaped-char
class-escaped-char =
'\-' ; Matches -
| '\]' ; Matches ]
class-range = class-single-char '-' class-single-char
quantifier = '' | '?' | '*' | '+' | '{' 1*DIGIT [ ',' [ 1*DIGIT ]
] '}'
name = ALPHA *( ALPHANUM / "-" / "_" )
referenced-compound-param = referenced-struct-param /
referenced-union-param /
referenced-constructed-param
referenced-struct-param = "struct" WS name [ WS "pluggable" ]
OWS "{" struct-body "}" OWS ";"
struct-body = *( untagged-Lumas-parameter )
*( Lumas-parameter )
*( struct-extension )
referenced-union-param = "union" WS name [ WS "pluggable" ]
OWS "{" union-body "}" OWS ";"
union-body = [ integer-type WS name WS "as" WS "?" OWS ";" ]
*( singular-Lumas-parameter )
*( union-extension )
referenced-constructed-param = "constructed" WS name
OWS "{" constructed-body "}" OWS ";"
constructed-body = *( constructed-simple-type WS name ";" )
constructed-simple-type = integer-type / const-type /
"unquoted-ascii" OWS "<" 1*DIGIT ">"
untagged-Lumas-parameter = untagged-simple-param /
untagged-compound-param
untagged-simple-param = simple-type WS name [ OWS cardinality ]
WS "as" WS "?" ";"
untagged-compound-param = untagged-struct-param /
untagged-union-param /
untagged-constructed-param
untagged-struct-param =
"struct" WS name [ OWS cardinality ]
WS "as" WS "?"
[ WS "pluggable" ]
OWS "{" struct-body "}" OWS ";"
untagged-union-param =
"union" WS name [ OWS cardinality ]
WS "as" WS "?"
[ WS "pluggable" ]
OWS "{" union-body "}" OWS ";"
untagged-constructed-param =
"constructed" WS name [ OWS cardinality ]
WS "as" WS "?"
[ WS "pluggable" ]
OWS "{" constructed-body "}" OWS
";"
Lumas-parameter = simple-param / compound-param
simple-param = simple-type WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "plugin" ] ";"
cardinality = "[" [ min-occurrences ".." ] max-occurrences "]" /
"?" / ; Short hand for [0..1]
"*" / ; Short hand for [0..*]
"+" ; Short hand for [1..*]
min-occurrences = 1*DIGIT
max-occurrences = 1*DIGIT / "*"
explicit-tag = tag ; tag defined in common definitions
compound-param = struct-param / union-param /
constructed-param
struct-param = "struct" WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
OWS "{" struct-body "}" OWS ";"
union-param = "union" WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
OWS "{" union-body "}" OWS ";"
constructed-param = "constructed" WS name [ OWS cardinality ]
[ WS "as" WS explicit-tag ]
[ WS "plugin" ]
OWS "{" constructed-body "}" OWS ";"
struct-extension = "[" 1*( Lumas-parameter ) "]"
singular-Lumas-parameter = singular-simple-param /
singular-compound-param
singular-simple-param = simple-type WS name [ WS "as" WS explicit-tag ]
[ WS "plugin" ] ";"
singular-compound-param = singular-struct-param /
singular-union-param /
singular-constructed-param
singular-struct-param = "struct" WS name
[ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
"{" struct-body "}" ";"
singular-union-param = "union" WS name [ WS "as" WS explicit-tag ]
[ WS "pluggable" ]
[ WS "plugin" ]
"{" union-body "}" ";"
singular-constructed-param = "constructed" WS name
[ WS "as" WS explicit-tag ]
[ WS "plugin" ]
OWS "{" constructed-body "}" OWS ";"
third-party-extension = "plug" WS
( tp-struct-extension /
tp-union-extension )
"into" WS name *( "::" name )
*( "," name *( "::" name ) ) ";"
tp-struct-extension = Lumas-parameter
tp-union-extension = singular-Lumas-parameter
3.5 Locating Lumas within a Specification
It is not sufficient to use Lumas alone to define a protocol.
Additional narrative is required to define the semantics of a
protocol in addition to the syntax defined by Lumas. Thus Lumas and
narrative typically need to be combined in a single document. The
main issue here is that at some point the Lumas must be extracted
from the specification to be useful. If the Lumas is intermingled
with the narrative, it can be manually removed using cut and paste;
however, this is tedious and error-prone. An alternative is to put
all the Lumas in a separate section so that it can be easily
extracted. However, this distances the Lumas from the narrative that
explains it, which is undesirable. A third option is to do both:
interleave one copy of the Lumas with the narrative and provide a
separate copy that can be used for compiling. This approach makes it
difficult to keep the two versions in step, and errors can easily
creep in.
Lumas compilers MUST implement a fourth option. Before parsing a
file, a compiler should first look for a line of text on which the
first non-white space text is lumas*/ and which has only white space
after it. If such a line is found, compilation starts at the
following line. Subsequent narrative is then enclosed in /* */
comment marks. If no such line is found, then compilation begins at
the beginning of the file.
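The search for the start marker can be sketched in Python as shown
below. The function name lumas_start and the use of a list of lines
as input are assumptions made for illustration.

```python
def lumas_start(lines):
    """Return the index of the first line to compile."""
    for index, line in enumerate(lines):
        # The marker must be the only non-white space text on its line.
        if line.strip() == "lumas*/":
            return index + 1
    return 0  # no marker: compile from the beginning of the file


document = ["1. Some narrative", "   lumas*/  ", "struct top", "{"]
print(lumas_start(document))  # 2
```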
For example, if any */ character sequences that follow this example
are removed, a Lumas compiler must be able to find and process the
following Lumas syntax:
lumas*/
// The first 'official' line of Lumas
struct top
{
not-much not-much;
};
/*
This is narrative.
*/
int <0..1> not-much;
/*
4. On-the-Wire Representation
4.1 Principles of On-the-Wire Encoding
The basic format of the text based on-the-wire encoding is to use the
format:
tag = value
If there are multiple instances of a parameter, then they may be
conveyed either as multiple instances of the above construct or as a
comma separated list, as in:
tag = value, value, value
If a tag is explicitly specified in the message definition, then this
is used on the wire. If no tag is explicitly specified, then the
name of the parameter is used as the tag.
It is also possible to explicitly specify that no tag should be used
on the wire by setting the tag field to '?'. All untagged items must
appear in a struct in the same order that they are defined in the
message definition, and must appear before any tagged items within a
struct definition. Untagged parameters that have greater than one
instance must be constructed as a comma separated list. In these
cases, the format on the wire becomes:
value
or:
value, value, value
If an untagged parameter has a cardinality that allows it to be
absent from an encoded message, then all subsequent parameters in the
enclosing struct, including tagged parameters, must also be absent.
Consequently, great care should be taken when defining a message
definition that allows untagged parameters to be absent.
Thus, for the examples quoted earlier, that is:
ascii rfc-name ;
int <1..30000> referenced-rfcs [0..255] as refers;
The format on the wire would be something like (depending on the
actual values in question):
rfc-name = 'Lumas' refers = 822, 791, 2543
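The tagging rules above can be illustrated with a short Python
sketch. The functions encode_value and encode_param are hypothetical;
only quoted ascii strings and integers are handled, and the escaping
shown is a simplification.

```python
def encode_value(value):
    """Encode a str as a quoted ascii value, anything else via str()."""
    if isinstance(value, str):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return str(value)


def encode_param(name, values, tag=None):
    """Encode one parameter; the tag defaults to the parameter name."""
    tag = name if tag is None else tag
    body = ", ".join(encode_value(v) for v in values)
    # A tag of '?' means that no tag appears on the wire.
    return body if tag == "?" else tag + " = " + body


print(" ".join([
    encode_param("rfc-name", ["Lumas"]),
    encode_param("referenced-rfcs", [822, 791, 2543], tag="refers"),
]))
```

This reproduces the example line above for the same input values.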
4.2 Example On-the-Wire Representation
The following are example on-the-wire representations of the example
message.
1
join = { 'Alice' }
tech-know-ware.com = { True }
1
msg = { to = 2, 5, 8, 58
msg = 'Where are we going for dinner' }
1
leave
4.3 Formal On-the-Wire Representation
The principal representation of a Lumas defined message on the wire
is text based.
Parameters may be untagged as long as they appear before any other
tagged parameters. Untagged parameters that have non-singular
cardinality must be comma separated.
The top-level construct of a Lumas definition is a referenced type,
which essentially has no tag associated with it. (Indeed, the
presence of such a tag would not convey any information.) The
top-level construct is therefore either a struct body, a union body,
or a simple value, as in:
Lumas-text-message = ( struct-body /
union-body )
A struct body can contain untagged and tagged parameters. All
untagged parameters must appear before any tagged parameters. The
definition of a struct-body is therefore:
struct-body = OWS
*( value *( COMMA value ) WS )
*( ( tag WS ) / ; For a void parameter
( tag EQUAL value *( COMMA value ) WS ) )
; WS not required if it's the last item
All items of a union body must be tagged, except for a single integer
parameter that may be untagged. Also, parameters must only have a
cardinality of one in the encoding to avoid ambiguities in the
encoded message. Therefore a union body has the form:
union-body = OWS (integer-value WS /
tag WS / ; For a void parameter
( tag EQUAL value WS ) )
where:
value = simple-value / compound-value
simple-value = bool-value / integer-value / oid-value /
ipv4-value / ipv6-value /
ascii-value / unquoted-ascii-value / unicode-value /
const-value / embedded-value / bytes-value /
date-value / time-value
bool-value = "True" / "False" / "T" / "F"
integer-value = [ "-" ] 1*DIGIT
oid-value = 1*DIGIT *( "~" 1*DIGIT )
; Only the oid's numerical parts are represented
ipv4-value = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT
; N.B. The IPv4 address format within an IPv6 address is not
; supported
ipv6-value = hexseq / hexseq "::" [ hexseq ] / "::" [ hexseq ]
hexseq = hex4 *( ":" hex4)
hex4 = 1*4HEXDIG
Date and time parameters have fixed width to aid parsing. As such
the various fields have leading zeros if required. (They adopt one
of the formats of ISO-8601.)
Dates are according to the Gregorian calendar. Other calendar types
may be constructed from primitive types if required.
Typically the time should be converted to UTC prior to including in a
message, unless the time can be guaranteed to have only local
significance.
date-value = date-year "-" date-month "-" date-day
date-year = 4DIGIT ; e.g. 2002
date-month = 2DIGIT ; With leading zeros, e.g. 02
date-day = 2DIGIT ; With leading zeros, e.g. 05
time-value = time-hours ":" time-minutes ":" time-seconds
time-hours = 2DIGIT ; With leading zeros, e.g. 02
time-minutes = 2DIGIT ; With leading zeros, e.g. 02
time-seconds = 2DIGIT ; With leading zeros, e.g. 02
; Uses 24 hour clock notation
; All times presented in UTC
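As an illustration of the fixed-width date and time formats, the
Python sketch below converts a time zone aware datetime to UTC and
formats it with leading zeros. The function name encode_date_time is
an assumption.

```python
from datetime import datetime, timedelta, timezone


def encode_date_time(moment):
    """Return (date-value, time-value) for an aware datetime, in UTC."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%d"), utc.strftime("%H:%M:%S")


# 13:00:00 at UTC+1 is 12:00:00 UTC.
local = datetime(2002, 2, 28, 13, 0, 0, tzinfo=timezone(timedelta(hours=1)))
print(encode_date_time(local))  # ('2002-02-28', '12:00:00')
```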
ascii-value =
"'" *( %x00-26 / %x28-5B / %x5D-7F / "\\" / "\'" ) "'"
unquoted-ascii-value = first-safe-char *( safe-char )
; See the section 'Notes on Comments' below
unicode-value = DQUOTE
*( %x00-21 / %x23-5B / %x5D-FF / "\\" / "\" DQUOTE )
DQUOTE
; DQUOTE defined in RFC 2234
bytes-value = "^" BASE64
BASE64 = *( 4BASE64-CHAR )
(
( 4BASE64-CHAR ) /
( 3BASE64-CHAR "=" ) /
( 2BASE64-CHAR "=" "=" )
)
BASE64-CHAR = ALPHA / DIGIT / "+" / "/"
const-value = first-safe-char *( safe-char )
; See the section 'Notes on Comments' below
embedded-value = "(" *(%x00-FF) ")"
; any occurrence of '(' within embedded message must be
; matched by a corresponding ')'.
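The bytes-value form can be sketched in Python using the standard
base64 module. The function names encode_bytes and decode_bytes are
assumptions made for illustration.

```python
import base64


def encode_bytes(data):
    """Encode raw bytes as a bytes-value: '^' followed by BASE64."""
    return "^" + base64.b64encode(data).decode("ascii")


def decode_bytes(text):
    """Decode a bytes-value back to raw bytes."""
    if not text.startswith("^"):
        raise ValueError("bytes-value must begin with '^'")
    return base64.b64decode(text[1:], validate=True)


print(encode_bytes(b"Lumas"))
```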
Illustrating the recursiveness of the message format, we have:
compound-value = struct-value / union-value / constructed-value
struct-value = "{" struct-body "}"
union-value = union-body
constructed-value = unquoted-ascii-value ; On the wire these
; types are equivalent
EQUAL = OWS "=" OWS
COMMA = OWS "," OWS
4.4 Marking Message Boundaries
Before a message is parsed it is necessary to know the boundaries of
the message. There are many ways in which this can be done, and the
method adopted should be specified in the protocol specification.
However, in the absence of any other way, Lumas parsers should take
the presence of an unmatched closing brace to be the end of message
marker. Hence, the definition of a message delimited in this way
becomes:
delimited-Lumas-text-message = Lumas-text-message ( "}" / ")" )
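A simplified Python sketch of this delimiting rule is given below.
The function name find_message_end is an assumption, and the scanner
deliberately ignores comments and the fact that unquoted values may
contain quote, brace, or bracket characters after their first
character; a real parser would follow the full grammar.

```python
def find_message_end(text):
    """Return the index of the first unmatched '}' or ')', or -1."""
    depth = 0
    quote = None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "{(":
            depth += 1
        elif ch in "})":
            if depth == 0:
                return i            # unmatched close: end of message
            depth -= 1
        i += 1
    return -1


message = "1 join = { 'Alice' } "
print(find_message_end(message + "}" + "1 leave }") == len(message))  # True
```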
4.5 Illustration of Encoded Types
This section illustrates how the types look once they have been
encoded according to the syntax above. The tag of each item has the
format 'my-XXXX'. Except in the case of the 'void' example, the XXXX
part indicates the type that is encoded to the right of the equals
sign.
my-void // Tag only for a void parameter
my-bool = True
my-int = 5643
my-ipv4 = 10.0.0.1
my-ipv6 = 201:123::0
my-date = 2002-02-28
my-time = 12:00:00
my-oid = 1~2~840~113549~2~5
my-ascii = 'Lumas'
my-unquoted-ascii = Lumas
my-unicode = "Lumas"
my-const = Lumas
my-bytes = ^01AF3C==
my-embedded = ( my-other-int=5 single-closing-bracket-text=')' )
my-struct = { 5434 All time=98787654654 }
my-union = 5434
my-union1 = Switch
my-union2 = Volume = 11
5. Common ABNF Definitions
The following definitions are common to both the definition syntax
and the on the wire representation.
tag = [ "?" ] first-tag-safe-char *( safe-char )
first-tag-safe-char = %x21 /
; Not "
%x23-26 /
; Not ' ( )
%x2A-2B /
; Not , -
%x2E-2F /
; Not 0 1 2 3 4 5 6 7 8 9
%x3A-3C /
; Not =
%x3E-5D /
; Not ^
%x5F-7A /
; Not {
%x7C /
; Not }
%x7E-7F
; Visible characters except = , " ' { } ( ) ^ -
; and digits (tag must not get confused with number)
first-safe-char = first-tag-safe-char / DIGIT / "-"
safe-char = first-safe-char / DQUOTE / "'" / "{" / "(" / "^"
; Not = } ) ,
OWS = [ WS ] ; Optional white space
WS = comment / " " / HTAB / CR / LF
; HTAB, CR, LF defined in RFC-2234
; See section 'Notes on Comments' below
comment = c-comment / cpp-comment
c-comment = "/*" (nested-end / hard-end )
nested-end = "*/"
hard-end = "**/"
cpp-comment = "//" *( HTAB / %x20-7F ) ( CR / LF )
; A comment is treated as a single space for the
; purposes of parsing
6. Notes on Comments
To aid development Lumas allows comments to appear in both a message
definition and a message.
On the wire, const and unquoted-ascii values MUST NOT begin with
comment start markers ('//' and '/*'). However, if the values
contain comment start marker characters, the characters are
interpreted as part of the value, and do not indicate the start of a
comment.
For example, in the first of the examples below, the text
"This-is-a-comment" MUST be treated as a comment, whereas in the
second example the text "this-is-part-of-the-value" MUST be treated
as part of the value.
ascii-value = /*This-is-a-comment*/This-is-the-value
ascii-value = and-//this-is-part-of-the-value
In a message definition (but not in a message) the c-comment style of
commenting allows nesting of comments. In a nested comment, each
'/*' character sequence MUST be matched by a corresponding '*/'
character sequence before the comment ends. Additionally, if a
comment starts with the '/*' character sequence, the end of the
comment can be forced by the hard end of comment marker defined as
'**/', which overrides the nesting. (This provision allows the
commenting out of headers and footers in text only message definition
documents.)
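The nesting and hard end rules can be sketched in Python as follows.
The function name skip_c_comment is an assumption, and the sketch
covers only c-comments in a message definition.

```python
def skip_c_comment(text, start):
    """Return the index just past the c-comment that begins at start."""
    if not text.startswith("/*", start):
        raise ValueError("no comment at this position")
    depth = 1
    i = start + 2
    while i < len(text):
        if text.startswith("**/", i):
            return i + 3            # hard end overrides the nesting
        if text.startswith("/*", i):
            depth += 1
            i += 2
        elif text.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")


nested = "/* outer /* inner */ still comment */int x;"
print(nested[skip_c_comment(nested, 0):])  # int x;
```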
7. Mandatory to Understand
Many protocols need a way to signal that certain extension parameters
are mandatory to understand, so that a recipient that does not
understand them rejects the message in some way. Lumas provides no
in-built mechanism for this feature. Instead, implementers are
advised to use a feature similar to SIP's 'Require' header [SIP].
8. Security Considerations
Lumas itself does not have any security issues related to it, but the
security requirements of a protocol must be borne in mind when
writing a Lumas message definition. Common advice is that it is
difficult to add security to a protocol once it has been released,
and hence security issues must be considered from the outset. This
matters for a Lumas message definition because security features may
affect the format of messages. This is particularly the case for
integrity check
values that are effectively appended to the message once the message
is encoded.
9. References
[ABNF] D. Crocker & P. Overell, "Augmented BNF for Syntax
Specifications: ABNF," Internet Engineering Task Force, RFC
2234, November 1997.
[XML] "Extensible Markup Language (XML) 1.0 (Second Edition)," W3C
REC-xml, October 2000.
[PERL] L. Wall, T. Christiansen, & J. Orwant, "Programming Perl,"
O'Reilly, ISBN 0-596-00027-8.
[SIP] J. Rosenberg et al., "SIP: Session Initiation Protocol,"
Internet Engineering Task Force, RFC 3261, June 2002.
10. Author's Address
Pete Cordell
Tech-Know-Ware Ltd
P.O. Box 30
Ipswich,
IP5 2WY
UK
pete@tech-know-ware.com
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.