9 Lexical Syntax
9.1 Character Set
Case is distinguished in characters, strings and identifiers, so that
variable-name and Variable-name are different, but where a character
is used as a digit in a positional number representation
(e.g. #\x3Ad) its case is ignored. Thus, case is also significant in
this definition and, as will be observed later, all the special form
and standard function names are in lower case. In this section, and
throughout this text, the names for individual character glyphs are
those used in [ISO 646 : 1991].
The minimal character set required to support EULISP is defined in
syntax table 9.1. The language as defined in this text uses only the
characters given in this table. The left-hand sides of the productions
in this table name groups of characters which are used later in this
definition: decimal-digit, upper-letter, lower-letter, letter,
other-character and special-character. Any character not specified
here is classified as an other-character, which permits its use as an
initial or a constituent character of an identifier (see § 9.3.0.3).
9.1.0.1 Syntax
decimal-digit: one of
    0 1 2 3 4 5 6 7 8 9
upper-letter: one of
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z
lower-letter: one of
    a b c d e f g h i j k l m
    n o p q r s t u v w x y z
normal-other-character: one of
special-character: one of
9.2 Whitespace and Comments
Whitespace characters are spaces, newlines, line feeds, carriage
returns, character tabulations, line tabulations and form feeds. The
newline character also represents end of record in configurations
that provide such an input model; thus, a reference to newline in
this definition should also be read as a reference to end of record.
Whitespace separates tokens and is significant only in a string or
when it occurs escaped within an identifier.
A line comment is introduced by a semicolon (;) and continues up
to, but does not include, the end of the line. Hence, a line comment
cannot occur in the middle of a token, because the newline that
terminates it is whitespace and therefore separates tokens. An object
comment is introduced by the sequence #;, optionally followed by
whitespace, and then the object to be “commented out”.
9.2.0.2 Syntax
; all subsequent characters up to the end of the line
NOTE 1 There is no notation in EULISP for block comments.
9.3 Identifiers
Identifiers in EULISP are lexically very similar to identifiers in
other Lisps and in other programming languages. Informally, an
identifier is a sequence of letter, decimal-digit and other-character
characters that starts with a character other than a decimal-digit.
A special-character must be escaped if it is to be used in the name
of an identifier.
However, because the common notations for arithmetic operations
are the glyphs for plus (+) and minus (-), which are also used to
indicate the sign of a number, these glyphs are classified as
identifiers in their own right as well as being part of the syntax of
a number.
It may sometimes be desirable to include in an identifier characters
that are not normally legal constituents. The aim of escaping in
identifiers is to change the meaning of particular characters so that
they can appear where they are not otherwise acceptable. Identifiers
containing characters that are not ordinarily legal constituents can
be written by delimiting the sequence of characters by
multiple-escape, the glyph for which is called vertical bar (|). A
multiple-escape denotes the beginning of an escaped part of an
identifier and the next multiple-escape denotes the end of that
escaped part. A single character that would otherwise not be a legal
constituent can be written by preceding it with single-escape,
the glyph for which is called reverse solidus (\). Therefore,
single-escape can be used to incorporate the multiple-escape or the
single-escape character in an identifier, delimited (or not) by
multiple-escapes. For example, |).(| is the identifier whose name
contains the three characters #\), #\. and #\(, and a|b| is the
identifier whose name contains the characters #\a and #\b. The
sequence || is the identifier with no name, and so is ||||,
but |\|| is the identifier whose name contains the single
character |, which can also be written \|, without delimiting
multiple-escapes.
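As an illustration of the escaping rules above, the following Python sketch computes the name denoted by an identifier token. The function name is hypothetical, and the sketch assumes that the extent of the token has already been determined by the reader; it does not check that unescaped characters are legal constituents, since that depends on the character classes of syntax table 9.1.

```python
SINGLE_ESCAPE = "\\"   # reverse solidus
MULTIPLE_ESCAPE = "|"  # vertical bar


def identifier_name(token):
    """Return the name denoted by an identifier token (hypothetical helper).

    A multiple-escape toggles between escaped and unescaped parts and
    contributes nothing to the name; a single-escape makes the next
    character part of the name literally, inside or outside a
    multiple-escaped part.
    """
    name = []
    in_multiple = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == SINGLE_ESCAPE:
            if i + 1 == len(token):
                raise ValueError("single-escape at end of token")
            name.append(token[i + 1])
            i += 2
        elif ch == MULTIPLE_ESCAPE:
            in_multiple = not in_multiple
            i += 1
        else:
            # Legality of unescaped constituents is not checked here.
            name.append(ch)
            i += 1
    if in_multiple:
        raise ValueError("unterminated multiple-escape")
    return "".join(name)
```

With this sketch, identifier_name("a|b|") returns "ab" and identifier_name("||||") returns the empty string, matching the examples in the text.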
9.3.0.3 Syntax
escaped-or-normal-constituent:
escaped-sequence-constituent:
9.4 Objects
An object is either a literal, a symbol or a list. The syntax of the
classes of objects that can be read by EULISP is defined in the
section of this definition corresponding to the class as defined
below:
9.4.0.4 Syntax
9.5 Boolean
A boolean value is either false, which is represented by the empty
list (written () and also the value of ??), or true, which is
represented by any value other than () or, where a specific value is
given, by t:
9.5.0.5 Syntax
Although the class containing exactly this set of values is not
defined in the language, notation is abused for convenience and
boolean is defined, for the purposes of this definition, to mean that
set of values.
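To make the truth rule concrete, the following Python sketch models it with a hypothetical sentinel object standing in for (). It shows that, unlike Python's own truthiness, only () counts as false here: values such as 0 or the empty string are true. The names EMPTY_LIST, T and is_true are illustrative assumptions, not part of EULISP.

```python
class _EmptyList:
    """Hypothetical stand-in for the EuLisp empty list ()."""

    def __repr__(self):
        return "()"


EMPTY_LIST = _EmptyList()  # the single object that represents false
T = "t"                    # hypothetical stand-in for the value t


def is_true(value):
    """EuLisp-style truth test: only () is false."""
    return value is not EMPTY_LIST
```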