9 Lexical Syntax

9.1 Character Set

Case is distinguished in each of characters, strings and identifiers, so that variable-name and Variable-name are different, but where a character is used in a positional number representation (e.g. #\x3Ad) the case is ignored. Thus, case is also significant in this definition and, as will be observed later, all the special form and standard function names are lower case. In this section, and throughout this text, the names for individual character glyphs are those used in [ISO 646 : 1991].

The minimal character set to support EULISP is defined in syntax table 9.1. The language as defined in this text uses only the characters given in this table. The left-hand sides of the productions in this table define and name the groups of characters that are used later in this definition: decimal-digit, upper-letter, lower-letter, letter, other-character and special-character. Any character not specified here is classified under other-character, which permits its use as an initial or a constituent character of an identifier (see § 9.3.0.3).

9.1.0.1 Syntax

decimal-digit: one of
0 1 2 3 4 5 6 7 8 9
upper-letter: one of
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
lower-letter: one of
a b c d e f g h i j k l m
n o p q r s t u v w x y z
letter:
upper-letter
lower-letter
normal-other-character: one of
* / < = > + .
other-character:
normal-other-character
-
special-character: one of
; ' , \ " # ( ) ` | @
level-0-character:
decimal-digit
letter
other-character
special-character
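
The grouping in the table above can be sketched as a small classifier. This is an illustrative Python sketch only, not part of the definition: the function name classify is hypothetical, the typographic quote glyphs in special-character are assumed to stand for the apostrophe and grave accent, and whitespace (defined in § 9.2, not in this table) is assumed to be reported separately.

```python
import string

DECIMAL_DIGITS = set(string.digits)
UPPER_LETTERS = set(string.ascii_uppercase)
LOWER_LETTERS = set(string.ascii_lowercase)
# Assumption: the quote glyphs in the table are apostrophe and grave accent.
SPECIAL_CHARACTERS = set(";',\\\"#()`|@")
# Assumption: whitespace is recognised before the table is consulted.
WHITESPACE = set(" \n\r\t\v\f")


def classify(ch: str) -> str:
    """Return the name of the character group that ch belongs to."""
    if len(ch) != 1:
        raise ValueError("classify expects exactly one character")
    if ch in WHITESPACE:
        return "whitespace"
    if ch in DECIMAL_DIGITS:
        return "decimal-digit"
    if ch in UPPER_LETTERS:
        return "upper-letter"
    if ch in LOWER_LETTERS:
        return "lower-letter"
    if ch in SPECIAL_CHARACTERS:
        return "special-character"
    # Any character not specified in the table is an other-character.
    return "other-character"
```

Under these assumptions a character outside the table, such as an accented letter, falls through to other-character and so may appear in an identifier.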

9.2 Whitespace and Comments

Whitespace characters are spaces, newlines, line feeds, carriage returns, character tabulations, line tabulations and form feeds. The newline character is also used to represent end of record for configurations providing such an input model; a reference to newline in this definition should therefore also be read as a reference to end of record. Whitespace separates tokens and is only significant in a string or when it occurs escaped within an identifier.

A line comment is introduced by a semicolon (;) and continues up to, but does not include, the end of the line. Hence, a line comment cannot occur in the middle of a token, because the newline that ends the comment is whitespace and whitespace separates tokens. An object comment is introduced by the #; sequence, optionally followed by whitespace, and then an object to be “commented out”.

9.2.0.2 Syntax

whitespace:
space
newline
line-feed
return
tab
vertical-tab
form-feed
comment:
; all subsequent characters
up to the end of the line
#; whitespace* object

NOTE 1 There is no notation in EULISP for block comments.
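
As an illustration of how whitespace and line comments separate tokens, the following Python sketch advances past both. It is a minimal sketch under stated assumptions, not a definitive reader: the name skip_blanks is hypothetical, the function is assumed to be called only between tokens (never inside a string or an escaped identifier), only "\n" is treated as the end of a line, and object comments (#;) are not handled because they require a full object reader.

```python
WHITESPACE = set(" \n\r\t\v\f")


def skip_blanks(text: str, pos: int = 0) -> int:
    """Return the index of the next character that starts a token.

    Whitespace and line comments at or after pos are skipped.
    """
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch == ";":
            # A line comment runs up to, but does not include, the newline.
            while pos < n and text[pos] != "\n":
                pos += 1
        else:
            break
    return pos
```

Because the comment stops before the newline, the newline itself is consumed as ordinary whitespace on the next iteration.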

9.3 Identifiers

Identifiers in EULISP are very similar lexically to identifiers in other Lisps and in other programming languages. Informally, an identifier is a sequence of letter, decimal-digit and other-characters starting with a character that is not a decimal-digit. special-characters must be escaped if they are to be used in the names of identifiers. However, because the common notations for arithmetic operations are the glyphs for plus (+) and minus (-), which are also used to indicate the sign of a number, these glyphs are classified as identifiers in their own right as well as being part of the syntax of a number.

Sometimes, it might be desirable to incorporate characters in an identifier that are normally not legal constituents. The aim of escaping in identifiers is to change the meaning of particular characters so that they can appear where they are not otherwise acceptable. Identifiers containing characters that are not ordinarily legal constituents can be written by delimiting the sequence of characters by multiple-escape, the glyph for which is called vertical bar (|). The multiple-escape denotes the beginning of an escaped part of an identifier and the next multiple-escape denotes the end of an escaped part of an identifier. A single character that would otherwise not be a legal constituent can be written by preceding it with single-escape, the glyph for which is called reverse solidus (\). Therefore, single-escape can be used to incorporate the multiple-escape or the single-escape character in an identifier, delimited (or not) by multiple-escapes. For example, |).(| is the identifier whose name contains the three characters #\), #\. and #\(, and a|b| is the identifier whose name contains the characters #\a and #\b. The sequence || is the identifier with no name, and so is ||||, but |\|| is the identifier whose name contains the single character |, which can also be written \|, without delimiting multiple-escapes.

9.3.0.3 Syntax

identifier:
normal-identifier
peculiar-identifier
escaped-identifier
normal-identifier:
normal-initial normal-constituent*
normal-initial:
letter
normal-other-character
normal-constituent:
letter
decimal-digit
other-character
peculiar-identifier:
{+ | -} {peculiar-constituent normal-constituent*} opt
. peculiar-constituent normal-constituent*
peculiar-constituent:
letter
other-character
escaped-identifier:
escaped-sequence escaped-sequences*
normal-initial escaped-sequences*
\level-0-character escaped-sequences*
escaped-sequences:
escaped-sequence
escaped-or-normal-constituent*
escaped-sequence:
|escaped-sequence-constituent*|
escaped-or-normal-constituent:
\level-0-character
normal-constituent
escaped-sequence-constituent:
\level-0-character
level-0-character other than |
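
The effect of single-escape and multiple-escape on the name of an identifier can be sketched as follows. This Python sketch is illustrative only: the name identifier_name is hypothetical, the input is assumed to be a single, already delimited identifier token, and no check is made that unescaped characters are legal constituents.

```python
def identifier_name(token: str) -> str:
    """Return the name denoted by an identifier token, resolving escapes."""
    name = []
    in_multiple_escape = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":
            # single-escape: the following character is taken literally.
            if i + 1 >= len(token):
                raise ValueError("single-escape at end of token")
            name.append(token[i + 1])
            i += 2
        elif ch == "|":
            # multiple-escape opens or closes an escaped part.
            in_multiple_escape = not in_multiple_escape
            i += 1
        else:
            name.append(ch)
            i += 1
    if in_multiple_escape:
        raise ValueError("unterminated multiple-escape")
    return "".join(name)
```

Applied to the examples in the text, |).(| yields the three characters ).( and both |\|| and \| yield the single character |, while || and |||| both yield the empty name.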

9.4 Objects

An object is either a literal, a symbol or a list. The syntax of each class of object that can be read by EULISP is defined in the section of this definition corresponding to that class, as listed below:

9.4.0.4 Syntax

object:
literal
list §16.12
symbol §16.17
literal:
boolean
character §16.1
float §16.7
integer §16.10
string §16.16
vector §16.19

9.5 Boolean

A boolean value is either false, which is represented by the empty list (written (), and also the value of ??), or true, which is represented by any value other than () or, where specified, by t:

9.5.0.5 Syntax

boolean:
true
false
true:
t
object not ()
false:
()
??

Although the class containing exactly this set of values is not defined in the language, notation is abused for convenience and boolean is defined, for the purposes of this definition, to mean that set of values.
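
The rule that only the empty list is false can be sketched in Python as below. This is an illustration under assumptions rather than part of the definition: the empty tuple is used as a stand-in for the empty list (), and the names EMPTY_LIST and is_true are hypothetical.

```python
EMPTY_LIST = ()  # assumed stand-in for the EuLisp empty list ()


def is_true(obj) -> bool:
    """Return True for every value except the empty list."""
    return not (isinstance(obj, tuple) and len(obj) == 0)
```

Note that this differs from Python's own truthiness: under the rule above, values such as 0 and the empty string count as true, because they are not ().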