specification,
just a working interpreter or compiler. Ruby itself falls into this “specification by implementation” category: although
there are plenty of books and tutorials about how Ruby is supposed to
work, the ultimate source of all this information is Matz’s Ruby Interpreter (MRI), the language’s reference
implementation. If any piece of Ruby documentation disagrees with the
actual behavior of MRI, it’s the documentation that’s wrong; third-party
Ruby implementations like JRuby, Rubinius, and MacRuby have to work hard
to imitate the exact behavior of MRI so that they can usefully claim to be
compatible with the Ruby language. Other languages like PHP and Perl 5
share this implementation-led approach to language definition.
Another way of describing a programming language is to write an
official prose specification, usually in English. C++, Java, and
ECMAScript (the standardized version of JavaScript) are examples of this
approach: the languages are standardized in implementation-agnostic
documents written by expert committees, and many compatible
implementations of those standards exist. Specifying a language with an
official document is more rigorous than relying on a reference
implementation—design decisions are more likely to be the result of
deliberate, rational choices, rather than accidental consequences of a
particular implementation—but the specifications are often quite difficult
to read, and it can be very hard to tell whether they contain any
contradictions, omissions, or ambiguities. In particular there’s no formal
way to reason about an English-language specification; we just have to
read it thoroughly, think about it a lot, and hope we’ve understood all
the consequences.
Note
A prose specification of Ruby 1.8.7 does exist, and has even been
accepted as an ISO standard (ISO/IEC 30170) [3]. MRI is still regarded as the canonical
specification-by-implementation of the Ruby language, although the mruby project is an
attempt to build a lightweight, embeddable Ruby implementation that
explicitly aims for compliance with the ISO standard rather than MRI
compatibility.
A third alternative is to use the mathematical techniques of formal
semantics to precisely describe the meaning of a programming language. The
goal here is to be completely unambiguous, as well as to write the
specification in a form that’s suited to methodical analysis, or even automated analysis, so that it can be comprehensively
checked for consistency, contradiction, or oversight. We’ll look at these
formal approaches to semantic specification after we’ve seen how syntax is
handled.
Syntax
A conventional computer program is a long string of characters. Every
programming language comes with a collection of rules that describe what
kind of character strings may be considered valid programs in that
language; these rules specify the language’s syntax.
A language’s syntax rules allow us to distinguish potentially valid
programs like y = x + 1 from nonsensical ones like >/;x:1@4. They also
provide useful information about how to read ambiguous programs: rules
about operator precedence, for example, can automatically determine that
1 + 2 * 3 should be treated as though it had been written as 1 + (2 * 3),
not as (1 + 2) * 3.
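As a quick illustration of the precedence point, the following sketch asks Ruby itself to evaluate the three forms of the expression. The variable names are made up for this example; the only thing being shown is that the unparenthesized form agrees with the multiply-first grouping and not with the add-first one.

```ruby
# Ruby's own syntax rules decide how the unparenthesized form is grouped.
implicit       = 1 + 2 * 3
multiply_first = 1 + (2 * 3)
add_first      = (1 + 2) * 3

# The implicit form matches the multiply-first grouping...
puts implicit == multiply_first
# ...and differs from the add-first grouping.
puts implicit == add_first
```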
The intended use of a computer program is, of course, to be read by
a computer, and reading programs requires a parser: a program that can read a
character string representing a program, check it against the syntax rules
to make sure it’s valid, and turn it into a structured representation of
that program suitable for further processing.
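To make the idea of a parser concrete, here is a minimal hand-written sketch for a toy language containing only whole numbers, +, *, and parentheses. Everything in it (the TinyParser class, its method names, and the nested-array tree format) is hypothetical and invented for this illustration; it is not how MRI or any parser generator works, just one simple way of checking a string against some syntax rules and producing a structured representation.

```ruby
# Hypothetical toy parser: whole numbers, +, *, and parentheses only.
class TinyParser
  def initialize(source)
    # Break the string into number and punctuation tokens.
    @tokens = source.scan(/\d+|[+*()]/)
    # If the scan skipped any non-space character, the input is invalid.
    unless @tokens.join == source.delete(' ')
      raise ArgumentError, 'invalid characters in program'
    end
  end

  # Returns a nested-array tree, or raises ArgumentError on invalid input.
  def parse
    tree = parse_sum
    raise ArgumentError, "unexpected #{@tokens.first.inspect}" unless @tokens.empty?
    tree
  end

  private

  # sum ::= product ('+' product)*
  def parse_sum
    left = parse_product
    while take('+')
      left = [:add, left, parse_product]
    end
    left
  end

  # product ::= atom ('*' atom)*
  # Parsing products below sums is what makes * bind tighter than +.
  def parse_product
    left = parse_atom
    while take('*')
      left = [:multiply, left, parse_atom]
    end
    left
  end

  # atom ::= number | '(' sum ')'
  def parse_atom
    if take('(')
      inner = parse_sum
      raise ArgumentError, 'expected )' unless take(')')
      inner
    elsif @tokens.first.to_s.match?(/\A\d+\z/)
      [:number, @tokens.shift.to_i]
    else
      raise ArgumentError, "unexpected #{@tokens.first.inspect}"
    end
  end

  # Consume the next token if it matches; report whether it did.
  def take(expected)
    return false unless @tokens.first == expected
    @tokens.shift
    true
  end
end

p TinyParser.new('1 + 2 * 3').parse
p TinyParser.new('(1 + 2) * 3').parse
```

In this sketch, parsing 1 + 2 * 3 and 1 + (2 * 3) yields the same tree, while a string such as >/;x:1@4 is rejected with an ArgumentError.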
There are a variety of tools that can automatically turn a language’s syntax rules into a parser. The details of how these rules are specified, and the
techniques for turning them into usable parsers, are not the focus of this chapter—see Implementing Parsers for a quick overview—but overall, a