What is a Language? - Software Language Engineering

2.2 Software Language Engineering

2.2.1 What is a Language?

Originally, most software languages were textual and were described with the help of context-free grammars. In that context, ‘language’ often refers to a typically infinite set of strings and stems from the clearly defined term ‘formal language’ from language theory.

Definition 2.7 (formal language). A formal language L over an alphabet Σ is a well-defined subset of the Kleene closure Σ* (the set of all possible strings over Σ). L can be generated by a grammar (Moll et al., 1988).
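As a minimal sketch of Definition 2.7, the following Python fragment enumerates a finite prefix of Σ* and filters it down to one formal language. The alphabet {a, b}, the language { aⁿbⁿ | n ≥ 0 }, and the names `kleene_closure` and `in_language` are illustrative assumptions and do not appear in the cited sources.

```python
from itertools import product

# Hypothetical alphabet Σ, chosen only for illustration.
SIGMA = ("a", "b")


def kleene_closure(alphabet, max_len):
    """Yield all strings over `alphabet` up to length `max_len`.

    Σ* itself is infinite, so only a finite prefix is enumerated.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def in_language(word):
    """Membership test for the assumed language L = { a^n b^n | n >= 0 }.

    L is generated by the context-free grammar S -> a S b | ε.
    """
    n, rest = divmod(len(word), 2)
    return rest == 0 and word == "a" * n + "b" * n


# L is a subset of Σ*: keep only those strings that belong to L.
members = [w for w in kleene_closure(SIGMA, 4) if in_language(w)]
```

Here `members` contains exactly the strings of L with length at most 4, which illustrates that a formal language only decides which strings are well-formed and says nothing about what they mean.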

However, this notion of a language is not sufficient for MDE, for two reasons.

First, with the advent of graphical languages (like UML), a software language and its (textual) representation can no longer be regarded as one and the same. As in linguistics³, we have to distinguish abstract language utterances from their concrete representations. For example, no matter whether a sentence (a language utterance) is spoken or written, it is still the same sentence. Second, formal languages are only about the structure of language utterances, i.e., their syntax (from Greek ‘sun taksis’ = ‘with arrangement’). They only describe whether a language utterance is syntactically correct but not whether it is semantically correct, i.e., whether it provides any useful meaning. Therefore, for defining software languages, Kleppe (2007) stays close to the definition of a formal language but abstracts from strings as the elements of a language and from a grammar as the means to describe a language:

“A language L is the set of all linguistic utterances of L. (...) A language description of language L is the set of rules according to which the linguistic utterances of L are structured, optionally combined with a description of the intended meaning of the linguistic utterances.”

However, if language utterances are independent of their representation and their meaning (and thus, by themselves, have no meaning and no tangible form), a language has to be more than just the set of all its language utterances. It has to include what its utterances mean and how they can be represented. Therefore, in MDE (e.g., by Clark et al., 2008) a language is often divided into three language aspects: (1) abstract syntax (structure of abstract language utterances), (2) concrete syntax (concrete representation of language utterances), and (3) semantics (meaning of language utterances). Our set-theoretical definition of a language (which is adapted from Sadilek, 2011) reflects this division.

Definition 2.8 (language, software language). A (software) language L is a 3-tuple L = ⟨A, {C₁, ..., C_m}, {S₁, ..., S_n}⟩

where

• A is the language’s abstract syntax,

• {C₁, ..., C_m} is the non-empty set of the language’s m concrete syntaxes, and

• {S₁, ..., S_n} is the non-empty set of the language’s n semantics.

In this interpretation of a language, the abstract syntax plays the central role. A language has only one abstract syntax, and it determines the language’s identity. Rather intuitively, a language can have more than one concrete syntax, e.g., a textual and a graphical one. There has to be at least one concrete syntax. Less intuitively, and only for practical reasons, we allow more than one semantics. Ideally, a language should have exactly one semantics because this way it can best fulfill its purpose: to convey information unambiguously. We discuss each of these three language aspects in the following subsections.
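The 3-tuple of Definition 2.8 can be sketched as a small data structure that enforces the two non-emptiness constraints. The class name `Language`, its field names, and the placeholder strings used for concrete syntaxes and semantics are assumptions made for illustration; the definition itself does not prescribe any encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Sketch of L = <A, {C_1, ..., C_m}, {S_1, ..., S_n}> (hypothetical encoding)."""

    abstract_syntax: frozenset   # A: exactly one per language
    concrete_syntaxes: tuple     # C_1..C_m, must be non-empty (m >= 1)
    semantics: tuple             # S_1..S_n, must be non-empty (n >= 1)

    def __post_init__(self):
        if not self.concrete_syntaxes:
            raise ValueError("a language needs at least one concrete syntax")
        if not self.semantics:
            raise ValueError("a language needs at least one semantics")


# Placeholder values: one abstract syntax, two concrete syntaxes, one semantics.
example = Language(
    abstract_syntax=frozenset({"u1", "u2"}),
    concrete_syntaxes=("textual", "graphical"),
    semantics=("structure-only",),
)
```

Making the abstract syntax a single field, while the other two aspects are collections, mirrors the statement that the abstract syntax alone determines the language’s identity.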

³ In theoretical linguistics, Chomsky (1965) already distinguished between the surface structure and the deep structure of a language: “It might be supposed that surface structure and deep structure will always be identical. (...) The central idea [...] is that they are, in general, distinct [...]”.

Abstract Syntax

We define the abstract syntax of a language as the set of all (syntactically correct) language utterances⁴. Thus, we stay close to the above definition of a formal language.

Definition 2.9 (abstract syntax). The abstract syntax A of a language L is the set of all language utterances u that are produced by an abstract syntax description Θ_A.

In order to abstract from a grammar as a concrete means to produce the set of language utterances, we use the more general term ‘abstract syntax description’. We define what an abstract syntax description consists of in Sect. 2.2.2. Next, we define a language utterance as an independent entity which is an element of the abstract syntax.

Definition 2.10 (language utterance). A language utterance u of a language L is an element of the abstract syntax A of L. It has an internal structure that conforms to the abstract syntax definition which produces A. Via a concrete syntax C of L and a semantics S of L, a representation and a meaning are assigned to u.

Semantics

The very purpose of a language is to communicate a meaning between two parties. Therefore, both parties have to share a semantic domain. A semantic domain is a set of meanings which is not tied to a specific language. The semantics of a language consists of a semantic domain and a mapping⁵ which maps each language utterance to an element in the semantic domain. The fact that a semantic domain is not tied to a language, and that utterances are only mapped to meanings in that domain, can be illustrated by an example.

Two people can talk in English or in German about a basketball game, e.g., referring to a foul, and mean exactly the same thing. The semantic domain of basketball belongs neither to the English language nor to the German language but exists independently.

Definition 2.11 (semantics). A semantics S of a language L = ⟨A, ...⟩ is a 3-tuple S = ⟨A, D_S, M_S⟩ consisting of L’s abstract syntax A, a semantic domain D_S, and a semantic mapping M_S : A → D_S which is a total function that maps elements of the abstract syntax (i.e., language utterances of L) to elements of D_S.

Definition 2.12 (meaning). A meaning m of a language utterance u ∈ A of a language L = ⟨A, ...⟩ with a semantics S = ⟨A, D_S, M_S⟩ is an element of the semantic domain D_S for which M_S(u) = m.
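Definitions 2.11 and 2.12 can be illustrated with the basketball example: two languages with different abstract syntaxes share one semantic domain, and each has its own total mapping into it. In this sketch, plain strings stand in for abstract utterances, and all names (`D_S`, `A_ENGLISH`, `M_GERMAN`, `is_semantic_mapping`, `meaning`) as well as the concrete words are assumptions chosen for illustration.

```python
# Hypothetical semantic domain D_S: meanings that exist independently of any language.
D_S = frozenset({"FOUL", "TIMEOUT"})

# Strings stand in for abstract utterances here (an illustrative simplification).
A_ENGLISH = frozenset({"foul", "time-out"})
A_GERMAN = frozenset({"Foul", "Auszeit"})

# Semantic mappings M_S : A -> D_S, one per language, encoded as dictionaries.
M_ENGLISH = {"foul": "FOUL", "time-out": "TIMEOUT"}
M_GERMAN = {"Foul": "FOUL", "Auszeit": "TIMEOUT"}


def is_semantic_mapping(abstract_syntax, domain, mapping):
    """True if `mapping` is a total function from `abstract_syntax` into `domain`."""
    total = set(mapping) == set(abstract_syntax)       # every utterance has a meaning
    well_typed = set(mapping.values()) <= set(domain)  # every meaning lies in D_S
    return total and well_typed


def meaning(mapping, utterance):
    """Return m = M_S(u); raises KeyError if u is not in the abstract syntax."""
    return mapping[utterance]
```

With these definitions, `meaning(M_ENGLISH, "foul")` and `meaning(M_GERMAN, "Foul")` yield the same element of `D_S`, although the two utterances belong to different languages.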

For software languages, there are structure-only semantics and execution semantics.

Definition 2.13 (execution semantics). An execution semantics is a semantics whose semantic domain is the program domain of a programmable machine, i.e., its semantic mapping maps language utterances to a valid set of instructions for that machine.

⁴ There are different interpretations of the term ‘abstract syntax’ in the literature. Some refer to what we call the internal structure of one language utterance, some to what we call the abstract syntax definition.

⁵ With ‘mapping’ we do not necessarily mean ‘function’, i.e., we do not always imply a single-valued relation.

We use the term ‘machine’ as in Hopcroft (1979); e.g., a Turing machine is a machine in that sense. Compiled programming languages like C++ clearly have execution semantics: a C++ compiler maps a C++ program to a set of instructions that a specific silicon processor can execute. Importantly, execution semantics can be defined indirectly by mapping language utterances to utterances of another language which provides direct execution semantics. Structure-only semantics can be seen as a less powerful kind of semantics because most execution semantics also determine structure. For example, the structure-only semantics of UML class diagrams maps a class diagram to a set of object structures (instances of that class diagram). A comparable C++ program, however, can be compiled and executed and at the same time also determines the memory layout of objects.
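The indirect definition of execution semantics can be sketched as a function composition: utterances of a source language are first mapped to utterances of a target language, whose direct execution semantics then maps them to machine instructions. Everything in this sketch is a hypothetical toy: the source language (integers, `add`, and a `double` construct encoded as nested tuples), the target language (integers and `add` only), the stack machine with `PUSH` and `ADD`, and all function names. Instructions are modeled as an ordered list rather than a set so that they can be executed.

```python
def desugar(u):
    """Map a source-language utterance to a target-language utterance."""
    if isinstance(u, int):
        return u
    op, *args = u
    args = [desugar(a) for a in args]
    if op == "double":                      # construct unknown to the target language
        return ("add", args[0], args[0])
    return (op, *args)


def compile_to_stack(u):
    """Direct execution semantics of the target language: utterance -> instructions."""
    if isinstance(u, int):
        return [("PUSH", u)]
    op, left, right = u
    if op != "add":
        raise ValueError(f"not a target-language utterance: {u!r}")
    return compile_to_stack(left) + compile_to_stack(right) + [("ADD",)]


def indirect_semantics(u):
    """Indirect execution semantics of the source language, defined by composition."""
    return compile_to_stack(desugar(u))


def run(program):
    """A toy stack machine that executes the instruction list."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:                               # ("ADD",)
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack.pop()
```

For example, `indirect_semantics(("double", 3))` yields the instructions `PUSH 3`, `PUSH 3`, `ADD`, and `run` evaluates them to 6. The source language never needs its own compiler to the machine; it inherits one through `desugar`.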

Concrete Syntax

Similar to the semantics, we also define the concrete syntax of a language to consist of a concrete syntax domain and a relation that maps language utterances to elements of the concrete syntax domain, i.e., their representations (Clark et al., 2008).

In contrast to the semantic mapping, which maps every utterance to exactly one meaning, the relation between language utterances and their representations is not necessarily a function because there can be multiple representations of an utterance. For example, two strings which differ only in whitespace can represent the same Java program. Also, languages like Scala have optional concrete syntax elements (for example, parentheses can often be omitted), so that multiple textual representations match the same Scala program.

Definition 2.14 (concrete syntax). A concrete syntax C of a language L = ⟨A, ...⟩ is a 3-tuple C = ⟨A, D_C, M_C⟩ consisting of L’s abstract syntax A, a concrete syntax domain D_C, and a concrete syntax mapping M_C ⊆ A × D_C which is a binary relation that relates elements of the abstract syntax A (i.e., language utterances of L) to elements of D_C.

Definition 2.15 (representation). A representation r of a language utterance u of a language L with a concrete syntax C = ⟨A, D_C, M_C⟩ is an element of the concrete syntax domain D_C for which ⟨u, r⟩ ∈ M_C.
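Because M_C is a relation rather than a function, it can be sketched as a set of pairs in which one utterance may occur with several representations. The utterance names `call_f` and `call_g`, the strings in the concrete syntax domain, and the helper names below are assumptions made for illustration, loosely following the whitespace example.

```python
# Hypothetical abstract syntax A: two abstract utterances, named by placeholder strings.
A = frozenset({"call_f", "call_g"})

# Concrete syntax mapping M_C ⊆ A × D_C as a set of (utterance, representation) pairs.
# "call_f" has two textual representations that differ only in whitespace.
M_C = frozenset({
    ("call_f", "f()"),
    ("call_f", "f( )"),
    ("call_g", "g()"),
})


def representations(relation, utterance):
    """Return every r with (utterance, r) in the relation (Definition 2.15)."""
    return {r for (u, r) in relation if u == utterance}


def is_function(relation):
    """True if each utterance is related to at most one representation."""
    firsts = [u for (u, _) in relation]
    return len(firsts) == len(set(firsts))
```

Here `representations(M_C, "call_f")` contains both `"f()"` and `"f( )"`, so `is_function(M_C)` is false. This is the difference from a semantic mapping, which assigns exactly one meaning to every utterance.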

From the document: Model transformation languages for domain-specific workbenches (pages 29-32)