Formal Description Techniques - UML Profile for Communicating Systems

Models are expressed in a specific language that is based on a formal syntax and semantics. This language can have a graphical or a textual representation. Figure 7 outlines the meaning of a model. Several interrelated models may be defined at different levels of abstraction, e.g. platform independent models (PIM) and platform specific models (PSM). The language defines a concrete syntax that acts as the intelligible, human-understandable notation. This concrete syntax is an instantiation of the abstract syntax. The abstract syntax abstracts away from details of the concrete syntax that are not essential for the definition or specification of expressions, such as blank spaces, comments and delimiters. It represents the deep structure in terms of an abstract syntax tree. The dynamic semantics defines the run-time behavior of a model. As shown, this also implies additional constraints on the model, which are summarized in the static semantics. Static semantics constrain the model statements at compile-time.

The static semantics is implied by the dynamic semantics. It encompasses visibility rules, matching function signatures, compatible data types and so on.

3.3 Formal Description Techniques

Today, network services and communication protocols are specified informally in many cases.

Prominent examples are the Internet standards (so-called Request for Comments) issued by the Internet Engineering Task Force (IETF). They are standard documents, composed of plain text in ordinary English language with additions of some textual character-based graphics to show network configurations and message exchange. Such a specification is merely composed of scattered descriptions of partial protocol runs of the entities. In most cases, this describes the behavior of the entities after some event has occurred. Then, the protocol developer has to reassemble the complete behavior of an entity from the single description parts. Nevertheless, informal specifications have the advantage that they can be easily accessed and understood. The disadvantage is that informal specifications tend to become ambiguous, imprecise and incomplete. Special occurrences or conditions are simply not considered. Multiple features interact in a way that was not anticipated or expected beforehand. Hence, different protocol developers freely interpret the specification at whim and consequently deviate in their final implementation. This results in incorrect and incompatible protocol implementations. Furthermore, computer-aided validation and testing of the development process can hardly rely on informal specifications.

The necessity for a formal description of communication protocols is commonly undisputed and accepted. A formal description is a description by means of a description method with a strong formal semantics which ensures an unambiguous interpretation of the description. The following Figure 8, taken from [Koe03], depicts this process.

Figure 8: Interpretation of Formal Descriptions (elements shown: Formal Description, Semantic Model, Interpreted Description, Successive Interpretation)

The description of communication services and protocols is composed of the description of the communication behavior and the description of the protocol data formats for the service primitives and protocol data units. For data formats, several data description techniques have been developed. A brief overview is given in Section 7.7. The description techniques for communication protocols can be categorized into constructive and descriptive methods [Got92].

Constructive methods describe the behavior of a communication protocol by means of an abstract model. This model is executed by a virtual machine and shows the behavior of the communicating entities. Such a description can be considered near-to-implementation, as the protocol is described by means of a more abstract, higher-level protocol description. This results in direct support for designing, validating and implementing the communication protocol. Executable prototypes can be generated automatically from the specification and description. However, specific requirements for a communication protocol like liveliness cannot be described explicitly and have to be verified by the chosen description model.

Descriptive methods do not describe the behavior of a communication protocol. Instead, they describe properties that have to be fulfilled by the implementation, stated by means of logical expressions. The main advantage is that properties of the protocol can be described without any implementation details. Therefore, one can validate them independently.

3.3.1 Formal Languages

As discussed in the previous sections, formal languages can play an important role in the software development process and in the development of communication protocols. A formal language has an exact definition of the expressions that belong to the language (the syntax of the language) and of the exact meaning of these expressions (the semantics of the language).

Furthermore, the pragmatics of a language is important. For programming languages, pragmatics includes issues such as ease of implementation and programming methodology [SK95].

A specification is formal if its meaning (semantics) is unambiguous. Formal Description Techniques (FDT) are languages that are distinguished from other formal languages by having not only a formal syntax but also a formal semantics. This sets them apart from formal languages such as Java or C++, which only have a formal syntax and are implementation languages rather than description languages. It is generally accepted that a thorough and exhaustive system specification and design is essential for the successful development of a system. Specification languages can help to accomplish such a task if they are capable of satisfying the following needs [Abs93, Ver01b]:

• unambiguous, clear, precise and concise specifications,

• a thorough and accurate basis for analyzing specifications,

• a basis for determining whether an implementation is conformant to the specifications or not,

• a basis for determining the consistency of specifications and

• a translation for generating applications without the need for manual coding.

The following sections briefly outline how syntax and semantics can be defined by formal means. However, they cover only the very basic parts. A more detailed description can be found in [HR00, EMS00, Pri01].


3.3.2 Formal Methods for Syntax Description

A formal grammar is a way to formally define a syntax. A grammar G is defined by the quadruple of a set of terminal symbols T, a set of non-terminal symbols N, a distinguished starting symbol s that is an element of N, and a set of grammar rules R (called production rules). A grammar rule consists of a left-hand side (LHS) and a right-hand side (RHS). Both sides can contain a mixture of terminal and non-terminal symbols which are concatenated together. The grammar allows a non-terminal symbol to be substituted recursively according to the given rules. This process yields a set of possible expressions which define the syntax of the language specified by this particular grammar. For example, the following formal grammar defines an arithmetic binary expression consisting of sequences of ‘0’ and ‘1’ combined with the signs ‘+’ and ‘-’ and with opening ‘(’ and closing ‘)’ brackets:

T={‘0’,‘1’,‘(’,‘)’,‘+’,‘-’},

N={expression, literal}, s=expression,

R={

<literal, ‘0’>, <literal, ‘1’>,

<literal, ‘0’ literal>, <literal, ‘1’ literal>,

<expression, literal>, <expression, literal ‘+’ expression>,

<expression, literal ‘-’ expression>,

<expression, ‘(’ expression ‘)’>

}
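The substitution process described above can be sketched in Python. This is a minimal illustration, not part of the original text: the names `T`, `N`, `S`, `R` mirror the quadruple, while the function `derive` and its length bound `max_len` are assumptions introduced here so that the enumeration terminates.

```python
from collections import deque

# Grammar G = (T, N, s, R) from the text. Each rule is an (LHS, RHS) pair,
# with the RHS written as a tuple of symbols.
T = {"0", "1", "(", ")", "+", "-"}
N = {"expression", "literal"}
S = "expression"
R = [
    ("literal", ("0",)),
    ("literal", ("1",)),
    ("literal", ("0", "literal")),
    ("literal", ("1", "literal")),
    ("expression", ("literal",)),
    ("expression", ("literal", "+", "expression")),
    ("expression", ("literal", "-", "expression")),
    ("expression", ("(", "expression", ")")),
]


def derive(max_len):
    """Return all terminal strings of at most max_len symbols derivable from S."""
    results = set()
    seen = {(S,)}
    queue = deque([(S,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal of the current sentential form.
        idx = next((i for i, sym in enumerate(form) if sym in N), None)
        if idx is None:
            results.add("".join(form))  # only terminals left
            continue
        for lhs, rhs in R:
            if lhs == form[idx]:
                new = form[:idx] + rhs + form[idx + 1:]
                # No rule shortens a form, so longer forms can be pruned.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return results
```

For example, `derive(3)` contains `"1+0"` and `"(1)"`, but not `"+"`, because no sequence of rule applications produces a lone operator.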

A more convenient notation for such a grammar is, for example, the Backus-Naur Form (BNF). BNF defines the following notational template for a single grammar rule:

LHS ::= RHS

The pipe symbol ‘|’ in the RHS separates alternatives. In addition, there is an extended variant of BNF, the Extended Backus-Naur Form (EBNF). This notation introduces some additional symbols for denoting repetition. For instance, the ‘*’ symbol denotes zero or more repetitions and the ‘+’ symbol denotes one or more repetitions. The square brackets ‘[’ and ‘]’ enclose an optional part of a rule. That is, the symbols inside the square brackets may either be left out or be included in the production rule. By convention, the start production rule in BNF expressions is the topmost one. The grammar from above, specified in EBNF, therefore looks like:

expression ::= literal | literal ( ‘+’ | ‘-’ ) expression | ‘(’ expression ‘)’

literal ::= ( ‘0’ | ‘1’ )+
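To illustrate how such rules determine which strings belong to the language, the following Python sketch implements a small recursive-descent recognizer with one function per EBNF rule. The function name `accepts` is an assumption made for this example. Whitespace is not handled, since the grammar does not define it.

```python
def accepts(text):
    """Return True if text is an expression according to the EBNF rules."""
    pos = 0

    def literal():
        # literal ::= ( '0' | '1' )+
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in "01":
            pos += 1
        return pos > start

    def expression():
        # expression ::= literal | literal ( '+' | '-' ) expression
        #              | '(' expression ')'
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            if not expression():
                return False
            if pos < len(text) and text[pos] == ")":
                pos += 1
                return True
            return False
        if not literal():
            return False
        if pos < len(text) and text[pos] in "+-":
            pos += 1
            return expression()
        return True

    # The whole input must be consumed by the start rule.
    return expression() and pos == len(text)
```

Note that the grammar only allows a literal on the left of an operator. Hence `accepts("10+1-(0)")` holds, whereas `accepts("(1)+0")` does not.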

3.3.3 Formal Methods for Semantics Description

Semantics of a language describes the meaning of a syntactically correct statement in the context in which it is expressed. Formal semantics can provide a complete, rigorous definition of a language. The most important benefit is that this definition can act as the compulsory reference for system designers who want to understand intimate details of the language and also for implementers of the language itself by providing a standard against which the compliance of an implementation may be assessed [HB00].

Providing a robust definition can help to minimize the possibility that different implementers interpret the definition of the language differently. If a language definition contains ambiguities, a user’s program might run fine on one implementation but not on another. A formal definition is typically much more concise than a corresponding description in English text. Thus, the main advantage of formal definitions, in contrast to their informal counterparts, is that they are inherently precise and unambiguous because they use mathematical statements. The downside is that the basics and benefits of formal semantics are not yet widely known and understood. Nevertheless, a formal semantics can provide the mathematical foundation that makes it possible to prove certain properties of programs. A vital prerequisite for formulating claims about programs is to have a precise mathematical definition of the language.

There are two different kinds of statements one might wish to prove: statements about the language itself and statements about particular programs. An example of the first kind is the claim that null pointers and type errors cannot arise at run time. Examples of the second kind include proofs of correctness of programs. Normally, the term correctness means that a program conforms to or behaves according to a given (formal) specification. For this purpose, formal semantics provide the logical and conceptual tools needed for defining and proving precise statements of correctness.

However, although proving the correctness of programs by mathematical means has been discussed and researched for many years, it has still not become practical for sizeable programs.

The definition of semantics for specification languages has been researched for a long time. For mathematical languages, well-known semantic definitions have been developed, but the definition of a semantics for a specification language is considerably more difficult. Specification languages have many special cases and a dynamic semantics which gives a new meaning to a statement depending on the state the current interpretation has reached. There are several methods available for specifying semantics. Generally, they can be divided into the following categories:

• Informal semantics

• Translational semantics

• Operational semantics

• Denotational (or functional) semantics

• Axiomatic semantics

Informal semantics do not use formal methods for the semantics description. In most cases (e.g. for C, Pascal, Ada, Java) these semantic definitions are written as carefully composed texts in natural language. This can rapidly become a very complex task, because such standard documents have to describe not only every individual construct in the language, but also all possible combinations and interactions of these constructs. Subtle issues can arise from feature interactions and tend to multiply rapidly. Hence, for very sizable and complex languages it is very difficult to take each possible construct and combination into account while keeping the description sufficiently precise and free from ambiguity. This makes it very hard to be certain that all possible interactions have been considered, and it may result in a huge, long-winded and incomprehensible document. It is common to give several additional examples which are meant to clarify the meaning and to resolve ambiguities that may arise when reading the description. In spite of this effort, such a description is still prone to misinterpretation, as different readers may perceive different ambiguities.

Translational semantics is a special case of the definition of semantics. Translational semantics do not specify the semantics itself, but translate a statement of a source language into a statement of another target language. The semantics of the target language statement can be mapped backwards to give a semantics to the source language statement. This is only possible and meaningful if the target language has a strong semantic basis. This may also introduce a steep learning curve, as not only a single language has to be understood, but the source and target language as well as the translational semantics description and the semantics description of the target language. However, for target languages which are considered well known and understood, this can be a suitable way to bind a semantics description to a language. It may also allow the source language to be understood quite quickly.
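As a small illustration of the translational idea, the following Python sketch translates binary expressions of the example grammar from Section 3.3.2 into Python arithmetic expressions and lets the target language assign the meaning. Several points are assumptions of this sketch and not taken from the text: the function names `translate` and `meaning`, the reading of literals as binary numbers, and the use of Python as the target language. The source operators therefore inherit Python's left-to-right evaluation of `+` and `-`. Only the character set is checked here, not the full syntax.

```python
import re


def translate(source):
    """Translate a binary expression into a Python expression string."""
    # Assumption: only the terminal symbols of the example grammar may occur.
    if not re.fullmatch(r"[01()+\-]+", source):
        raise ValueError("not a binary expression")
    # Each binary literal becomes the corresponding decimal literal.
    return re.sub(r"[01]+", lambda m: str(int(m.group(), 2)), source)


def meaning(source):
    """The meaning of the source statement is the meaning of its translation."""
    return eval(translate(source), {"__builtins__": {}})
```

For instance, `translate("10+1")` yields the target statement `"2+1"`, whose value under Python's semantics is 3.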

Operational semantics are very close to a concrete implementation. The fundamental idea is to define the execution by a virtual or abstract machine which interprets each instruction. An example for such a machine is the Turing Machine. By means of the machine’s description how it proceeds in execution, the behavior of the program – the semantics – is specified. The machine defines a transition function that specifies a subsequent state for each active state. Therefore, the semantics of the program is specified through the sequence of states traversed from the initial state to the final state.
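The idea of a transition function over machine states can be sketched as follows. The tiny stack machine, its instruction names (`push`, `add`, `sub`) and the functions `step` and `run` are hypothetical and chosen only for illustration. A state is a pair of program counter and stack.

```python
def step(program, state):
    """Transition function: map the active state to its successor state."""
    pc, stack = state
    op, *args = program[pc]
    if op == "push":
        return pc + 1, stack + (args[0],)
    if op == "add":
        return pc + 1, stack[:-2] + (stack[-2] + stack[-1],)
    if op == "sub":
        return pc + 1, stack[:-2] + (stack[-2] - stack[-1],)
    raise ValueError(f"unknown instruction {op!r}")


def run(program):
    """Return the sequence of states from the initial to the final state."""
    state = (0, ())  # initial state: first instruction, empty stack
    trace = [state]
    while state[0] < len(program):
        state = step(program, state)
        trace.append(state)
    return trace
```

Running `run([("push", 2), ("push", 1), ("add",)])` produces the trace `(0, ())`, `(1, (2,))`, `(2, (2, 1))`, `(3, (3,))`. In the operational view, this sequence of states is the meaning of the program.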

Contrary to operational semantics, denotational semantics do not specify execution steps of an abstract machine. Instead, they specify a functional correspondence between the program variables which are modified through the execution of the program. This denotes how the values of the variables are influenced by the constructs of the programming language. For this purpose, functions are defined with the syntax as their domain and the semantics as their co-domain. State changes triggered by the program are described by these functions.
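A minimal sketch of this functional view is given below. The miniature language (tuples tagged `num`, `var`, `add`, `assign`, `seq`) and the function names `expr` and `stmt` are assumptions made for this example. Each syntactic construct is mapped to a function: expressions denote functions from states to integers, statements denote functions from states to states.

```python
def expr(e):
    """Denotation of an expression: a function from states to integers."""
    tag = e[0]
    if tag == "num":
        return lambda state: e[1]
    if tag == "var":
        return lambda state: state[e[1]]
    if tag == "add":
        left, right = expr(e[1]), expr(e[2])
        return lambda state: left(state) + right(state)
    raise ValueError(f"unknown expression {tag!r}")


def stmt(s):
    """Denotation of a statement: a function from states to states."""
    tag = s[0]
    if tag == "assign":
        value = expr(s[2])
        # The new state equals the old one except for the assigned variable.
        return lambda state: {**state, s[1]: value(state)}
    if tag == "seq":
        first, second = stmt(s[1]), stmt(s[2])
        # Sequencing is function composition.
        return lambda state: second(first(state))
    raise ValueError(f"unknown statement {tag!r}")
```

For the program `x := 2; y := x + 1`, written as `("seq", ("assign", "x", ("num", 2)), ("assign", "y", ("add", ("var", "x"), ("num", 1))))`, the denotation applied to the empty state `{}` gives `{"x": 2, "y": 3}`. No execution steps are described, only the resulting mapping between states.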

In axiomatic semantics, an axiomatic system is formalized that makes assertions about state changes by means of logical formulas and model theory. The entities of the language and their inter-relations are specified.
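One well-known axiomatic system of this kind is Hoare logic. It is not named in the text and is used here only as an illustration. A statement S is characterized by a precondition P and a postcondition Q, written {P} S {Q}. The assignment axiom and one instance of it read:

```latex
% Assignment axiom: P[e/x] is P with every free occurrence of x replaced by e.
\[
  \{\, P[e/x] \,\} \;\; x := e \;\; \{\, P \,\}
\]
% Instance with P \equiv (x > 0) and e \equiv x + 1:
\[
  \{\, x + 1 > 0 \,\} \;\; x := x + 1 \;\; \{\, x > 0 \,\}
\]
```

The instance states that if x + 1 > 0 holds before the assignment, then x > 0 holds afterwards, without describing how the assignment is executed.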

These three styles do not compete. They are mutually complementary and focus on different aspects of a language. Operational semantics are the most intuitive and the most useful standard for implementers. Axiomatic semantics are most suitable for program verification. Denotational semantics strike a balance between operational and axiomatic semantics by providing the logical and conceptual link between both. They encourage the well-structured design of both programming languages and program logics. For instance, the semantics of the Specification and Description Language 2000 (SDL-2000), introduced in the following section, is defined by means of a translation [GGP03] from an SDL program to an Abstract State Machine (ASM) program [Gur88]. The latter three semantic description styles are formal, as they are founded on a strong mathematical basis. The use of formal means makes it possible to derive further answers from these semantics, such as whether there are language constructs that must not be used together or whether there are language constructs which can be composed from others. Additionally, any behavior of the specification can be covered and analyzed by mathematical means. Some properties that are commonly examined by validation and testing are:

• Absence of Deadlock conditions

The system never enters a state that cannot be left due to a missing or occupied resource.

• Absence of Livelock conditions

The system never enters cycles that cannot be left due to a missing or occupied resource.

• Code Coverage

Each statement defined in the system can potentially be executed.

• Liveliness

Each state of the system can be reached from the initial state.

• Robustness

The system can react to unexpected, unusual or missing events.

• Termination

The final state – or an idle state for cyclic systems – can always be reached.

• Recovery from Failures

The system can recover to a normal state within a limited time after an error has occurred.

It is well known that these questions are undecidable in general. That is, it has been proven that no general algorithm can verify these properties for every program and input it is asked to analyze (the Halting problem, [HMU02]). However, a formal semantics is what makes it possible to examine these very useful properties at all.
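When a specification has a finite state space, some of the properties listed above can nevertheless be checked by exhaustive search. The following Python sketch is a simplified illustration under that assumption. The state names, the dictionary-based transition relation and the functions `reachable` and `analyse` are hypothetical. A deadlock is approximated here as a reachable, non-final state without outgoing transitions.

```python
from collections import deque


def reachable(transitions, initial):
    """Breadth-first search over the state graph from the initial state."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for successor in transitions.get(state, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def analyse(transitions, initial, final):
    """Report unreachable states and reachable dead ends that are not final."""
    states = set(transitions)
    for successors in transitions.values():
        states.update(successors)
    reached = reachable(transitions, initial)
    deadlocks = {s for s in reached
                 if not transitions.get(s) and s not in final}
    return {"unreachable": states - reached, "deadlocks": deadlocks}


# Hypothetical protocol entity with one dead end and one orphaned state.
example = {
    "idle": ["sending"],
    "sending": ["waiting"],
    "waiting": ["idle", "stuck"],
    "stuck": [],
    "orphan": ["idle"],
}
report = analyse(example, "idle", final=set())
```

Here `report` names `"orphan"` as unreachable from the initial state and `"stuck"` as a deadlock state.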

The downside of formal semantics is the fact that it is hard to understand for the inexperienced programmer and tool developer, because it requires a good knowledge of the underlying mathematical means.
