
2.4 Software

Table 2.4

Software | Description | Resource

CPLEX 10.1 | commercial mathematical solver for linear programming, mixed integer programming, quadratic programming and quadratically constrained programming problems | IBM ILOG Incorporation

FASIMU | Flux Balance Analysis simulation software | Hoppe A., 2011 [16]

Metannogen | software to manage information for metabolic network reconstruction | Gille C., 2007 [93]

PSQL | object-relational database system | The PostgreSQL Global Development Group

R Project | software environment for statistical computing | R Foundation for Statistical Computing

3 Methods

This chapter presents methods and general strategies for metabolic network reconstructions and summarises computational methods for functional testing and validation of such models.

The methodology of network reconstruction can be divided into two types: a top-down and a bottom-up approach. In a top-down approach, high-throughput molecular biology data such as gene expression and proteomic data are used to identify network components, supported by statistical and computational methods. An advantage of the top-down approach is that no prior knowledge of the biological system is required to perform a reconstruction. Thus, biological systems can be analysed without bias, and the discovery of new biological features is more likely. In contrast, a bottom-up approach places emphasis on the manual curation of biological evidence obtained from scientific literature and genome annotation. This approach is mainly hypothesis-driven and focuses on specific aspects of molecular cell biology such as metabolism, transcriptional regulation or signalling.

Depending on the scope of the study, experimental evidence such as genome or gene expression data has to be evaluated to decide which reactions are to be included. This is of great importance for network reconstructions that take the maturation state of a cellular system into account, because the selected reactions must reflect the biological differences between, e.g., myeloid progenitor cells and monocytes, or fetal and adult cardiomyocytes.

Irrespective of the approach eventually adopted, data from multiple resources and different levels of evidence have to be integrated. This includes high-throughput genome data, multiple types of experimental data and bibliomic information. A number of public databases provide genomic and molecular information about different organisms and tissues, including human cells. Among these are resources providing information about metabolic reactions and compounds, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) [18], Reactome [89] and HumanCyc [94], while other databases focus on gene and protein information, such as PRIDE [95], the UniProt Knowledgebase (UniProtKB, [24]) or Ensembl [25]. Together, these databases offer researchers a good basis for reconstructing cell- and tissue-type-specific human networks.

The following sections summarise a general protocol and tools for the reconstruction of cellular networks and present computational methods for testing models for their functionality and consistency.

3.1 Methods for a metabolic network reconstruction

In the recent past, different protocols have been introduced for the reconstruction of cellular networks [11, 15, 96]. All these protocols share a basic concept: an initial set of genes is obtained from genome annotation or high-throughput data and assigned to the enzymes they encode, which catalyse metabolic reactions. Each identified enzyme is then evaluated regarding its biochemical characteristics, such as subcellular localisation, required cofactors and metabolic activity. For this purpose, specialised databases such as the Braunschweig Enzyme Database (BRENDA) [17] and KEGG [18] are consulted, or evidence from experimental data and scientific literature is collected. The generated reaction scheme is then further evaluated regarding the consistency and admissibility of its contents. In summary, the general strategy for a comprehensive model reconstruction requires the following steps:

1. Definition of a preliminary reaction list

2. Extension of composed reaction list by missing reactions and metabolites

3. Generating the mathematical representation of the network

4. Evaluation, validation and consistency check of the reconstructed network

These steps form the backbone of each reconstruction and should be regarded as guidance regardless of whether a top-down or a bottom-up approach is chosen.

However, to reconstruct a functional and valid model of a cellular system, it is essential to iterate each of these steps multiple times until all intended biological processes are included. If missing reactions are identified, the whole process starts again from the beginning, including a renewed evaluation.


3.1.1 Reaction list

Irrespective of the chosen approach, top-down or bottom-up, the initial step aims to generate a preliminary reaction list. For this purpose, genomic, bibliomic or high-throughput data specific to the respective organism or cell type have to be collected. This list forms a preliminary network that serves as the starting point for further data integration. Previous investigations mostly started from a fully annotated genome [10, 61], from which a gene list can be generated automatically using databases that provide genomic sequence information, such as UniGene [23], NCBI Entrez Gene [87] or Ensembl [25].

The obtained gene information is subsequently linked to the encoded enzymatic function. This allows the reactions carried out by these enzymes, and their stoichiometries, to be determined automatically or manually. In this context, databases such as BRENDA [17], KEGG [18], MetaCyc [90] or TransportDB [92] provide the metabolic information needed to generate these network reactions.
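The gene-to-enzyme-to-reaction mapping described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in this work: the lookup tables `GENE_TO_EC` and `EC_TO_REACTIONS` are hypothetical stand-ins for data that would be retrieved from resources such as Ensembl, BRENDA or KEGG, and all function and reaction names are illustrative.

```python
# Hypothetical annotation tables; in practice these mappings would be
# retrieved from databases such as Ensembl, BRENDA or KEGG.
GENE_TO_EC = {
    "HK1": ["2.7.1.1"],    # hexokinase
    "GPI": ["5.3.1.9"],    # glucose-6-phosphate isomerase
    "PFKM": ["2.7.1.11"],  # 6-phosphofructokinase
}

# EC number -> list of (reaction id, stoichiometry); negative = substrate.
EC_TO_REACTIONS = {
    "2.7.1.1": [("HEX", {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1})],
    "5.3.1.9": [("PGI", {"g6p": -1, "f6p": 1})],
    "2.7.1.11": [("PFK", {"f6p": -1, "atp": -1, "fdp": 1, "adp": 1, "h": 1})],
}


def preliminary_reaction_list(genes):
    """Map genes to reactions via EC numbers (gene -> enzyme -> reaction).

    Returns a dict: reaction id -> (stoichiometry, set of supporting genes).
    """
    reactions = {}
    for gene in genes:
        for ec in GENE_TO_EC.get(gene, []):
            for rxn_id, stoich in EC_TO_REACTIONS.get(ec, []):
                # A reaction supported by several genes is stored once,
                # collecting every gene that provides evidence for it.
                entry = reactions.setdefault(rxn_id, (stoich, set()))
                entry[1].add(gene)
    return reactions


def unmapped_genes(genes):
    """Genes without any reaction assignment: candidates for manual curation."""
    return [g for g in genes
            if not any(EC_TO_REACTIONS.get(ec) for ec in GENE_TO_EC.get(g, []))]


if __name__ == "__main__":
    genes = ["HK1", "GPI", "PFKM", "XYZ1"]
    for rxn_id, (stoich, support) in preliminary_reaction_list(genes).items():
        print(rxn_id, stoich, sorted(support))
    print("unmapped:", unmapped_genes(genes))
```

Keeping the supporting genes with each reaction preserves the gene-reaction association, which is useful when the list is later revised manually.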

3.1.2 Extension of composed reaction list

The composed reaction list requires further manual curation and careful revision to create a reliable network for mathematical simulations of cellular or tissue-specific behaviour. In this step the focus is on the verification and reconciliation of collected information about metabolites, enzymes and reactions from literature or experimental evidence. The collected meta-information can be stored in a network database to support the curation process and further revision.

For every incorporated metabolite, additional information should be obtained, including its charge, its chemical formula and a database identifier. The charge balance of every reaction has to be checked and ensured. In addition, the subcellular localisation and the directionality of each reaction should be determined.
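The charge balance check can be automated once formula and charge are stored for each metabolite. The following sketch assumes simple formulas without brackets and hypothetical metabolite identifiers (`glc`, `atp`, `g6p`, `adp`, `h`); besides the charge, it also compares element counts, which the collected formulas make possible. The example uses the hexokinase reaction (glucose + ATP → glucose 6-phosphate + ADP + H+).

```python
import re
from collections import Counter

# Hypothetical metabolite annotations: id -> (chemical formula, charge).
METABOLITES = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H11O9P' (no brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoich):
    """Return (element imbalance, charge imbalance) of a reaction.

    stoich maps metabolite id -> coefficient (negative for substrates,
    positive for products). An empty dict and a charge of 0 mean balanced.
    """
    elements = Counter()
    charge = 0
    for met, coeff in stoich.items():
        formula, z = METABOLITES[met]
        for element, n in parse_formula(formula).items():
            elements[element] += coeff * n
        charge += coeff * z
    unbalanced = {el: n for el, n in elements.items() if n != 0}
    return unbalanced, charge


if __name__ == "__main__":
    print(imbalance({"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}))  # ({}, 0)
    print(imbalance({"glc": -1, "atp": -1, "g6p": 1, "adp": 1}))          # ({'H': -1}, -1)
```

Omitting the proton from the hexokinase reaction, for instance, is reported as one missing hydrogen atom and a charge imbalance of -1.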

The determination of reaction directionality can be supported by external databases and by methods for estimating ΔG values, such as the group contribution method [97, 98].
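Once ΔG'° estimates are available (e.g. from the group contribution method), a common way to set directionality is to compute the range of the transformed Gibbs energy ΔG' = ΔG'° + RT Σ s_j ln c_j over assumed metabolite concentration bounds: a reaction whose ΔG' is negative over the whole range is treated as irreversible in the forward direction. The sketch below assumes this criterion, which the text does not prescribe; the function names, the temperature and the concentration bounds are illustrative assumptions, and activity coefficients are ignored.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def delta_g_range(dg0, stoich, conc_bounds, temperature=310.15):
    """Minimal and maximal transformed Gibbs energy of reaction in kJ/mol.

    dg0: standard transformed Gibbs energy, e.g. a group contribution estimate.
    stoich: metabolite -> coefficient (negative = substrate).
    conc_bounds: metabolite -> (c_min, c_max) in mol/L (assumed bounds).
    """
    rt = R * temperature
    dg_min = dg_max = dg0
    for met, s in stoich.items():
        c_min, c_max = conc_bounds[met]
        # s*ln(c) is smallest at c_min for products and at c_max for substrates.
        low, high = sorted((s * math.log(c_min), s * math.log(c_max)))
        dg_min += rt * low
        dg_max += rt * high
    return dg_min, dg_max


def directionality(dg0, stoich, conc_bounds, temperature=310.15):
    """'forward' if ΔG' < 0 over the whole range, 'backward' if > 0, else 'reversible'."""
    dg_min, dg_max = delta_g_range(dg0, stoich, conc_bounds, temperature)
    if dg_max < 0:
        return "forward"
    if dg_min > 0:
        return "backward"
    return "reversible"


if __name__ == "__main__":
    bounds = {"A": (1e-6, 1e-2), "B": (1e-6, 1e-2)}  # hypothetical concentrations
    for dg0 in (-30.0, 0.0, 30.0):
        print(dg0, directionality(dg0, {"A": -1, "B": 1}, bounds))
```

For a simple A → B reaction with these bounds, the concentration term spans roughly ±24 kJ/mol at 37 °C, so a ΔG'° of -30 kJ/mol yields a forward-only reaction while a ΔG'° of 0 leaves it reversible.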

3.1.3 Generating the mathematical representation of the network

Following manual curation and revision, the network has to be converted into a format that enables mathematical computations. The file format syntax depends on the respective mathematical solver and on the characteristics of the model (e.g. linear or non-linear). The most common file formats used by solvers such as CPLEX [99] and LINDO API [100] are the Mathematical Programming System (MPS) format, the Linear Programming (LP) format and the Math Program Instructions (MPI) format. Programs written in a standard programming language such as Perl or Python can be used to convert the drafted network into the required file format.
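Such a conversion script can be quite short. The following Python sketch writes a flux balance problem (equations 3.1 to 3.3 in Section 3.2.1) in CPLEX LP syntax. It assumes a simple dictionary representation of the network and uses reaction and metabolite identifiers directly as variable and constraint names; it has not been checked against every rule of the LP format (for example, restrictions on admissible names), so the output should be validated with the solver in use. The function name `to_lp_format` is hypothetical.

```python
def linear_expr(terms):
    """Render {variable: coefficient} as an LP-format expression, e.g. 'v1 - 2 v2'."""
    text = ""
    for var, coeff in terms.items():
        if coeff == 0:
            continue
        mag = abs(coeff)
        term = var if mag == 1 else f"{mag:g} {var}"
        if not text:
            text = f"-{term}" if coeff < 0 else term
        else:
            text += f" - {term}" if coeff < 0 else f" + {term}"
    return text


def to_lp_format(reactions, metabolites, objective, bounds, sense="Maximize"):
    """Serialise an FBA problem in CPLEX LP syntax.

    reactions: list of reaction ids (used as variable names).
    metabolites: metabolite id -> {reaction id: coefficient}, i.e. the rows of N.
    objective: reaction id -> weight c_i.
    bounds: reaction id -> (v_min, v_max).
    """
    lines = [sense, f" obj: {linear_expr(objective)}", "Subject To"]
    # One steady-state constraint (a row of N·v = 0) per metabolite.
    for met, row in metabolites.items():
        lines.append(f" {met}: {linear_expr(row)} = 0")
    lines.append("Bounds")
    for rxn in reactions:
        lo, hi = bounds[rxn]
        lines.append(f" {lo:g} <= {rxn} <= {hi:g}")
    lines.append("End")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical linear pathway: v1: -> A, v2: A -> B, v3: B ->
    print(to_lp_format(
        ["v1", "v2", "v3"],
        {"A": {"v1": 1, "v2": -1}, "B": {"v2": 1, "v3": -1}},
        {"v3": 1},
        {"v1": (0, 10), "v2": (0, 1000), "v3": (0, 1000)},
    ))
```

For the three-reaction example, the output contains the objective line ` obj: v3`, the constraints ` A: v1 - v2 = 0` and ` B: v2 - v3 = 0`, and one bound line per reaction.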

In the recent past, tools have been developed that allow a reconstructed network to be used directly for solving optimisation problems with different FBA algorithms. For example, the openCOBRA project [101] and CellNetAnalyzer [102] provide toolboxes for MATLAB, while FASIMU [16] is command-line-oriented software. These software packages accept a reconstructed network as a plain reaction scheme, in Extensible Markup Language (XML) format or in Systems Biology Markup Language (SBML) format. SBML has become a standard representation format for communicating and storing mathematical models of biological systems. Together with software packages such as Metannogen [93] or an SQL database system such as PostgreSQL, it is possible to generate biological networks and link the respective knowledge base directly to mathematical computations through the SBML file format.
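To illustrate how an SBML file links to the mathematical representation, the sketch below reads a minimal, hypothetical SBML Level 3 Version 2 document with Python's standard `xml.etree.ElementTree` and assembles the stoichiometric matrix N. It ignores most of SBML (units, kinetic laws, annotations), skips boundary species and assumes a stoichiometry of 1 when the attribute is missing; in practice, a dedicated library such as libSBML would be the more robust choice.

```python
import xml.etree.ElementTree as ET

NS = {"sbml": "http://www.sbml.org/sbml/level3/version2/core"}

# Hypothetical model: EX_A: -> A, R1: A -> B, EX_B: B ->
SBML_EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version2/core" level="3" version="2">
  <model id="toy">
    <listOfCompartments>
      <compartment id="c" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="B" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="EX_A" reversible="false">
        <listOfProducts><speciesReference species="A" stoichiometry="1" constant="true"/></listOfProducts>
      </reaction>
      <reaction id="R1" reversible="false">
        <listOfReactants><speciesReference species="A" stoichiometry="1" constant="true"/></listOfReactants>
        <listOfProducts><speciesReference species="B" stoichiometry="1" constant="true"/></listOfProducts>
      </reaction>
      <reaction id="EX_B" reversible="false">
        <listOfReactants><speciesReference species="B" stoichiometry="1" constant="true"/></listOfReactants>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def stoichiometric_matrix(sbml_text):
    """Return (species ids, reaction ids, reversibility flags, N as nested lists)."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    species = [s.get("id") for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)
               if s.get("boundaryCondition") != "true"]
    row_of = {sid: k for k, sid in enumerate(species)}
    reactions, reversible, columns = [], [], []
    for rxn in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        column = [0.0] * len(species)
        for list_tag, sign in (("listOfReactants", -1.0), ("listOfProducts", 1.0)):
            for ref in rxn.iterfind(f"sbml:{list_tag}/sbml:speciesReference", NS):
                sid = ref.get("species")
                if sid in row_of:  # boundary species are not rows of N
                    column[row_of[sid]] += sign * float(ref.get("stoichiometry", "1"))
        reactions.append(rxn.get("id"))
        reversible.append(rxn.get("reversible") == "true")
        columns.append(column)
    # Transpose the reaction columns into the m x n matrix (rows = species).
    n_matrix = [[col[k] for col in columns] for k in range(len(species))]
    return species, reactions, reversible, n_matrix


if __name__ == "__main__":
    print(stoichiometric_matrix(SBML_EXAMPLE))
```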

3.1.4 Evaluation, validation and consistency check of the reconstructed network

It is important to evaluate the capability of the network to fulfil tissue- or organism-specific biological functions in order to verify the functionality and consistency of the reconstruction. Each function is incorporated as an objective function into the optimisation problem, and flux distributions are predicted for the respective cellular state. These objectives might represent important cellular processes such as ATP formation, detoxification or protein synthesis. However, the set of biological functions depends on the scope of the respective study; it might therefore be limited and has to be carefully defined. Therefore, it is necessary to ensure that all included metabolites can be produced and to further evaluate network reactions for (1) dead ends, (2) incorrect directionality or (3) isolated reactions. The constraints and system boundaries of the network have to be defined and evaluated for feasible solutions while applying a set of biological functions.
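A first structural consistency check for dead ends can be run directly on the stoichiometric matrix. The sketch below flags dead-end metabolites, i.e. metabolites that can only be produced or only be consumed, or that take part in a single reaction only; it assumes that exchange reactions are included as columns of N and that a reversibility flag is known for each reaction. The toy network and all names are hypothetical.

```python
def dead_end_metabolites(n_matrix, metabolites, reversible):
    """Return metabolites that cannot be balanced at steady state for structural reasons.

    n_matrix: m x n stoichiometric matrix as nested lists (rows = metabolites).
    reversible: one boolean per reaction (column).
    """
    dead_ends = []
    for met, row in zip(metabolites, n_matrix):
        produced = consumed = False
        n_reactions = 0
        for coeff, rev in zip(row, reversible):
            if coeff == 0:
                continue
            n_reactions += 1
            if rev:
                # A reversible reaction can both produce and consume the metabolite.
                produced = consumed = True
            elif coeff > 0:
                produced = True
            else:
                consumed = True
        # A metabolite in fewer than two reactions cannot be balanced unless
        # its flux is zero, even if its only reaction is reversible.
        if n_reactions < 2 or not (produced and consumed):
            dead_ends.append(met)
    return dead_ends


# Hypothetical network: EX_A: -> A, R1: A -> B, EX_B: B -> , R2: A -> C, R3: C -> D
METS = ["A", "B", "C", "D"]
TOY_N = [
    [1, -1,  0, -1,  0],  # A
    [0,  1, -1,  0,  0],  # B
    [0,  0,  0,  1, -1],  # C
    [0,  0,  0,  0,  1],  # D (never consumed)
]

if __name__ == "__main__":
    print(dead_end_metabolites(TOY_N, METS, [False] * 5))  # ['D']
```

Note that C is not flagged in a single pass even though it can only be consumed by R3, which leads into the dead end D; such indirect effects require iterating the check or an LP-based analysis of blocked reactions.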

3.2 Methods for analysing network states and estimation of stationary fluxes

This section presents computational methods which are widely used to determine flux distributions in metabolic networks and analyse cellular states.

3.2.1 Flux Balance Analysis

Flux Balance Analysis (FBA) is a linear programming (LP) based method for flux prediction and the analysis of biological systems. In contrast to kinetic modelling, no enzyme kinetic data are required to calculate flux distributions. Instead, a linear programming problem is defined to find optimal solutions for the corresponding network states.

FBA assumes that all internal fluxes fulfil the steady-state condition with respect to all metabolites and applied constraints. The linear optimisation problem is then solved to find optimal solutions while maximising or minimising an objective function f(v). The general LP in FBA reads as follows:

$$\text{maximize/minimize} \quad f(v) = \sum_{i=1}^{n} c_i \, v_i \tag{3.1}$$

$$\text{subject to} \quad N \cdot v = 0, \tag{3.2}$$

$$v_{\min,i} \le v_i \le v_{\max,i} \tag{3.3}$$

where $v \in \mathbb{R}^n$ is the flux vector and $N$ is the $m \times n$ stoichiometric matrix of the network, with $m$ metabolites and $n$ reactions. The objective function is a linear combination of the metabolic fluxes $v_i$, with coefficients $c_i$ representing weights. The lower and upper bounds on each reaction are represented by $v_{\min,i}$ and $v_{\max,i}$, respectively. Objective functions correspond to cellular functions such as ATP formation, maximisation of biomass or minimisation of external substrate uptake. However, the choice of objective depends on the scope of the study and may therefore be incomplete. All possible flux distributions of a given network that satisfy the steady-state condition define a polyhedral cone, the steady-state flux cone (see Figure 3.1).

Figure 3.1: A. Example of a reaction network with m metabolites (j = 1, 2, ..., m) and n reactions (i = 1, 2, ..., n). Each metabolite S is assigned to a metabolic reaction with a specific flux rate v. Dashed lines indicate system boundaries.

B. Stoichiometric matrix with the stoichiometric coefficients of the given metabolic network.

C. The steady-state flux cone.
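As an illustration of equations (3.1) to (3.3), the following sketch solves an FBA problem for a small hypothetical network with SciPy's `linprog`, which is used here in place of CPLEX. Because `linprog` always minimises, maximisation is expressed by negating the objective weights. The network, bounds and objective are invented for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network; internal metabolites A, B, C (rows of N) and reactions
# v1: -> A (uptake), v2: A -> B, v3: A -> C, v4: C -> B, v5: B -> (export).
N = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]  # (v_min,i, v_max,i)
C = np.array([0, 0, 0, 0, 1], dtype=float)  # weights c_i: objective f(v) = v5


def fba(n_matrix, c, bounds, maximize=True):
    """Optimise f(v) = c·v subject to N·v = 0 and v_min,i <= v_i <= v_max,i."""
    sign = -1.0 if maximize else 1.0  # linprog minimises, so negate to maximise
    res = linprog(sign * c, A_eq=n_matrix, b_eq=np.zeros(n_matrix.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return sign * res.fun, res.x


if __name__ == "__main__":
    z_obj, v = fba(N, C, BOUNDS)
    print("optimal objective:", z_obj)
    print("one optimal flux distribution:", v)
```

In this toy network the optimum (an export flux of 10, limited by the uptake bound) can be reached through the direct route A → B, the detour A → C → B, or any mixture of both, so the returned flux vector is only one of infinitely many optimal solutions. This is the situation addressed by Flux Variability Analysis.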

3.2.2 Flux Variability Analysis

In some cases, linear problems have more than one optimal solution [103, 104].

Most solvers, e.g. CPLEX, terminate as soon as an optimal solution to an optimisation problem is found; in particular, CPLEX does not automatically provide methods to find alternative optimal solutions. However, a recently developed method, Flux Variability Analysis (FVA), characterises such alternative optima by determining the minimal and maximal value each flux can take for a given network state [105, 106]. Using FVA, it is possible to determine the robustness of a metabolic network and possibly to identify network redundancy.

The existence of such alternative reaction sets could compromise predictions for network states regarding:


1. optimal flux distributions for different biological states,

2. estimated substrate requirement or biosynthesis of different metabolites and

3. process optimisation.

In the FVA approach proposed by Mahadevan et al. [105], the optimality condition is relaxed when calculating the maximal and minimal values of all fluxes: the objective function f(v) is constrained to at least 95% of the optimal achievable growth rate $z_{obj}$. The linear optimisation problem reads as follows:

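$$\text{maximize/minimize} \quad v_i \tag{3.4}$$

$$\text{subject to} \quad N \cdot v = 0, \tag{3.5}$$

$$f(v) \ge 0.95 \cdot z_{obj}, \tag{3.6}$$

$$0 \le v_i \le v_{\max,i} \tag{3.7}$$

where $v_i$ (i = 1, 2, ..., n) denotes the flux through each network reaction and $N$ is the stoichiometric matrix.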
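Equations (3.4) to (3.7) translate into one initial FBA run followed by two LP runs per reaction. The sketch below uses a small hypothetical network (uptake v1 into A, two parallel routes from A to B via v2 or via v3 and v4, export v5), with SciPy's `linprog` standing in for CPLEX, and expresses the relaxed optimality condition f(v) ≥ 0.95 · z_obj as the inequality -c·v ≤ -0.95 · z_obj. The function name `fva` is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network (rows: A, B, C): v1: -> A, v2: A -> B, v3: A -> C,
# v4: C -> B, v5: B -> (export, used as objective).
N = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0,  1, -1],
    [0,  0,  1, -1,  0],
], dtype=float)
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
C = np.array([0, 0, 0, 0, 1], dtype=float)


def fva(n_matrix, c, bounds, fraction=0.95):
    """Return z_obj and (min v_i, max v_i) for each reaction under eqs. 3.4-3.7."""
    n_rxn = n_matrix.shape[1]
    b_eq = np.zeros(n_matrix.shape[0])
    # Step 1: optimal objective value z_obj from a standard FBA run (maximise c·v).
    opt = linprog(-c, A_eq=n_matrix, b_eq=b_eq, bounds=bounds, method="highs")
    if not opt.success:
        raise RuntimeError(opt.message)
    z_obj = -opt.fun
    # Step 2: relaxed optimality f(v) >= fraction * z_obj as -c·v <= -fraction * z_obj.
    a_ub = -c.reshape(1, -1)
    b_ub = np.array([-fraction * z_obj])
    ranges = []
    for i in range(n_rxn):
        e_i = np.zeros(n_rxn)
        e_i[i] = 1.0
        extremes = []
        for sign in (1.0, -1.0):  # minimise v_i, then maximise v_i
            res = linprog(sign * e_i, A_ub=a_ub, b_ub=b_ub, A_eq=n_matrix,
                          b_eq=b_eq, bounds=bounds, method="highs")
            if not res.success:
                raise RuntimeError(res.message)
            extremes.append(sign * res.fun)
        ranges.append(tuple(extremes))
    return z_obj, ranges


if __name__ == "__main__":
    z_obj, ranges = fva(N, C, BOUNDS)
    for i, (lo, hi) in enumerate(ranges, start=1):
        print(f"v{i}: [{lo:.2f}, {hi:.2f}]")
```

For this network, the uptake and export fluxes are confined to [9.5, 10], whereas v2, v3 and v4 can each take any value between 0 and 10, which exposes the redundancy of the two parallel routes.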

3.2.3 Flux Coupling Analysis

Flux Coupling Analysis (FCA) aims to identify (1) coupled and (2) blocked reactions in metabolic networks [107] under the steady-state assumption. Here, linear fractional programming is employed to calculate and compare flux ratios for every pair of metabolic fluxes within a network.

Blocked reactions are defined as reactions whose maximal and minimal flux values both equal zero; blocked reactions are thus incapable of carrying any flux in the given scenario. The linear optimisation problem reads as follows:

$$\text{maximize} \quad v_i \tag{3.8}$$

$$\text{subject to} \quad N \cdot v = 0, \tag{3.9}$$

$$v_{\text{uptake},i} \le v_{\text{uptake-max},i} \quad \text{for all transport reactions}, \tag{3.10}$$

$$v_i \ge 0 \tag{3.11}$$

where $N$ denotes the stoichiometric matrix and $v_i$ the flux through reaction i. In the FCA method, every reversible reaction is split into a forward and a backward irreversible reaction, each constrained to carry a non-negative flux.
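Equations (3.8) to (3.11) can be checked with one LP per reaction. The sketch below assumes that reversible reactions have already been split into irreversible pairs, bounds only the uptake flux, and uses SciPy's `linprog` as the solver; the network is hypothetical, with metabolite D lacking any consuming reaction, and the function name is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network with irreversible reactions only:
# EX_A: -> A (uptake), R1: A -> B, EX_B: B -> , R2: A -> C, R3: C -> D.
NAMES = ["EX_A", "R1", "EX_B", "R2", "R3"]
N = np.array([
    [1, -1,  0, -1,  0],   # A
    [0,  1, -1,  0,  0],   # B
    [0,  0,  0,  1, -1],   # C
    [0,  0,  0,  0,  1],   # D (never consumed)
], dtype=float)
UPPER = [10.0, None, None, None, None]  # only the transport reaction is bounded (3.10)


def blocked_reactions(n_matrix, names, upper, tol=1e-9):
    """Return reactions whose maximal steady-state flux is zero (eqs. 3.8-3.11)."""
    n_rxn = n_matrix.shape[1]
    b_eq = np.zeros(n_matrix.shape[0])
    bounds = [(0.0, ub) for ub in upper]  # v_i >= 0 (3.11); None = no upper bound
    blocked = []
    for i in range(n_rxn):
        c = np.zeros(n_rxn)
        c[i] = -1.0  # linprog minimises, so minimise -v_i to maximise v_i
        res = linprog(c, A_eq=n_matrix, b_eq=b_eq, bounds=bounds, method="highs")
        if res.status == 3:  # unbounded: v_i can carry arbitrarily large flux
            continue
        if res.status != 0:
            raise RuntimeError(res.message)
        if -res.fun <= tol:
            blocked.append(names[i])
    return blocked


if __name__ == "__main__":
    print(blocked_reactions(N, NAMES, UPPER))  # ['R2', 'R3']
```

One caveat of splitting: the forward and backward parts of a reversible reaction can always carry equal flux in a trivial cycle, so such pairs are never reported as blocked by this LP alone and need separate treatment.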

With FCA, it is further possible to distinguish three types of coupling: reactions can be (i) directionally, (ii) partially or (iii) fully coupled. Two reactions are fully coupled if a non-zero flux $v_1$ implies a fixed, non-zero value of $v_2$ and vice versa, i.e. the ratio $v_1/v_2$ is constant. Coupled reactions can be identified either by solving (1) a non-linear optimisation problem or, after a variable transformation, (2) a linear optimisation problem. In the non-linear optimisation problem, the upper and lower limits of the flux ratio are calculated for every flux pair in the network. This non-linear problem can be transformed into a linear one by introducing the scaled fluxes $\hat{v} = v \cdot t$ with $t = 1/v_2$, so that the reference flux $\hat{v}_2$ is fixed to 1 and $\hat{v}_1$ equals the ratio $v_1/v_2$. By applying this variable transformation, a linear problem is obtained which reads as follows:

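$$\text{maximize/minimize} \quad \hat{v}_1 \tag{3.12}$$

$$\text{subject to} \quad N \cdot \hat{v} = 0, \tag{3.13}$$

$$\hat{v}_2 = 1, \tag{3.14}$$

$$\hat{v}_{\text{uptake},i} \le v_{\text{uptake-max},i} \cdot t \quad \text{for all transport reactions}, \tag{3.15}$$

$$\hat{v}_i \ge 0, \tag{3.16}$$

$$t \ge 0 \tag{3.17}$$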
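The transformed problem (3.12) to (3.17) can be assembled with the scaled fluxes v̂ and the scaling variable t as LP variables (this substitution is known as the Charnes-Cooper transformation). The sketch below, using SciPy's `linprog` on a hypothetical toy network, computes the minimal and maximal ratio v_i/v_j and classifies the pair; it assumes that blocked reactions have been removed beforehand and that only uptake fluxes carry an upper bound. The classification follows the usual reading of the ratio bounds: a finite maximum means v_j = 0 forces v_i = 0, and a positive minimum means v_i = 0 forces v_j = 0. Function names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network (rows: A, B, C): v1: -> A (uptake), v2: A -> B,
# v3: A -> C, v4: C -> B, v5: B -> (export). All reactions are irreversible.
NAMES = ["v1", "v2", "v3", "v4", "v5"]
N = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0,  1, -1],
    [0,  0,  1, -1,  0],
], dtype=float)
UPPER = [10.0, None, None, None, None]  # v_uptake-max for the transport reaction v1


def flux_ratio_bounds(n_matrix, upper, i, j):
    """Return (min, max) of v_i / v_j from the linear problem (3.12)-(3.17).

    LP variables are the scaled fluxes v_hat_1..v_hat_n followed by t.
    Returns None if v_j cannot carry flux; the maximum is inf if unbounded.
    """
    m, n = n_matrix.shape
    a_eq = np.zeros((m + 1, n + 1))
    a_eq[:m, :n] = n_matrix          # N · v_hat = 0            (3.13)
    a_eq[m, j] = 1.0                 # v_hat_j = 1              (3.14)
    b_eq = np.zeros(m + 1)
    b_eq[m] = 1.0
    rows = []
    for k, ub in enumerate(upper):   # v_hat_k - ub_k * t <= 0  (3.15)
        if ub is not None:
            row = np.zeros(n + 1)
            row[k], row[n] = 1.0, -ub
            rows.append(row)
    a_ub = np.array(rows) if rows else None
    b_ub = np.zeros(len(rows)) if rows else None
    bounds = [(0, None)] * (n + 1)   # v_hat >= 0 (3.16), t >= 0 (3.17)
    result = []
    for sign in (1.0, -1.0):         # minimise, then maximise v_hat_i (3.12)
        c = np.zeros(n + 1)
        c[i] = sign
        res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                      bounds=bounds, method="highs", options={"presolve": False})
        if res.status == 2:          # infeasible: v_j is blocked
            return None
        if res.status == 3:          # unbounded: the ratio has no finite maximum
            result.append(np.inf)
        elif res.status == 0:
            result.append(sign * res.fun)
        else:
            raise RuntimeError(res.message)
    return result[0], result[1]


def coupling(n_matrix, upper, names, i, j, tol=1e-7):
    """Classify the coupling between reactions i and j from the ratio bounds."""
    ratio = flux_ratio_bounds(n_matrix, upper, i, j)
    if ratio is None:
        return "blocked"
    r_min, r_max = ratio
    if np.isfinite(r_max) and r_min > tol:
        return "fully" if abs(r_max - r_min) <= tol else "partially"
    if np.isfinite(r_max):           # v_j = 0 forces v_i = 0
        return f"directional {names[i]}->{names[j]}"
    if r_min > tol:                  # v_i = 0 forces v_j = 0
        return f"directional {names[j]}->{names[i]}"
    return "uncoupled"


if __name__ == "__main__":
    print(coupling(N, UPPER, NAMES, 0, 4))  # uptake vs. export
    print(coupling(N, UPPER, NAMES, 1, 4))  # direct route vs. export
```

In the toy network, uptake and export always carry the same flux (fully coupled), while a non-zero flux through the direct route v2 implies export but not vice versa. Two LPs are solved per ordered pair, which illustrates the computational cost discussed below.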

By comparing the calculated minimal and maximal flux ratios, it is possible to decide how reactions are coupled. However, since two linear optimisation problems have to be solved for every pair of reactions, the number of problems grows quadratically with network size, which can require considerable computing capacity for large networks. Because the identification of blocked reactions can support the reconstruction process in large-scale networks, another method has been developed for this purpose [108]. Network pruning aims to create a smaller sub-network that contains no dead ends or blocked reactions, thereby generating a network in which all reactions are coupled. The resulting sub-network can then be analysed regarding its consistency and functionality in modelling the respective cellular system. However, the set of blocked reactions should be evaluated regarding (1) missing links to other network reactions, (2) missing transport reactions or (3) missing metabolites. The revision process can be supported by bibliomic data and may lead to the reintegration of reactions into the network. Both approaches, FCA and network pruning, are implemented as functions in the FASIMU software package.
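A simple, purely structural variant of network pruning removes every reaction that involves a dead-end metabolite and repeats this until no dead end remains, because each removal can create new dead ends. The sketch below is a generic illustration and not necessarily the algorithm of [108] or its FASIMU implementation; it assumes irreversible reactions with explicit exchange reactions and nonzero coefficients, and it only detects reactions blocked by dead ends, not those blocked by other stoichiometric constraints, which require an LP-based check such as equations (3.8) to (3.11). The function name and network are hypothetical.

```python
def prune_network(reactions):
    """Iteratively remove reactions that involve a dead-end metabolite.

    reactions: reaction id -> {metabolite: nonzero coefficient}; all reactions
    are assumed irreversible, with uptake/export written as exchange reactions.
    Returns (sorted ids of kept reactions, ids of removed reactions in removal order).
    """
    kept = dict(reactions)
    removed = []
    while True:
        producers, consumers = {}, {}
        for rxn, stoich in kept.items():
            for met, coeff in stoich.items():
                target = producers if coeff > 0 else consumers
                target.setdefault(met, set()).add(rxn)
        metabolites = set(producers) | set(consumers)
        # A dead end is a metabolite that is only produced or only consumed.
        dead = {m for m in metabolites if m not in producers or m not in consumers}
        to_remove = sorted(rxn for rxn, stoich in kept.items()
                           if not dead.isdisjoint(stoich))
        if not to_remove:
            return sorted(kept), removed
        for rxn in to_remove:
            del kept[rxn]
            removed.append(rxn)


# Hypothetical network: D is never consumed, so R3 and then R2 become unusable.
TOY = {
    "EX_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "EX_B": {"B": -1},
    "R2": {"A": -1, "C": 1},
    "R3": {"C": -1, "D": 1},
}

if __name__ == "__main__":
    print(prune_network(TOY))  # (['EX_A', 'EX_B', 'R1'], ['R3', 'R2'])
```

For the toy network, the first pass removes R3 (D is never consumed); this turns C into a dead end, so the second pass removes R2.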