Consequences of MOML for MOEA Design - A Multi-objective Genetic Algorithm for Peptide Optimiza

The results of the 3D- and 4D-MOML analysis provide some important hints regarding the design of a MOEA. Both the 3D- and the 4D-MOML are very rugged, and no specific structure is discernible in the distribution of non-dominated solutions over the investigated parts of these landscapes of different sizes. The 3D-MOML reveals a higher front diversity, and therefore fewer solutions lie in the optimal front compared to the 4D-MOML. These results point to a known challenge for domination-based MOEAs: the number of non-dominated solutions increases exponentially with the problem dimension [144]. This observation also holds for the proposed 3D- and 4D-MOP, and the design of a MOEA has to take it into account.
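The growth of the non-dominated set with the number of objectives can be illustrated with a small experiment. The following is a minimal sketch, not the MOML analysis itself: it assumes uniformly random, independent objective values that are all minimized, and the names `dominates` and `non_dominated_fraction` are hypothetical helpers introduced only for this illustration.

```python
import random

def dominates(a, b):
    """Return True if a Pareto-dominates b (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated_fraction(num_points, num_objectives, seed=0):
    """Fraction of random points that are not dominated by any other point.

    Assumption: objective values are independent and uniform in [0, 1).
    """
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(num_objectives))
              for _ in range(num_points)]
    non_dominated = [p for p in points
                     if not any(dominates(q, p) for q in points if q is not p)]
    return len(non_dominated) / num_points

# Compare the share of non-dominated points for a growing number of objectives.
for k in (2, 3, 4, 6):
    print(k, non_dominated_fraction(200, k))
```

With real peptide objectives the values are correlated, so the absolute fractions will differ. The sketch only shows the tendency that motivates a finer differentiation in higher dimensions.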

Due to the higher number of non-dominated solutions and the lower front diversity in the case of the 4D-MOP, a far-reaching differentiation of the non-dominated solutions is required. The most intuitive way to achieve this differentiation is through the selection procedure. A selection procedure based only on non-dominated sorting of the solutions does not provide enough differentiation; as a consequence, a further selection criterion is required for this purpose.
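One widely used example of such a further criterion is the crowding distance of NSGA-II, which ranks solutions within the same front by how isolated they are in objective space. The sketch below is only an illustration of what a secondary criterion can look like. The text above does not state which criterion the proposed MOEA uses, and the function name `crowding_distance` is an assumption of this example.

```python
def crowding_distance(front):
    """Crowding distance (as in NSGA-II) for each objective vector in a front.

    `front` is a list of equally long tuples of objective values.
    Boundary solutions of each objective receive an infinite distance.
    """
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    num_objectives = len(front[0])
    for m in range(num_objectives):
        # Indices of the solutions sorted by the m-th objective.
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: this objective adds no information
        for pos in range(1, n - 1):
            prev_val = front[order[pos - 1]][m]
            next_val = front[order[pos + 1]][m]
            distance[order[pos]] += (next_val - prev_val) / (hi - lo)
    return distance

front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(crowding_distance(front))
```

Within one front, solutions with a larger distance would be preferred, which differentiates solutions that non-dominated sorting alone treats as equal.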

Moreover, the 4D-MOP has a higher number of front-based plateaus than the 3D-MOP, but the average size of the front-based plateaus of the first front is correspondingly smaller. The number of plateaus identified by consecutive equal or nearly equal fitness values for each molecular function is lower for the 4D-MOP than for the 3D-MOP. The considerable number of front-based plateaus, with approximately 10% first-front plateaus in both MOMLs, suggests the common approach of balancing the search behavior of a MOEA towards exploration in early generations and exploitation in later generations. Thus, the variation operators of the MOEA have to support a global search in the first generations in order to tap potentially high-quality solutions spread over the landscape. In later generations, a more local search behavior of the MOEA supports the search process in the neighborhood of the previously detected high-quality solutions.

The increase of the time series length, and therefore of the investigated part of the MOML, does not result in a proportional increase of non-dominated solutions, neither in the case of the 3D- nor in the case of the 4D-MOML. Moreover, the non-dominated solutions are unevenly distributed over the search space. These facts allow some considerations with regard to the population size: a large population size increases the probability of detecting high-quality solutions, especially in a very rugged landscape. Therefore, the search performance benefits from a large population in early generations, but a large population is counterproductive in later generations, since the probability that already detected high-quality solutions are selected into the succeeding generation decreases with the population size.
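The considerations above (global search and a large population early, local search and a smaller population later) can be expressed as generation-dependent schedules. The following sketch assumes a simple linear interpolation. The helper `linear_schedule` and all numeric values (population 200 to 50, mutation rate 0.5 to 0.05, 50 generations) are hypothetical and not taken from the text.

```python
def linear_schedule(start, end, generation, max_generations):
    """Interpolate linearly from `start` (generation 0) to `end` (last generation)."""
    if max_generations <= 1:
        return end
    # Clamp the generation index to the valid range before interpolating.
    g = min(max(generation, 0), max_generations - 1)
    t = g / (max_generations - 1)
    return start + t * (end - start)

MAX_GENERATIONS = 50  # hypothetical run length

for generation in (0, 25, 49):
    # Large population and high mutation rate early (exploration),
    # small population and low mutation rate late (exploitation).
    population_size = round(linear_schedule(200, 50, generation, MAX_GENERATIONS))
    mutation_rate = linear_schedule(0.5, 0.05, generation, MAX_GENERATIONS)
    print(generation, population_size, mutation_rate)
```

Other decay shapes (exponential, stepwise) would serve the same purpose. The linear form is chosen here only because it is the simplest to read.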

Peptide Optimization

4.1 Exact Methods versus Metaheuristics

The justification of the design and use of a metaheuristic to solve the proposed 3D- and 4D-MOP requires an estimate of the runtime complexity of solving these problems exactly. The general advantage of an exact solution method is that it computes the optimal solutions, as opposed to the approximate compromise solutions of metaheuristics. The runtime complexity of exact methods can be reduced by excluding some of the feasible solutions in advance on the basis of theoretical considerations. The size of the search space in the present 3D- and 4D-MOP is 20²⁰ ≈ 1.05×10²⁶, since the peptide sequences have length 20 and are composed of the 20 canonical amino acids.

Theoretical considerations of the feasible solutions usually allow the exclusion of different solution categories. The a priori exclusion of peptides depends on the application field of peptide optimization. The proposed 3D- and 4D-MOP were selected with the aim of being as generic as possible regarding the determination of physicochemical properties. Therefore, any a priori exclusion is difficult without a concrete application area. Generally, in the field of drug design, peptides have to fulfill the essential properties of being synthesizable and soluble in aqueous solutions. A general guideline exists for the solubility of a peptide in aqueous solutions based on its primary structure: hydrophobic peptides containing at least 50% hydrophobic residues (A, F, I, L, M, P, V, W, Y)¹ are potentially insoluble or only partly soluble. The number of peptides of length 20 over the 20 canonical amino acids in which at least 50% of the residues are hydrophobic is

\[
\sum_{i=10}^{20} \binom{20}{i} \cdot 9^{i} \cdot 11^{20-i} \approx 4.285 \times 10^{25}, \tag{4.1}
\]

where $\binom{20}{i}$ is the number of ways to choose the $i$ of 20 positions of the peptide that carry hydrophobic amino acids, $9^{i}$ is the number of ways to fill these $i$ positions with the 9 hydrophobic amino acids, and $11^{20-i}$ is the number of ways to fill the complementary positions with the remaining 11 amino acids. This reduces the search space only slightly, to 20²⁰ − 4.285×10²⁵ ≈ 6.2×10²⁵. Comparable guidelines to those for solubility in aqueous solutions do not yet exist for synthesizability. Therefore, an exclusion of potentially non-synthesizable peptides is not possible. Instead of excluding peptides on the basis of theoretical considerations without empirical verification, it is more advisable to treat the preferred properties as objective functions and therefore as part of the molecular optimization problem.

¹ http://www.anaspec.com/content/pdfs/PeptidesolubilityguidelinesFinal.pdf
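The count in equation (4.1) and the remaining search space can be checked with exact integer arithmetic. This is a minimal sketch. The constant and function names are introduced only for this illustration, and the split into 9 hydrophobic and 11 other amino acids follows the guideline quoted above.

```python
from math import comb

PEPTIDE_LENGTH = 20
NUM_HYDROPHOBIC = 9   # A, F, I, L, M, P, V, W, Y
NUM_OTHER = 11        # the remaining canonical amino acids

def count_hydrophobic_peptides(length=PEPTIDE_LENGTH, min_hydrophobic=10):
    """Number of sequences with at least `min_hydrophobic` hydrophobic residues."""
    return sum(
        comb(length, i) * NUM_HYDROPHOBIC**i * NUM_OTHER**(length - i)
        for i in range(min_hydrophobic, length + 1)
    )

total = 20**PEPTIDE_LENGTH               # full search space
excluded = count_hydrophobic_peptides()  # left-hand side of equation (4.1)
remaining = total - excluded             # search space after the exclusion

print(f"total     = {total:.4e}")
print(f"excluded  = {excluded:.4e}")
print(f"remaining = {remaining:.4e}")
```

As a sanity check, setting `min_hydrophobic=0` sums over all sequences and must return 20²⁰ by the binomial theorem, since 9 + 11 = 20.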

An exact solution of the 3D- or 4D-MOP requires the evaluation of the objective functions MW, NMW, hydro and InstInd for each peptide, followed by fast non-dominated sorting. As NMW has the highest computational complexity of the objective functions (section 3.4), the complexity of the objective function evaluation is approximately O(N·l²), where l is the peptide length and N the number of feasible peptides. The subsequent fast non-dominated sorting has a computational complexity of O(k·N²), where k is the number of objective functions. Even for the 3D-MOP, the computational effort is O(N·l²) + O(k·N²) ≥ 6.2×10²⁵·20² + 3·(6.2×10²⁵)² ≈ 1.15×10⁵² operations. Assuming the use of the world's top supercomputer Tianhe-2, developed by China's National University of Defense Technology², which performs 30.86×10¹⁵ floating-point operations per second, this leads to a runtime of 3.74×10³⁵ s ≈ 1.19×10²⁸ years.
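The runtime estimate can be reproduced with a few lines of arithmetic. The sketch below rests on simplifying assumptions that are not stated in the text: every counted operation costs exactly one floating-point operation, constant factors are ignored, and a year has 365 days. The function name `exact_runtime` is hypothetical.

```python
N_FEASIBLE = 6.2e25        # remaining peptides after the solubility exclusion
PEPTIDE_LENGTH = 20        # l
NUM_OBJECTIVES = 3         # k for the 3D-MOP
FLOPS = 30.86e15           # performance figure used in the text
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

def exact_runtime(n=N_FEASIBLE, l=PEPTIDE_LENGTH, k=NUM_OBJECTIVES, flops=FLOPS):
    """Return (operations, seconds, years) for exhaustive evaluation and sorting."""
    evaluation = n * l**2   # O(N * l^2) objective function evaluation
    sorting = k * n**2      # O(k * N^2) fast non-dominated sorting
    operations = evaluation + sorting
    seconds = operations / flops
    return operations, seconds, seconds / SECONDS_PER_YEAR

operations, seconds, years = exact_runtime()
print(f"operations = {operations:.3e}")
print(f"seconds    = {seconds:.3e}")
print(f"years      = {years:.3e}")
```

The sorting term dominates the total by many orders of magnitude, so refinements of the evaluation cost would not change the conclusion.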

In document: A Multi-objective Genetic Algorithm for Peptide Optimization (pages 72-75)