
2.2 Convex Regularization

2.2.2 Convex Optimization


Figure 2.5: Convex sets and convex functions. (a), (b) Convex sets. (c) Nonconvex set. (d) Convex function. (e) Strictly convex function.

The integration and combination of information, statistical learning, and variational regularization have also been investigated by researchers and remain an interesting research direction.

Figure 2.6: Function curves. (a) Tikhonov. (b) Total variation. (c) Huber function. (d) Log-quadratic. (e) Saturated-quadratic.

Table 2.1: Convex and nonconvex functions (δ > 0 is a threshold parameter).

Function               Formula                                                      Convexity
Quadratic              φ1(xi, xj) = (xi − xj)²                                      convex
Total variation        φ2(xi, xj) = |xi − xj|                                       convex
Huber                  φ3(xi, xj) = (xi − xj)²  if |xi − xj| ≤ δ,                   mixed
                                    2δ|xi − xj| − δ²  otherwise
Log-quadratic          φ4(xi, xj) = ln[1 + (xi − xj)²/δ²]                           nonconvex
Saturated-quadratic    φ5(xi, xj) = (xi − xj)²/(δ² + (xi − xj)²)                    nonconvex

The main result that can be used to address this issue is the theorem of Weierstrass, which states that if f is continuous and Ω is compact, a solution exists. This is a valuable result that should be kept in mind throughout our development. In practice, the minimum point is searched for by a convergent stepwise procedure based on differential calculus: only the values of nearby points can be compared, so attention is restricted to relative minimum points. Global conditions and a global solution can only be found if the problem possesses certain convexity properties that essentially guarantee that any relative minimum is a global minimum. Thus, formulating and attacking the problem arg min f(x), subject to x ∈ Ω, is usually considered as searching for a relative minimum point, as sketched below.
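As a minimal illustration of such a stepwise search, the sketch below runs plain gradient descent on a simple nonconvex one-dimensional function. The function f, the step size, and the iteration count are illustrative assumptions, not taken from the text; the relative minimum found depends on the initial guess.

def f(x):
    # a simple nonconvex function with two relative minima
    return (x ** 2 - 1) ** 2 + 0.3 * x

def grad_f(x):
    return 4 * x * (x ** 2 - 1) + 0.3

def descend(x, step=0.05, iters=200):
    """Stepwise descent: only gradient information at nearby points is used."""
    for _ in range(iters):
        x = x - step * grad_f(x)
    return x

# the relative minimum reached depends entirely on the initial guess
print(descend(-1.5))   # converges to the relative minimum near x = -1.04
print(descend(+1.5))   # converges to the other relative minimum near x = +0.96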

Convex Sets and Convex Functions

A convex set is the set of feasible solutions for convex programming [206], as shown in Fig. 2.5. This means that if x1 and x2 are feasible solutions, then any convex combination λx1 + (1−λ)x2, ∀λ ∈ [0,1], must also be a feasible solution. For convex programming to be applicable, the cost must be a strictly convex functional over the convex set of feasible solutions. A functional F : X → [−∞,∞] is strictly convex if, for any two feasible solutions x1 and x2 such that F(x1) < ∞ and F(x2) < ∞, the inequality

F((1−λ)x1 + λx2) < (1−λ)F(x1) + λF(x2), ∀λ ∈ (0,1)  (2.11)

always holds. This definition requires that Eq. (2.11) be valid over the set of feasible solutions. The result is also known as Jensen's inequality and is widely applied in information theory and machine learning. The generic convex optimization problem is to minimize the convex function F(x) over a convex set. Convexity is a sufficient condition for all local minima to be global minima. Three main properties of convex optimization are worth noting: a convex function is continuous, a convex function has a single minimum on a convex domain, and the sum of convex functions is convex.
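To make the distinction concrete, the following small numerical check samples Eq. (2.11) for a strictly convex potential and for a saturated quadratic (corresponding to φ1 and φ5 of Table 2.1 with δ = 1). The test points and the sampling grid are illustrative assumptions.

import numpy as np

def violates_strict_convexity(F, x1, x2, n=99):
    """Return True if Eq. (2.11) fails for some lambda in (0, 1)."""
    lam = np.linspace(0.01, 0.99, n)
    lhs = F((1 - lam) * x1 + lam * x2)
    rhs = (1 - lam) * F(x1) + lam * F(x2)
    return bool(np.any(lhs >= rhs))      # strict convexity requires lhs < rhs

quadratic = lambda d: d ** 2                     # phi1 of Table 2.1
saturated = lambda d: d ** 2 / (1.0 + d ** 2)    # phi5 of Table 2.1 with delta = 1

print(violates_strict_convexity(quadratic, 1.0, 3.0))   # False: strictly convex
print(violates_strict_convexity(saturated, 1.0, 3.0))   # True: nonconvex along this chord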

Following this definition, we study several functions that are commonly used as penalty terms in regularization approaches for image restoration [36], [22], [219], [267], [176]. The potential functions φ(·) are described in Table 2.1. These five functions represent three major categories, shown in Fig. 2.6: convex (φ1 and φ2), hybrid convex (φ3), and nonconvex (φ4 and φ5). The convex quadratic function φ1 penalizes the differences of neighboring pixels at an increasing rate, which tends to force the image to be smooth everywhere.

The total variation function φ2 [213] is an L1-norm cost function, which penalizes the absolute error and remains convex. Many convex formulations have also been proposed recently, such as robust anisotropic diffusion [30], [29], half-quadratic regularization [84], linear programming [47], second-order cone programming [110], and low-dimensional flat Euclidean embedding in semi-definite programming [268].

The Huber function [115] is a semi-convex hybrid between the quadratic and L1 functions. It is quadratic for small values and becomes linear for larger values; thus it has the outlier stability of L1. As a consequence, this prior does not differentiate substantially between slow monotonic changes and abrupt changes, and it does not penalize the presence of edges or boundaries in the image.

Nonconvex functions have saturating properties that decrease the rate of penalty applied to intensity differences beyond a threshold. Consequently, the presence of edges can be preserved in the image restoration. However, nonconvex functions present difficulties in computing global estimates. Nonconvex optimization algorithms can still achieve good results under certain constraints or in special processing settings. Recently, several nonconvex optimization algorithms have been developed; for example, binary spectral graph clustering and semidefinite relaxation [130] have been investigated for perceptual grouping and segmentation.
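For concreteness, the five potentials of Table 2.1 can be evaluated as simple functions of the pixel difference d = xi − xj. The sketch below assumes δ = 1 by default; the Huber branch 2δ|d| − δ² is the standard continuous completion and is an assumption here.

import numpy as np

def phi1(d):
    """Quadratic (Tikhonov) potential: convex."""
    return d ** 2

def phi2(d):
    """Total-variation (L1) potential: convex."""
    return np.abs(d)

def phi3(d, delta=1.0):
    """Huber potential: quadratic near zero, linear in the tails ("mixed")."""
    return np.where(np.abs(d) <= delta, d ** 2, 2 * delta * np.abs(d) - delta ** 2)

def phi4(d, delta=1.0):
    """Log-quadratic potential: nonconvex, grows only logarithmically."""
    return np.log(1 + (d / delta) ** 2)

def phi5(d, delta=1.0):
    """Saturated-quadratic potential: nonconvex and bounded."""
    return d ** 2 / (delta ** 2 + d ** 2)

# d = xi - xj is the difference between neighbouring pixel intensities
d = np.linspace(-3.0, 3.0, 7)
for name, phi in [("phi1", phi1), ("phi2", phi2), ("phi3", phi3),
                  ("phi4", phi4), ("phi5", phi5)]:
    print(name, np.round(phi(d), 3))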

Multiple Model Criteria for Global Convergence

One of the key questions in image restoration and segmentation is how to optimize the proposed cost or energy function with global convergence. Following the optimization guide at "http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/index.html", we describe an optimization tree, shown in Fig. 2.7. This tree introduces the different subfields of optimization and outlines the major algorithms in each area. From the literature, several modeling criteria for optimization can be summarized in the following aspects:

1. Direct local minimization to global convergence. A local minimum of the energy J can be substituted for the global minimum when a plausible initial guess is available. Such algorithms are simple to implement but are relatively sensitive to the quality of the initialization. For example, the original ICM algorithm [26] uses the maximum likelihood algorithm under the well-posed assumption that the noise is very weak. A coordinate-wise sketch of the ICM update is given after this list.

2. Stochastic simulated annealing to global convergence. Optimization using simulated annealing (SA) is based on the distribution pt(x|y) = exp[−J(x)/t], where t denotes the temperature. As t → 0, pt(x|y) decreases toward zero for any x different from the global minima x̂. pt(x|y) is used to construct a Markov chain which converges to the set of the global minima of J.

[Figure 2.7 about here. Its branches cover continuous versus discrete optimization, constrained versus unconstrained optimization, linear, nonlinear, integer, stochastic and network programming, nonlinear equations, nonlinear least squares, and global, nondifferentiable, bound-constrained and nonlinearly constrained optimization.]

Figure 2.7: Optimization tree. Three main optimization criteria can be considered in this optimization tree, e.g., continuous versus discrete, global versus local, and convex versus non-convex.

In this process, the temperature decreases slowly from an initial high temperature toward zero. The Markov chain can be constructed based on stochastic gradient maximization of pt(x|y) [82], [273], Metropolis dynamical sampling of pt(x|y) [83], [273], or Gibbs dynamical sampling of pt(x|y) [111], [85], [273]. This type of algorithm is widely used in image and signal processing; a minimal Metropolis-style sketch is given after this list.

3. Deterministic relaxation to global convergence. A class of approximate (relaxed) energies is constructed by reducing the nonconvexity of J. The nonconvexity is thus "converted" into convexity, and minimizing the relaxed energy approximates the global minimum, e.g., mean field annealing (MFA) [81], [229].

4. GNC relaxation to global convergence. The graduated non-convexity (GNC) algorithm proposed by Blake and Zisserman [33] constructs an approximating convex function free of spurious local minima, whereas stochastic methods avoid local minima by using random motions to jump out of them. The underlying principle of the general GNC algorithm is also to "convert" nonconvexity into convexity, because the algorithm approximates the global minimum by the minima of suitable approximating functions. Blake and Zisserman [31] made a detailed comparison of the efficiency of deterministic and stochastic algorithms for visual reconstruction, where piecewise continuous reconstruction of real-valued data is formulated as an optimization problem. They also point out that the deterministic algorithm (GNC) outstrips stochastic (simulated annealing) algorithms both in computational efficiency and in problem-solving power. A toy sketch of the GNC idea is given after this list.

5. The graph cuts algorithm to global convergence. The graph cuts algorithm was first used in combinatorial optimization by Greig et al. [98] and has recently been intensively studied for computer vision tasks [132]. The algorithm is based on linear programming and is used for binary optimization with the min-cut/max-flow algorithm; each variable in this algorithm takes one of two possible values. The cost function in the graph cuts algorithm need not be convex, but it must be regular. A sketch of the binary min-cut construction is given below.
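The following minimal sketches illustrate, in turn, the ICM, simulated annealing, GNC, and graph-cut ideas on small one-dimensional signals. All energies, priors, and parameter values are illustrative assumptions and do not reproduce the original formulations of [26], [82], [33], or [98].

ICM (item 1): a coordinate-wise sketch assuming a quadratic (φ1) smoothness prior, so that each local update has a closed form.

import numpy as np

def icm_denoise(y, lam=1.0, iters=20):
    """Iterated conditional modes with a quadratic (phi1) smoothness prior.

    Energy: J(x) = sum_i (x_i - y_i)^2 + lam * sum_i (x_i - x_{i-1})^2.
    Each pixel is replaced in turn by the minimizer of its local
    conditional energy, starting from the noisy observation itself.
    """
    x = y.copy()
    for _ in range(iters):
        for i in range(len(x)):
            neigh = []
            if i > 0:
                neigh.append(x[i - 1])
            if i < len(x) - 1:
                neigh.append(x[i + 1])
            # closed-form minimizer of (x_i - y_i)^2 + lam * sum_n (x_i - n)^2
            x[i] = (y[i] + lam * sum(neigh)) / (1.0 + lam * len(neigh))
    return x

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(20), np.ones(20)])
noisy = clean + 0.2 * rng.standard_normal(clean.size)
print(np.round(icm_denoise(noisy), 2))

With this convex prior the coordinate-wise updates reach the global minimum; replacing φ1 by a nonconvex potential leaves the same scheme in a relative minimum that depends on the initialization.

Simulated annealing (item 2): Metropolis sampling of pt(x|y) with a slowly decreasing temperature, applied to a binary denoising energy.

import numpy as np

def sa_denoise(y, beta=1.0, t0=4.0, t_min=0.05, cooling=0.95, sweeps=5, seed=0):
    """Simulated annealing via Metropolis sampling of p_t(x|y) = exp[-J(x)/t].

    J(x) = sum_i (x_i - y_i)^2 + beta * sum_i [x_i != x_{i+1}], x_i in {0, 1}.
    The temperature t is lowered slowly; at each level, single-site flips
    are accepted with probability min(1, exp(-dJ / t)).
    """
    rng = np.random.default_rng(seed)
    x = (y > 0.5).astype(float)              # initial guess: thresholded data
    t = t0
    while t > t_min:
        for _ in range(sweeps * len(x)):
            i = rng.integers(len(x))
            x_new = 1.0 - x[i]
            # local energy change caused by flipping site i
            dJ = (x_new - y[i]) ** 2 - (x[i] - y[i]) ** 2
            for j in (i - 1, i + 1):
                if 0 <= j < len(x):
                    dJ += beta * (float(x_new != x[j]) - float(x[i] != x[j]))
            if dJ <= 0 or rng.random() < np.exp(-dJ / t):
                x[i] = x_new
        t *= cooling                          # slow cooling toward zero
    return x

rng = np.random.default_rng(1)
clean = np.concatenate([np.zeros(15), np.ones(15)])
noisy = clean + 0.4 * rng.standard_normal(clean.size)
print(sa_denoise(noisy, beta=0.5).astype(int))

GNC (item 4): a toy homotopy φp(d) = d²/(1 + p·d²), which is quadratic (convex) at p = 0 and becomes the saturated quadratic at p = 1; each stage is minimized starting from the result of the previous, more convex, stage.

import numpy as np

def gnc_denoise(y, lam=2.0, p_schedule=(0.0, 0.25, 0.5, 0.75, 1.0),
                step=0.1, iters=200):
    """Graduated non-convexity sketch on a 1-D signal.

    The smoothness potential phi_p(d) = d^2 / (1 + p * d^2) is quadratic
    (convex) for p = 0 and becomes the saturated quadratic for p = 1.
    Each stage is minimized by gradient descent, initialized with the
    minimizer of the previous, more convex, stage.
    """
    def grad(x, p):
        d = np.diff(x)                        # differences of neighbours
        g = 2 * d / (1 + p * d ** 2) ** 2     # derivative of phi_p at d
        smooth = np.zeros_like(x)
        smooth[:-1] -= g
        smooth[1:] += g
        return 2 * (x - y) + lam * smooth

    x = y.copy()                              # start from the data
    for p in p_schedule:                      # convex -> nonconvex
        for _ in range(iters):
            x = x - step * grad(x, p)
    return x

rng = np.random.default_rng(2)
clean = np.concatenate([np.zeros(20), 2 * np.ones(20)])
noisy = clean + 0.3 * rng.standard_normal(clean.size)
print(np.round(gnc_denoise(noisy), 2))

Graph cuts (item 5): the binary min-cut construction, here computed with networkx's minimum_cut; the quadratic data term, the Ising smoothness term, and β are assumptions.

import numpy as np
import networkx as nx

def graph_cut_binary(y, beta=1.0):
    """Exact binary labelling via min-cut/max-flow, in the spirit of Greig et al.

    Energy: sum_i (y_i - x_i)^2 + beta * sum_i [x_i != x_{i+1}], x_i in {0, 1}.
    A pixel ending on the source side of the minimum cut receives label 0,
    on the sink side label 1.
    """
    n = len(y)
    G = nx.DiGraph()
    for i in range(n):
        # t-links: the s->i capacity is paid when x_i = 1, the i->t capacity when x_i = 0
        G.add_edge("s", i, capacity=(y[i] - 1.0) ** 2)
        G.add_edge(i, "t", capacity=(y[i] - 0.0) ** 2)
        if i + 1 < n:
            # n-links: penalty beta whenever neighbouring labels disagree
            G.add_edge(i, i + 1, capacity=beta)
            G.add_edge(i + 1, i, capacity=beta)
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return np.array([0 if i in source_side else 1 for i in range(n)])

rng = np.random.default_rng(3)
clean = np.concatenate([np.zeros(10), np.ones(10)])
noisy = clean + 0.4 * rng.standard_normal(clean.size)
print(graph_cut_binary(noisy, beta=0.5))

The pairwise term used here is regular (submodular), which is exactly the condition noted above for the binary cost to be minimized exactly by a single cut.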


We can observe that most of these modeling criteria rely on convexity to achieve global convergence. Following Fig. 2.7, global convergence via stochastic optimization can be computed in discrete spaces (using discrete simulated annealing, mean field theory, or multi-scale optimization) [127]. Global convergence via deterministic regularization approaches can be computed in continuous or discrete spaces (using continuous simulated annealing, conjugate gradient, gradient descent, or the Gauss-Seidel algorithm) [4], [234]. Therefore, there is an underlying relationship between discrete optimization and continuous optimization, with stochastic programming acting as a bridge between the two fields. Moreover, stochastic and deterministic optimization approaches might be unified under certain conditions.