Problem Formulation - Context-Aware Decision Making in Wireless Networks: Optimization and Machine Learning Approaches

5.4.1 Formal Problem Statement

In this section, based on the models presented in Section 5.3, we formulate the problem of context-aware worker selection for maximizing the worker performance in MCS applications with non-spatial tasks, to be solved in a hierarchical fashion by the MCSP and the LCs. As stated above, we assume that tasks t = 1, ..., T arrive sequentially.

Consider now an arbitrary sequence of T task and worker arrivals, i.e., a sequence of tasks t = 1, ..., T with arbitrary task budgets {b_t}_{t=1,...,T}, arbitrary task contexts {c_t}_{t=1,...,T}, arbitrary worker availability {W_t}_{t=1,...,T} and arbitrary worker contexts {x_t,i}_{i∈W_t, t=1,...,T}. The goal of the system of MCSP and LCs is to select workers for each task in such a way that the expected cumulative worker performance up to task T is maximized.

Based on the action model in Section 5.3.6, the problem of selecting, for each task, a subset of workers that maximizes the sum of expected performances given the task budget is given by

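$$
\begin{aligned}
\max \quad & \sum_{t=1}^{T} \sum_{i \in \mathcal{W}_t} \theta_i(x_{t,i}, c_t)\, y_{t,i} \\
\text{s.t.} \quad & \sum_{i \in \mathcal{W}_t} y_{t,i} \le m_t \quad \forall t = 1, \dots, T, \\
& y_{t,i} \in \{0, 1\} \quad \forall i \in \mathcal{W}_t,\ \forall t = 1, \dots, T,
\end{aligned}
\tag{5.6}
$$

with y_t,i as in (5.3), θ_i(x_t,i,c_t) as defined in Section 5.3.4, and the constraints from (5.4).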
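As a small illustration of Problem (5.6), the following Python sketch evaluates the objective for a given 0/1 selection and checks the constraints. It assumes, for illustration only, that the expected performances of the available workers are known and stored per task in a dictionary keyed by a worker ID; this data layout and the helper names `is_feasible` and `objective` are hypothetical.

```python
from typing import Dict, List

# Hypothetical data layout (illustration only):
#   theta[t][i] -> expected performance theta_i(x_{t,i}, c_t) of available worker i for task t
#   y[t][i]     -> selection variable y_{t,i} in {0, 1}
#   m[t]        -> maximum number of workers m_t to select for task t
Theta = List[Dict[str, float]]
Selection = List[Dict[str, int]]


def is_feasible(y: Selection, theta: Theta, m: List[int]) -> bool:
    """Check the constraints of Problem (5.6) for every task t."""
    if len(y) != len(theta) or len(theta) != len(m):
        return False
    for t, y_t in enumerate(y):
        if not set(y_t) <= set(theta[t]):  # only workers in W_t can be selected
            return False
        if any(v not in (0, 1) for v in y_t.values()):  # binary variables
            return False
        if sum(y_t.values()) > m[t]:  # at most m_t workers per task
            return False
    return True


def objective(y: Selection, theta: Theta) -> float:
    """Objective of (5.6): sum over t and i in W_t of theta_i(x_{t,i}, c_t) * y_{t,i}."""
    return sum(theta[t][i] * v for t, y_t in enumerate(y) for i, v in y_t.items())
```

For example, with theta = [{"a": 0.9, "b": 0.4, "c": 0.7}, {"b": 0.6, "d": 0.8}], m = [2, 1], and workers a and c selected for the first task and d for the second, the selection is feasible and the objective is approximately 2.4.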

Problem (5.6) includes the expected context-specific worker performances θ_i(x_t,i,c_t), i.e., the expected performances of available workers as functions of their joint personal and task contexts.

5.4.2 Oracle Solution

First, we analyze Problem (5.6) under the assumption that there is an entity with a priori knowledge about context-specific worker performances and with access to the current personal worker contexts. Hence, only in this section, suppose that there exists an entity which (i) is an omniscient oracle that knows a priori the expected performance of each worker under each joint personal and task context, and (ii) is centrally informed about the current personal contexts of all available workers for each arriving task.

For such an entity, Problem (5.6) corresponds to an ILP problem, cf. Section 2.3.2.2. Since the sub-problems in Problem (5.6) for the different tasks are not coupled, Problem (5.6) can be decoupled into T independent sub-problems, one for each arriving task. For a task t, if at most as many workers are available as required, i.e., W_t ≤ m_t, the trivial optimal solution of the sub-problem associated with task t is to request all available workers to complete the task. In contrast, if W_t > m_t holds for a task t, the sub-problem associated with task t corresponds to a knapsack problem, cf. Section 2.3.2.3, with a knapsack of capacity m_t and W_t items, one per available worker, where item i ∈ W_t has unit weight and a non-negative profit θ_i(x_t,i,c_t). Due to the unit weights, the sub-problem in this case is a special case of the knapsack problem that can be solved efficiently: since all items have the same weight, the capacity constraint only limits the number of selected workers, so the sum of profits is maximized by ranking the available workers in W_t according to their context-specific expected performances and selecting the m_t highest-ranked workers. This takes a running time of at most O(W_t log(W_t)).
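The ranking argument above can be sketched in Python as follows, assuming that the expected performances θ_i(x_t,i,c_t) of the available workers for one task are given as a dictionary keyed by a (hypothetical) worker ID, as only the oracle would know them. The function names are hypothetical; `brute_force_best` enumerates all feasible subsets and is included only to check the ranking solution on small instances.

```python
import heapq
from itertools import combinations
from typing import Dict, List


def oracle_select(theta_t: Dict[str, float], m_t: int) -> List[str]:
    """Oracle solution of the sub-problem for one task t.

    theta_t maps each available worker i in W_t to theta_i(x_{t,i}, c_t).
    If W_t <= m_t, all available workers are requested; otherwise the m_t
    workers with the highest expected performance are selected.
    """
    if len(theta_t) <= m_t:
        return list(theta_t)
    # heapq.nlargest needs O(W_t log m_t) time, within the O(W_t log W_t) bound of sorting.
    return heapq.nlargest(m_t, theta_t, key=theta_t.__getitem__)


def brute_force_best(theta_t: Dict[str, float], m_t: int) -> float:
    """Best objective over all subsets of at most m_t workers (exponential, for checks only)."""
    best = 0.0
    for k in range(min(m_t, len(theta_t)) + 1):
        for subset in combinations(theta_t, k):
            best = max(best, sum(theta_t[i] for i in subset))
    return best
```

Using a bounded heap instead of a full sort is only a minor design choice; sorting all W_t workers and keeping the first m_t yields the same optimal value.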

For a task t ∈ {1, ..., T}, we denote an optimal subset of workers to select for the task by $S_t^* := \{s^*_{t,1}, \dots, s^*_{t,\min\{m_t, W_t\}}\}$. Formally, these workers satisfy

$$
s^*_{t,j} \in \operatorname*{argmax}_{i \in \mathcal{W}_t \setminus \bigcup_{k=1}^{j-1} \{s^*_{t,k}\}} \theta_i(x_{t,i}, c_t) \quad \text{for } j = 1, \dots, \min\{m_t, W_t\}, \tag{5.7}
$$

where $\bigcup_{k=1}^{0} \{s^*_{t,k}\} := \emptyset$. Note that several workers may have the same expected performance and hence the optimal set of workers may not be unique, which is captured by the set-valued argmax in (5.7). Moreover, note that an optimal set S_t^∗ of workers for task t depends on the task budget b_t, task context c_t, price e_t, the set W_t of available workers and their personal contexts {x_t,i}_{i∈W_t}, but we write S_t^∗ instead of S_t^∗(b_t, c_t, e_t, W_t, {x_t,i}_{i∈W_t}) for brevity.

Let

$$
S^* := \{S_t^*\}_{t=1,\dots,T} \tag{5.8}
$$

be the collection of optimal subsets of workers for the collection {1, ..., T} of tasks.

We call this collection the centralized oracle solution, since it requires an entity with a priori knowledge about expected context-specific worker performances and with access to personal worker contexts to make optimal decisions.

5.4.3 Contextual Multi-Armed Bandit Formulation

Now, we characterize Problem (5.6) under the conditions actually faced by the MCSP and LCs. Namely, the MCSP and the LCs do not have a priori knowledge about expected performances, and the workers' personal contexts are only available locally on each mobile device and may not be shared with the MCSP.

If, for an arriving task t, at most as many workers are available as required, i.e., W_t ≤ m_t, the MCSP automatically selects the optimal subset of workers by simply requesting all available workers (i.e., S_t = W_t) to complete the task. Otherwise, if W_t > m_t holds for an arriving task t, the MCSP cannot simply solve the sub-problem for task t appearing in Problem (5.6) like the centralized oracle. This is because, on the one hand, it does not know the expected performances θ_i(x_t,i,c_t) and, on the other hand, it cannot access the workers' personal contexts. Hence, in this case, a machine-learning-based approach, cf. Section 2.3.1, is needed, since the system of MCSP and LCs can only learn the workers' performances by selecting different workers over time and observing their instantaneous performances.

Considering the problem statement in Section 5.4.1, under the conditions actually faced by the MCSP and LCs, Problem (5.6) can be understood as a contextual MAB problem as follows, cf. Section 2.3.3.4. The MCSP and the LCs can be understood as a set of agents, one of which (the MCSP) needs to sequentially select from a set of actions. In the considered MCS problem, the set of actions is given by the set W of workers. The sequence of tasks t = 1, ..., T corresponds to a sequence of rounds faced by the agents. At the arrival of a task t, only a subset W_t ⊆ W of workers may be available, and hence the set of available actions may differ in each round. For each arriving task, the following events happen sequentially. First, the MCSP receives the task and, in particular, observes the task context c_t. Moreover, the LC of each available worker i ∈ W_t observes its worker's personal context x_t,i. This corresponds to several contexts being revealed to the agents at the beginning of a round. Second, the MCSP selects a subset of min{m_t, W_t} workers from the set W_t and requests them to complete the task. This corresponds to an agent selecting a subset of the available actions. Third, each LC of a requested worker observes the instantaneous performance of its worker. This corresponds to the agents receiving a reward for each selected action. Taking into account the assumptions about the arrival processes of the tasks and workers and about the worker performances in Sections 5.3.2 – 5.3.4, Problem (5.6) corresponds to a contextual MAB problem with a model similar to the one presented in Section 2.3.3.4. The main differences between these two models are as follows:

• In Problem (5.6), the agent may select several actions per round instead of only one and the number of actions to be selected may be different in each round.

Therefore, formally, Problem (5.6) is a contextual combinatorial MAB problem, cf. Section 2.3.3.2. However, since neither the objective function nor the constraints in Problem (5.6) are combinatorial, Problem (5.6) is more accurately described as a contextual MAB problem with several action selections per round, but not of combinatorial nature.

• In Problem (5.6), actions may be unavailable in arbitrary rounds, whereas in the model in Section 2.3.3.4, actions are always available. Therefore, Problem (5.6) is a contextual MAB problem with sleeping arms, cf. Section 2.3.3.2.

• Instead of one agent as in the model in Section 2.3.3.4, Problem (5.6) has to be solved cooperatively by several agents, where one coordinating agent (i.e., the MCSP) selects a subset of actions in each round based on the estimates of a set of learning agents (i.e., the LCs), each of which observes the context of one particular action and learns the rewards of this action.

Consequently, a coordination mechanism between the MCSP and LCs needs to be designed in order to enable the LCs to learn their workers' context-specific performances over time and to enable the MCSP to select suitable workers for each task so as to maximize the worker performance on this task given its task budget. Specifically, over time, the system of MCSP and LCs has to strike a suitable trade-off between exploration and exploitation by, on the one hand, selecting workers about whose performance only little information is available and, on the other hand, selecting workers who are likely to perform well. For each arriving task, the selection of workers depends on the history of previously selected workers and the corresponding observed performances.
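To illustrate this interaction pattern (contexts observed locally, workers selected by the MCSP, performances observed by the LCs of requested workers), the following Python sketch simulates rounds of a contextual MAB with sleeping arms and several selections per round. It is not the learning algorithm of this work: it assumes that joint contexts are already discretized into hashable cells and uses a simple, hypothetical count-based explore-then-exploit rule with threshold `EXPLORE_THRESHOLD`. All class and function names are hypothetical. Note that the MCSP only receives an exploration flag and an estimate from each LC, never a personal context.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

EXPLORE_THRESHOLD = 3  # hypothetical number of observations per context cell before exploiting


class LocalController:
    """Hypothetical LC: keeps its worker's context-specific statistics on the device."""

    def __init__(self) -> None:
        self.counts: Dict[Hashable, int] = defaultdict(int)
        self.means: Dict[Hashable, float] = defaultdict(float)
        self._cell: Optional[Hashable] = None

    def report(self, personal_ctx: Hashable, task_ctx: Hashable) -> Tuple[bool, float]:
        """Observe the joint context locally and share only (needs_exploration, estimate)."""
        self._cell = (personal_ctx, task_ctx)  # assumed: contexts are already discretized
        return self.counts[self._cell] < EXPLORE_THRESHOLD, self.means[self._cell]

    def update(self, performance: float) -> None:
        """Incremental sample-mean update after observing the worker's performance."""
        self.counts[self._cell] += 1
        self.means[self._cell] += (performance - self.means[self._cell]) / self.counts[self._cell]


def mcsp_select(reports: Dict[str, Tuple[bool, float]], m_t: int) -> List[str]:
    """MCSP: request under-explored workers first, then those with the highest estimates."""
    if len(reports) <= m_t:
        return list(reports)  # W_t <= m_t: request every available worker
    explore = [w for w, (flag, _) in reports.items() if flag]
    if len(explore) >= m_t:
        return random.sample(explore, m_t)  # exploration only
    exploit = sorted((w for w in reports if w not in explore),
                     key=lambda w: reports[w][1], reverse=True)
    return explore + exploit[: m_t - len(explore)]


def run_round(lcs: Dict[str, LocalController], available: List[str],
              personal_ctx: Dict[str, Hashable], task_ctx: Hashable, m_t: int,
              draw_performance: Callable[[str], float]) -> List[str]:
    """One task t: contexts are revealed, workers are selected, performances are observed."""
    reports = {w: lcs[w].report(personal_ctx[w], task_ctx) for w in available}
    selected = mcsp_select(reports, m_t)
    for w in selected:
        lcs[w].update(draw_performance(w))  # only requested workers are observed
    return selected
```

Selecting under-observed workers first implements exploration, while ranking the remaining workers by their sample-mean estimates implements exploitation. Since only requested workers produce performance observations, the number of costly quality assessments equals the number of selections.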

Since observing worker performance requires quality assessments that may be costly, the number of performance observations should be limited in order to keep the cost of quality assessment low. An algorithm that maps the history of previously selected workers and observed performances to the next selection of workers is called a learning algorithm. The performance of such a learning algorithm can be evaluated by quantifying its loss with respect to the centralized oracle solution given in (5.8) in terms of the achieved cumulative worker performance. Formally, for an arbitrary sequence of T task and worker arrivals, the regret of learning with respect to the centralized oracle solution is given by

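$$
R(T) = \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{j=1}^{\min\{m_t, W_t\}} \left( p_{s^*_{t,j}}\big(x_{t,s^*_{t,j}}, c_t, t\big) - p_{s_{t,j}}\big(x_{t,s_{t,j}}, c_t, t\big) \right) \right], \tag{5.9}
$$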

where $p_{s_{t,j}}(x_{t,s_{t,j}}, c_t, t)$ denotes the instantaneous performance of the selected worker s_t,j ∈ S_t, j ∈ {1, ..., min{m_t, W_t}}, with personal worker context vector $x_{t,s_{t,j}}$ for task t with task context c_t. Here, the expectation is taken with respect to the selections {S_t}_{t=1,...,T} made by the learning algorithm and the randomness of the workers' performances.

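Equivalently, since θ_i(x_t,i,c_t) is the expected instantaneous performance of worker i and the oracle selection S_t^∗ does not depend on the random performances, the regret R(T) can be written as

$$
R(T) = \sum_{t=1}^{T} \sum_{j=1}^{\min\{m_t, W_t\}} \left( \theta_{s^*_{t,j}}\big(x_{t,s^*_{t,j}}, c_t\big) - \mathbb{E}\left[ \theta_{s_{t,j}}\big(x_{t,s_{t,j}}, c_t\big) \right] \right). \tag{5.10}
$$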
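The relation between (5.9) and (5.10) can be checked numerically with a small Monte Carlo sketch. It assumes, for illustration only, that the simulator knows the expected performances of the available workers, that instantaneous performances are Bernoulli random variables with mean θ_i(x_t,i,c_t), and that the learning algorithm is replaced by a hypothetical baseline that selects workers uniformly at random. Averaging the realized differences inside (5.9) and the expected-performance differences of (5.10) over many runs should give approximately the same value. All names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Task = Dict[str, float]  # hypothetical: worker i in W_t -> theta_i(x_{t,i}, c_t)
Policy = Callable[[Task, int, random.Random], List[str]]


def oracle_set(theta_t: Task, m_t: int) -> List[str]:
    """S_t^*: the min(m_t, W_t) workers with the highest expected performance."""
    return sorted(theta_t, key=theta_t.__getitem__, reverse=True)[:m_t]


def random_policy(theta_t: Task, m_t: int, rng: random.Random) -> List[str]:
    """Hypothetical baseline: ignores theta and picks min(m_t, W_t) workers uniformly."""
    return rng.sample(list(theta_t), min(m_t, len(theta_t)))


def one_run(tasks: List[Task], m: List[int], policy: Policy,
            rng: random.Random) -> Tuple[float, float]:
    """Return (realized sum inside (5.9), expected-performance sum of (5.10)) for one run."""
    realized, pseudo = 0.0, 0.0
    for theta_t, m_t in zip(tasks, m):
        best, chosen = oracle_set(theta_t, m_t), policy(theta_t, m_t, rng)
        # Assumed: instantaneous performance p ~ Bernoulli(theta), drawn for every available worker.
        p = {i: float(rng.random() < theta_t[i]) for i in theta_t}
        realized += sum(p[i] for i in best) - sum(p[i] for i in chosen)
        pseudo += sum(theta_t[i] for i in best) - sum(theta_t[i] for i in chosen)
    return realized, pseudo


def estimate_regret(tasks: List[Task], m: List[int], policy: Policy,
                    runs: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of R(T) via (5.9) and via (5.10)."""
    rng = random.Random(seed)
    samples = [one_run(tasks, m, policy, rng) for _ in range(runs)]
    return (sum(s[0] for s in samples) / runs, sum(s[1] for s in samples) / runs)
```

For example, with the two tasks {"a": 0.9, "b": 0.2, "c": 0.5} and {"a": 0.4, "d": 0.8} and m_t = 1, the exact regret of the random baseline is (0.9 - 1.6/3) + (0.8 - 0.6) ≈ 0.567, and both estimates should come out close to this value.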

In the document Context-Aware Decision Making in Wireless Networks: Optimization and Machine Learning Approaches (pages 143-147)