
We discuss the research questions, our study design, and the data our analysis relies on.

7.2.1 Research Questions

In this work, we answer the following research questions.

RQ7.1 To what extent can we automatically classify private and professional mobile device usage? Several classification algorithms support supervised machine learning, but they may differ in accuracy (precision, recall, and F1-score). Therefore, we compare conventional classifiers in a benchmark to find the best result. We answer this research question in Section 7.3.1.
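For illustration, such a benchmark could be set up as follows. This is a minimal sketch using scikit-learn; the particular classifiers, feature matrix `X`, and label vector `y` are assumptions rather than the study's exact configuration.

```python
# Hypothetical sketch of the RQ7.1 benchmark: compare conventional
# supervised classifiers on the labeled sessions. The classifier choice
# and the feature matrix X / label vector y are assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

CLASSIFIERS = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}

def benchmark(X, y):
    """Report cross-validated precision, recall, and F1 for each classifier."""
    for name, clf in CLASSIFIERS.items():
        s = cross_validate(clf, X, y, cv=5,
                           scoring=("precision", "recall", "f1"))
        print(f"{name}: precision={s['test_precision'].mean():.3f} "
              f"recall={s['test_recall'].mean():.3f} "
              f"f1={s['test_f1'].mean():.3f}")
```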

RQ7.2 What is the minimal feature set for an accurate and reliable classification? We study the accuracy of the classification under the consideration of data minimization. We use as few features (i.e., as little collected data) as possible for two reasons: (1) to reduce the privacy concerns of the approach and (2) to increase computing performance and lower battery consumption. To answer this research question, we create a second benchmark that compares the accuracy of four classifiers with a minimized feature set. We explain our results in Section 7.3.2.
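One possible way to realize such a feature-minimization benchmark is sketched below; the selector (recursive feature elimination) and the parameter `k` are assumptions, not necessarily the procedure used in the study.

```python
# Hypothetical sketch for RQ7.2: benchmark a classifier on a minimized
# feature set. RFE (recursive feature elimination) and k are assumptions;
# the study's actual selection procedure may differ.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

def minimized_f1(X, y, k):
    """Cross-validated F1 when only the k 'best' features are kept."""
    pipe = Pipeline([
        # The selector is fit inside each CV fold to avoid data leakage.
        ("select", RFE(RandomForestClassifier(random_state=42),
                       n_features_to_select=k)),
        ("clf", RandomForestClassifier(random_state=42)),
    ])
    return cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
```

Sweeping `k` reveals the smallest feature set that still classifies reliably; fewer features mean fewer privacy concerns and less computation on the device, which is exactly the trade-off RQ7.2 studies.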

RQ7.3 To what extent can we use a trained classification model across multiple users? One challenge is to create a classification model that can be used without requiring new users to label data. In this study, we analyze the accuracy and the features needed to generalize our approach: we create a single classification model from the data of different persons and test it with data that is unseen to the model. We compare the performance of the full and the minimized feature sets of RQ7.1 and RQ7.2. We answer RQ7.3 in Section 7.4.
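For illustration, a between-users evaluation of this kind is commonly implemented as leave-one-user-out cross-validation. The sketch below assumes scikit-learn and a `groups` array mapping each session to its user, which are not necessarily the study's exact setup.

```python
# Hypothetical sketch for RQ7.3: between-users evaluation. Each fold trains
# on all sessions except one user's and tests on that unseen user, so no
# labels from the held-out user enter the model. `groups` holds the user ID
# of each session and is an assumed data layout.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def between_users_scores(X, y, groups):
    """Leave-one-user-out F1 scores, one per held-out user."""
    return cross_val_score(
        RandomForestClassifier(random_state=42),
        X, y, groups=groups, cv=LeaveOneGroupOut(), scoring="f1",
    )
```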

7.2.2 Study Process

Figure 7.1 depicts our overall research process. The figure includes two thematic columns that correspond to our main phases: Crowdsourcing Study and Machine Learning Experiments.

The crowdsourcing study had the goal of collecting labeled data for the machine learning approach (first column). We first developed an Android app for the labeled data collection. Then, we ran a pre-study to check that the app can collect the data we need and to confirm the feasibility of the approach. Finally, we conducted a two-week crowdsourcing study with 18 participants to get labeled data. The result of the crowdsourcing study is the labeled data (our raw study data).

Figure 7.1: Steps of the study.

The machine learning experiments are the core of classifying the labeled data into private and professional device usage (second column). The automatic classification considers two perspectives. The within-users analysis concerns a single user collecting context data and a personalized classification for that user. We created the first benchmark using all collected context types; then we minimized the number of context types and created the second benchmark. The output of these steps is the within-users analysis results in Section 7.3. The between-users analysis builds a classification model from a subset of people and applies that model to other, unseen people (the test set). We create the third benchmark reporting the between-users classification results and then qualitatively discuss why some cases performed better than others.
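As an illustration of the within-users (personalized) setup, the following sketch trains one model per user on a chronological split of that user's sessions; the data layout and the split are assumptions, not the study's exact procedure.

```python
# Hypothetical sketch of the within-users analysis: one personalized model
# per user, trained on that user's own labeled sessions and evaluated on a
# held-out chronological split. The data layout is an assumption.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def within_user_scores(sessions_by_user, train_fraction=0.7):
    """sessions_by_user: dict of user ID -> (X, y), sessions ordered by time."""
    scores = {}
    for user, (X, y) in sessions_by_user.items():
        split = int(len(y) * train_fraction)   # earlier sessions for training
        clf = RandomForestClassifier(random_state=42)
        clf.fit(X[:split], y[:split])
        scores[user] = f1_score(y[split:], clf.predict(X[split:]))
    return scores
```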

The remaining part of this section focuses on the crowdsourcing study: we describe its process, the participants, and the collected implicit user feedback (the labeled dataset).

7.2.3 Study Data

Here we describe how we collected our study data for the machine learning experiments.

Data Collection App

For the machine learning experiments, we had to collect real-world (not synthetic) labeled implicit user feedback. For the data collection, we developed an Android app that is capable of collecting the context types that we introduced in Table 2.3 of Chapter 2. We decided to target Android because of its high market share [113] and its powerful APIs, which are less restrictive than those of iOS.

As a reminder, we describe the context and interaction data we collected.

The App Sensor[software|pull] identifies the current foreground app that the user sees. The Interaction Sensor[software|push] monitors touch events like clicking, scrolling, and writing, but also reads the text of the interacted element.

The Location Sensor[hardware|push] retrieves the longitude and latitude of the user's location. The Connectivity Sensor[software|push/pull] bundles several sub-sensors collecting, e.g., the BSSID or the encryption of the current network. The System Sensor[software|pull] gathers general information such as the time, date, and running background services.
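For illustration, a single collected context event could be represented by a record like the following; the field names are hypothetical and not the study app's actual data model.

```python
# Illustrative (hypothetical) record for one collected context event,
# combining the sensors above; field names are assumptions, not the
# study app's actual data model.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class ContextEvent:
    timestamp: datetime                    # System sensor: time and date
    foreground_app: str                    # App sensor: visible foreground app
    interaction: Optional[str] = None      # Interaction sensor: click/scroll/write
    interacted_text: Optional[str] = None  # Interaction sensor: element text
    latitude: Optional[float] = None       # Location sensor
    longitude: Optional[float] = None      # Location sensor
    bssid: Optional[str] = None            # Connectivity sensor: network BSSID
    encryption: Optional[str] = None       # Connectivity sensor: encryption type
    background_services: List[str] = field(default_factory=list)  # System sensor
```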

Crowdsourcing Study

First, we conducted a dry run with four participants to check the feasibility of our approach and to confirm that the app collects all the necessary data. The result was a refined version of the data collection app, which we used for the crowdsourcing study.

Afterwards, we ran the crowdsourcing study, in which we collected context data to analyze whether we can automatically detect private and professional device usage, using the app with the sensors previously described.

After the installation of the app, a dialog opens that asks users for their consent and whether they are currently at home or at work, as shown in Figure 7.2 a). Afterwards, the app runs in the background and collects context data in real time. Whenever the screen of the device is unlocked, the app asks the user about the usage purpose, as shown in Figure 7.2 b). The dialog has three options: Privately, For work, and Ask me later. If the purpose of the usage is unclear, users could choose Ask me later, which re-opens the dialog when the screen turns off. In some cases, a user might also want to switch between private and professional usage during a session; for instance, a user chats with her partner, receives an important email from the office, and starts reading it. In such a case, the participants were able to adjust the current usage in the notification bar, as shown in Figure 7.2 c). A session is the time between unlocking the device and locking it again.

Figure 7.2: Screenshots of the study app: a) initial installation dialog, b) dialog for supervised learning, c) notification to update the usage type.
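To make the session concept concrete, the following hypothetical sketch segments the recorded event stream into labeled sessions; the event kinds and fields are assumptions about the recorded data.

```python
# Hypothetical sketch of deriving labeled sessions from the event stream:
# a session spans from an unlock to the next lock and carries the usage
# label the participant chose (possibly updated during the session).
def split_into_sessions(events):
    """events: time-ordered records with .kind in {"UNLOCK", "LOCK", "LABEL", "CONTEXT"}."""
    sessions, current, label = [], None, None
    for event in events:
        if event.kind == "UNLOCK":
            current, label = [], None            # a new session begins
        elif event.kind == "LABEL" and current is not None:
            label = event.value                  # "private" or "work"
        elif event.kind == "LOCK" and current is not None:
            sessions.append((current, label))    # close the session
            current = None
        elif current is not None:
            current.append(event)                # context event in the session
    return sessions                              # label may be None ("Ask me later")
```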

Before we started the crowdsourcing study, we distributed a manual to the study participants that explained the process. After these preparation steps, we started the two-week crowdsourcing study with 18 participants from six different countries, of whom six were female and 12 were male (demographics in Table 7.1). Nine participants were from academia, employed by universities and research institutes, while the remaining nine participants were from private companies of different sizes. Of the 18 initial participants, six either participated for less than a week or did not enable all sensors as described in the manual; e.g., the accessibility sensor needs manual approval from the user. As a consequence, we removed these participants from the data analysis. Since we anonymized the data and did not track the users' identities, we cannot determine which participants of Table 7.1 we had to remove. From the 12 participants who submitted complete data, we collected 88,596 context events that occurred in 6,486 sessions. In total, the participants labeled 1,268 sessions as professional and 5,205 sessions as private. Therefore, about 80% of the data belongs to private sessions and 20% to professional sessions. On average, each participant had 540 sessions with a standard deviation of 475.
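As a quick sanity check, the reported class balance follows directly from the labeled session counts:

```python
# Quick arithmetic check of the reported class balance.
professional = 1268
private = 5205
labeled = professional + private        # 6,473 labeled sessions
print(f"private: {private / labeled:.1%}")             # ~80.4%
print(f"professional: {professional / labeled:.1%}")   # ~19.6%
```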

Table 7.1: Overview of the study participants.

# Gender Age Country Position Affiliation Experience
P1 M 25-34 Germany Researcher University 2 years
P2 F 25-34 Germany Researcher University 3 years
P3 M 25-34 Germany Researcher University 2 years
P4 M 25-34 Germany App Developer Health App Company 4 years
P5 F 35-44 Germany Lead Engineer Telecommunication Company 10 years
P6 F 25-34 Austria Researcher Research Institute 1 year
P7 M 25-34 Austria Researcher Research Institute 4 years
P8 M 25-34 Austria Researcher Research Institute 5 years
P9 F 25-34 Spain Researcher University 3 years
P10 M 35-44 Spain Associate Professor University 1 year
P11 M 35-44 Spain Software Engineer Information Security Company 12 years
P12 M 35-44 Spain Chief Product Officer Information Security Company 12 years
P13 M 65-74 Spain Project Manager Information Security Company 24 years
P14 M 35-44 Spain Chief Operating Officer Information Security Company 5 years
P15 F 55-64 Italy Project Manager Telecommunication Company 20 years
P16 M 55-64 Italy Project Manager Telecommunication Company 20 years
P17 F 25-34 Tunisia Software Engineer Intern ERP & Health App Company 2 years
P18 M 25-34 Switzerland Student Assistant University 1 year