Location Privacy in Mobile Networks


Master Thesis

presented by Andreas Ergenzinger

at the

Universität Konstanz Faculty of Sciences

Department of Computer and Information Science

1st Examiner: Prof. Dr. Marcel Waldvogel

2nd Examiner: Prof. Dr.-Ing. Oliver Haase

Constance, 2015


Abstract

This master thesis presents a client-server system for protecting the anonymity and location privacy of mobile network users against a passive network-side adversary. The server provides a pool of UICCs, the technological successors to SIM cards. Users’ cellphones rely on those UICCs for accessing a mobile network. Through simultaneous, coordinated switching to new UICCs, groups of users in the same mobile radio cell become indistinguishable to the adversary. The system’s performance was evaluated based on simulated vehicle traffic. An increase in the number of users led to more coordinated UICC switching, higher distance errors, and lower user identification ratios. The system is capable of enhancing the anonymity and location privacy of its users against a passive adversary, provided that the opponent does not track mobile phones based on their device identities.


Dedicated to my parents.


List of Figures

2.1 Command and response APDU fields . . . 4

2.2 Transmission of characters . . . 5

2.3 Smart card communication layers . . . 6

2.4 Block fields . . . 7

2.5 UMTS network architecture . . . 9

2.6 UICC form factors and pinout . . . 12

2.7 UICC and USIM file structure . . . 12

2.8 Authentication and key agreement (AKA) . . . 13

2.9 Start of integrity protection and ciphering . . . 14

2.10 Position of the CAT layer . . . 14

4.1 An example for simultaneous subscriber identity switching . . . 25

4.2 Generic card service client-server architecture . . . 26

4.3 Protocol stacks of the basic card service implementation . . . 27

4.4 3G GPRS Security procedures with a remote UICC . . . 36

4.5 Authentication procedure in 2G GPRS . . . 37

5.1 Major components of the client-side implementation . . . 41

5.2 Picture of the MIM module prototype . . . 41

5.3 Simplified state diagram of the MIM module firmware . . . 42

5.4 Protocol and technology stacks of the client-side implementation . . . 42

5.5 Components of the card client app and flow of messages . . . 43

6.1 Feasibility of attaching to the mobile network depending on card client response times . . . 45

6.2 Overlaid traces and cells . . . 47

6.3 Number of active nodes by time of day . . . 48

6.4 2D histogram of distances traveled and trip durations of the node population . . . 48

6.5 Box plot legend . . . 50

6.6 Sample size and end distance error . . . 50

6.7 Sample size and number of encounters per node . . . 50

6.8 Sample size and node identification ratio . . . 51

6.9 Probability density function of the end distance error distribution . . . 51

6.10 Distribution of the number of encounters per node . . . 52

6.11 Relationship between trip distance and end distance error . . . 54

6.12 Relationship between trip duration and end distance error . . . 55

6.13 Metro rail map showing stations and lines . . . 57

6.14 Number of riders by time of day and ride type . . . 59

6.15 Distribution of rider-specific mean distance errors . . . 59

6.16 Cumulative histogram of distance error in hops . . . 59

6.17 Relationship between mean distance error, ride duration, and ride type . . . 60


A.1 Circuit for level shifting and I/O splitting/joining . . . 72


List of Tables

2.1 Roles of universal smart card contacts . . . 5

2.2 Classes of operating conditions and high voltage levels . . . 6

4.1 Sequence of events for naive, pairwise subscriber identity swapping . . . 31

4.2 Comparison of reliable SI switching strategies . . . 35

A.1 U(S)ART instances and associated pins used by the MIM module prototype . . . 72

A.2 Gate-source threshold voltages of used transistor types . . . 72

A.3 Pin connections of the HC-06 Bluetooth module and the Arduino Due . . . 72


Symbols and Abbreviations

2G 2nd Generation

3G 3rd Generation

3GPP 3rd Generation Partnership Project

A, B, C Classes of operating conditions

ADF Application DF

AKA Authentication and Key Agreement

APDU Application Protocol Data Unit

API Application Programming Interface

ATR Answer To Reset

AUTN Authentication token

Bd Baud

BWT Block Waiting Time

CAT Card Application Toolkit

CGT Character Guard Time

CK Cipher Key

CLK Clock pin

CN Core Network

CS Circuit-switched

CSP Card Service Protocol

CWT Character Waiting Time

DF Dedicated File

DSDA Dual SIM Dual Active

EF Elementary File

etu Elementary Time Unit

f Frequency value of the clock signal

f2 Authentication function used for computing RES and XRES

f3 Key generating function used for computing CK

f4 Key generating function used for computing IK

GGSN Gateway GPRS Support Node

GMSC Gateway MSC

GND Ground pin

GPRS General Packet Radio Service

GSM Global System for Mobile communications

H High state

HLR Home Location Register

IC Integrated Circuit

ICC Integrated Circuit Card

IDF Interface Device

IEC International Electrotechnical Commission

IK Integrity Key


IMEI International Mobile Station Equipment Identity

IMSI International Mobile Subscriber Identity

I/O Input/Output pin

IP Internet Protocol

ISDN Integrated Services Digital Network

ISO International Organization for Standardization

K Long-term secret key shared between a subscriber’s USIM and HLR

L Low state

LA Location Area

LAI Location Area Identity

LAU Location Area Update

LBS Location-Based Service

LCS Location Service

MF Master File

MIM Neither an abbreviation nor an acronym.

MITM Man-in-the-middle

MNO Mobile Network Operator

MSC Mobile Switching Center

MSISDN Mobile Subscriber ISDN Number

O/D Origin-Destination

OS Operating System

OSI Open Systems Interconnection

OTA Over-the-air

PC/SC Personal Computer/Smart Card

PDP Packet Data Protocol

PPS Protocol and Parameter Selection

P-TMSI Packet TMSI

PS Packet-switched

PSTN Public Switched Telephone Network

RA Routing Area

RAI Routing Area Identity

RAND Random challenge

RAU Routing Area Update

RES Response to random challenge

RFCOMM Radio Frequency Communication

RNC Radio Network Controller

RRLP Radio Resource LCS Protocol

RST Reset pin

RTD Round-Trip Delay

SGSN Serving GPRS Support Node

SI Subscriber Identity

SIM Subscriber Identity Module

TCP Transmission Control Protocol

TMSI Temporary Mobile Subscriber Identity

TPDU Transport Protocol Data Unit

UART Universal Asynchronous Receiver/Transmitter

UE User Equipment

UICC Universal Integrated Circuit Card (unofficial expansion; 3GPP standards specify that it is neither an abbreviation nor an acronym)

UMTS Universal Mobile Telecommunications System


USART Universal Synchronous/Asynchronous Receiver/Transmitter

USB Universal Serial Bus

USIM Universal Subscriber Identity Module

UTM Universal Transverse Mercator

UTRAN UMTS Terrestrial Radio Access Network

VCC Power pin

VLR Visitor Location Register

WLAN Wireless Local Area Network

WTX Waiting Time Extension

XRES Expected response


Chapter 1

Introduction

This thesis investigates ways to protect the location privacy of mobile network users against the provider of that network. Location privacy can be defined as “the ability of an individual to move in public space with the expectation that under normal circumstances their location will not be systematically and secretly recorded for later use” [13]. This chapter explains why this problem is worth studying and outlines the content of the remaining chapters.

1.1 Statement of the Problem

In January 2014, 90 % of adults living in the United States owned some kind of cell phone [36].

Because of their greater versatility, smartphones are increasingly replacing simple feature phones.

Between spring of 2013 and October 2014, the share of U.S. adults owning a smartphone rose from 35 % to 64 %, with adoption being more widespread among younger age groups [45]. Moreover, smartphone owners were found to be strongly attached to their devices; 79 % reported spending at most 2 hours per day without their mobile phone [29]. Thus, the location history of a cellphone may quite accurately reflect the movements and activities of its owner.

When using a mobile network, subscribers cannot avoid revealing personal location information to their mobile network operator (MNO). In order to be reachable for incoming calls, messages, and data, the phone must always keep the network informed about the area it can be found in. Activities that require interacting with the network, such as making a call or fetching a website, allow placing the phone within an even smaller region: the radio cell covered by the serving cell tower antenna. The MNO may store the location information it obtains about its customers for billing, analysis, or legal reasons, even though this is not necessary for providing the communication services. Since market forces usually allow only a small number of competing MNOs per country, each network operator may be able to collect location data for a large number of people.

Such comprehensive population movement data can have many beneficial and legitimate uses in disciplines like city and transportation planning, targeting of disease control measures [50], and mapping of population densities [16]. However, there is also considerable potential for misuse of this private information. It is commonly assumed that privacy risks can only result from the theft of the data and that the MNOs themselves are trustworthy. Some, e.g. [31], believe that the potential damage to the companies’ reputation and profits that could result from violating the privacy expectations of their subscribers is sufficient to prevent any deliberate infraction on their part. Yet, there are numerous examples of MNOs from around the world monetizing the location information of their customers. These cases range from location-based advertising to selling aggregated, anonymized user demographic and movement data [28]. So MNOs do not always deserve unqualified confidence, especially not if their core business is built on the extensive analysis and exploitation of information about their users [46].

In the wrong hands, subscriber location data might be used to monitor people’s movements, to identify their routines, to find out if they are employed, who they work for, who they are spending time with, how fast they drive, what demonstrations they participate in, and which places of worship and hospitals they visit. And even though some might feel that they have nothing to hide [47], others, through no fault of their own, might not be so fortunate. For them, location privacy might be indispensable for ensuring their livelihood, freedom, or physical safety.

1.2 Thesis Outline

The remainder of this thesis is structured as follows:

• Chapter 2 provides pertinent technical background information.

• Chapter 3 reviews scientific literature on location privacy and human mobility behavior.

• Chapter 4 proposes a system for the anonymous usage of mobile networks and describes various strategies that allow system users to change their mobile network user identity in ways that hinder re-identification.

• Chapter 5 presents an implemented prototype of the system.

• Chapter 6 describes a partial, experimental evaluation of the system’s performance, and

• Chapter 7 lists conclusions, contributions, and opportunities for future work.


Chapter 2

Technical Background

This chapter provides technical background information relevant to the rest of the thesis. Section 2.1 gives an overview of contact-based smart cards. Section 2.2 describes UMTS mobile networks and the role of a special smart card type in subscriber authentication.

2.1 Integrated Circuit Cards with Electrical Contacts

An integrated circuit card (ICC), better known as a smart card, is a plastic card with an embedded microchip. Common examples of ICCs are debit cards and SIM cards. Depending on its integrated circuit (IC) component, a smart card may be capable of data storage, data processing, or both.

An ICC is accessed via an interface device (IDF), commonly referred to as a card reader or terminal. Based on the way the card and the terminal connect, two types of ICCs can be distinguished: contactless smart cards communicate over a short-range radio interface, whereas contact cards have electrical contacts on their surface that must be connected to corresponding contacts on the terminal side. All further mentions of smart cards in this thesis refer to the contact-based variety. Figure 2.6 on page 12 shows several smart card form factors used in mobile telephony.

The following parts of this section recap material from the ISO/IEC 7816 technical standard, more precisely from [23] and [24], in order to give the reader a basic understanding of smart card operations.

2.1.1 Files, Functions, and Applications

A smart card stores user-accessible data in a file system, a forest data structure composed of rooted trees with two kinds of nodes: elementary files (EFs) and dedicated files (DFs). Each EF stores a set of data units, records, or objects. EFs are always leaf nodes. DFs are used for grouping other files, which means that a DF is only a leaf node if it is empty. The master file (MF) is a mandatory DF found on all ICCs that can serve as the root of a file system tree. Files can be addressed in various ways depending on the file type, for example by variable-length names or by two-byte identifiers, which may be concatenated into absolute and relative paths.
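The tree structure and the addressing by concatenated two-byte identifiers can be illustrated with a small Python model. This is a simplified, hypothetical sketch, not a card interface: only the MF identifier 3F00 is taken from ISO/IEC 7816-4, while the class and function names and the identifiers 0100 and 0101 in the usage example are made up for illustration. Relative paths are not handled.

```python
from dataclasses import dataclass, field

MF_FID = 0x3F00  # identifier reserved for the master file


@dataclass
class CardFile:
    """Node of a hypothetical in-memory model of an ICC file system."""
    fid: int                                   # two-byte file identifier
    is_df: bool                                # True for a DF, False for an EF
    children: dict[int, "CardFile"] = field(default_factory=dict)
    data: bytes = b""                          # file content, only used by EFs

    def add(self, child: "CardFile") -> "CardFile":
        """Attach a child file; only DFs may have children."""
        if not self.is_df:
            raise ValueError("EFs are always leaf nodes")
        self.children[child.fid] = child
        return child


def resolve_absolute_path(mf: CardFile, path: bytes) -> CardFile:
    """Follow a path of concatenated two-byte identifiers starting at the MF."""
    if len(path) < 2 or len(path) % 2:
        raise ValueError("a path consists of whole two-byte identifiers")
    fids = [int.from_bytes(path[i:i + 2], "big") for i in range(0, len(path), 2)]
    if fids[0] != MF_FID or mf.fid != MF_FID:
        raise ValueError("an absolute path starts with the MF identifier 3F00")
    node = mf
    for fid in fids[1:]:
        if fid not in node.children:
            raise KeyError(f"file {fid:04X} not found under {node.fid:04X}")
        node = node.children[fid]
    return node
```

For example, after building `mf = CardFile(MF_FID, True)`, adding a DF 0100 below it and an EF 0101 below that DF, the absolute path `bytes.fromhex("3F0001000101")` resolves to the EF.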

In addition to offering access to the file system, a smart card may also provide functions that operate on parameters provided by the interface device, data stored on the card, or both. Functions may return a result, update an internal state, write data to files, or perform a combination of these actions.

The hierarchical structure, the names, identifiers, and purposes of files as well as the available functions are usually standardized and reflect a card’s intended purpose. Depending on the specific requirements, an ICC may implement several applications. An application consists of a set of functions and a single subtree in the file system. The subtree’s root DF is called an application DF (ADF). It can either be a descendant of the master file, or it can be a proper root node. The latter arrangement is more common for modern applications. Such ADFs are addressed by their unique DF name, which must be listed in a standardized EF under the MF. Figure 2.7 on page 12 shows an example of an ICC file system with an ADF root node.

Figure 2.1: Command and response APDU fields

(a) Command APDU fields. Command header: CLA (class byte), INS (instruction byte), P1-P2 (parameter bytes). Command trailer: Lc field (0, 1 or 3 bytes, only present if N_c > 0), command data field (string of N_c data bytes, only present if N_c > 0), Le field (0-3 bytes, only present if N_e > 0).

(b) Response APDU fields. Response data field (string of N_r ≤ N_e data bytes, only present if N_r > 0), followed by the response trailer containing the status bytes SW1-SW2.

2.1.2 Application Protocol Data Units

The IDF and the ICC interact by exchanging messages called application protocol data units (APDUs). An APDU sent by the terminal is called a command APDU, which is answered by exactly one response APDU from the card. Both messages taken together are referred to as a command-response pair, and only one exchange of such a pair may be ongoing at any given time.

Figure 2.1a shows the fields that may be present in a command APDU. The first four bytes, called the command header, are mandatory. The class byte specifies a class of commands. The code transmitted in the instruction byte denotes a specific command, telling the card which operations to perform. Command arguments may be encoded in the parameter bytes P1 and P2. N_c denotes the number of input bytes contained in the command data field. If N_c > 0, then this value is encoded in the L_c field. Otherwise, both fields are absent. The L_e field encodes N_e, an upper bound for the size of the response data field of the expected response APDU.

Figure 2.1b shows the fields of the response APDU. If the instruction specified in the command APDU does not generate response data, then the respective data field of the response APDU is absent. The response trailer containing the status bytes SW1 and SW2 is always present. These bytes categorize the outcome of the finished operation and may inform the terminal about necessary follow-up commands.
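To make the field layout concrete, the following Python sketch encodes a command APDU and splits a response APDU into its data field and status bytes. It assumes short length fields only (a one-byte L_c, and a one-byte L_e in which N_e = 256 is encoded as 0x00); the longer extended-length encodings are left out. The function names are hypothetical.

```python
def build_command_apdu(cla: int, ins: int, p1: int, p2: int,
                       data: bytes = b"", ne: int = 0) -> bytes:
    """Encode a command APDU using short length fields only."""
    if len(data) > 255:
        raise ValueError("short encoding allows at most 255 command data bytes")
    if not 0 <= ne <= 256:
        raise ValueError("short encoding allows at most 256 expected bytes")
    apdu = bytes([cla, ins, p1, p2])       # mandatory command header
    if data:                               # Lc and data field only if Nc > 0
        apdu += bytes([len(data)]) + data
    if ne:                                 # Le only if Ne > 0; 256 becomes 0x00
        apdu += bytes([ne % 256])
    return apdu


def split_response_apdu(response: bytes) -> tuple[bytes, int, int]:
    """Separate the (possibly empty) response data field from SW1 and SW2."""
    if len(response) < 2:
        raise ValueError("a response APDU always ends with SW1 and SW2")
    return response[:-2], response[-2], response[-1]
```

As a usage example, `build_command_apdu(0x00, 0xA4, 0x04, 0x00, bytes.fromhex("A0000000871002"), ne=256)` produces the bytes `00 A4 04 00 07 A0 00 00 00 87 10 02 00`, i.e. an ISO/IEC 7816-4 SELECT (INS A4) by DF name (P1 04) carrying an illustrative seven-byte name and requesting up to 256 response bytes.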

2.1.3 Card Contacts

The contact-based interface of a standard-compliant smart card may have up to eight electrical contacts. Five of these are needed for basic functionality; their purpose and behavior are fully specified in ISO/IEC 7816. These contacts are listed in Table 2.1. The remaining three can serve application-specific functions, but they are not relevant to this thesis.

2.1.4 Transmission of Characters

The card and the terminal communicate by altering the voltage level on the shared I/O line. The IDF side is responsible for pulling the voltage on I/O up to the high state (H) by connecting it to VCC through a pull-up resistor. Then either side can impose the low state (L) of 0 V on the line by connecting it directly to ground. The resulting state can easily be detected by the receiver on the opposite side. Since the communication line is used bidirectionally, simultaneous transmissions would lead to mutual interference. The available transport protocols address this issue by specifying strict rules for turn-taking (see Section 2.1.6).

Contact Purpose

VCC Used for supplying the card with power.

RST Used for sending the reset signal to the ICC.

CLK Used for providing the card with a clock signal for synchronous communication.

GND Used for grounding and for providing a low reference voltage to the ICC.

I/O Used for half-duplex transmission of data to and from the card.

Table 2.1: Roles of universal smart card contacts

A special time unit, the elementary time unit (etu), is used for timing. Its duration is defined by the following formula:

\[ 1\,\text{etu} = \frac{F}{D} \cdot \frac{1}{f} \]

Both F and D are integer values that can be chosen by the card. The symbol f denotes the frequency of the clock signal provided by the terminal over the CLK input. The ICC selects the upper limit for f, but the terminal may provide any clock frequency within the applicable bounds and may even change it between commands.
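The formula translates directly into code. The following Python sketch (function names hypothetical) computes the etu duration and the resulting bit rate, since each transmitted moment lasts exactly 1 etu. With the default values F = 372 and D = 1 from ISO/IEC 7816-3 and a clock frequency of f = 3.5712 MHz, one etu lasts about 104.2 µs, which corresponds to 9,600 Bd.

```python
def etu_seconds(F: int, D: int, f_hz: float) -> float:
    """Duration of one elementary time unit: (F / D) * (1 / f)."""
    if F <= 0 or D <= 0 or f_hz <= 0:
        raise ValueError("F, D and f must be positive")
    return (F / D) / f_hz


def bit_rate(F: int, D: int, f_hz: float) -> float:
    """One moment lasts 1 etu, so the raw rate is the reciprocal of the etu."""
    return 1.0 / etu_seconds(F, D, f_hz)
```

Because f appears in the denominator, a terminal that lowers the clock frequency between commands (as permitted above) also stretches every etu-based timing parameter accordingly.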

Bytes are embedded into so-called characters for transmission over the I/O line, as shown in Figure 2.2. Each character is divided into ten equally long time increments called moments. The length of a moment is 1 etu. A character starts with an initial moment of low state, followed by one moment for each of the eight transmitted data bits. Moment number 10 encodes parity information computed over the eight preceding data moments. The endianness of the embedded byte, the mapping of bit values to voltage states, and the parity type (odd or even) are determined by the card. How the receiver reacts to a transmission error, identified by an incorrect parity value, depends on the transport protocol in use.
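The character frame can be sketched in Python as a list of ten moment states. The sketch assumes one particular choice of the card-determined settings: the least significant bit is sent first, state H encodes a logical 1, and even parity is used over the data bits and the parity bit. Other choices would change the bit order, the level mapping, or the parity rule. The function names are hypothetical.

```python
def frame_character(byte: int) -> list[str]:
    """Return the ten moment states ('L' or 'H') of one character."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a character carries exactly one byte")
    bits = [(byte >> i) & 1 for i in range(8)]   # least significant bit first
    parity = sum(bits) % 2                        # makes the number of 1s even
    moments = [0] + bits + [parity]               # moment 1 is the start (L)
    return ["H" if m else "L" for m in moments]


def parity_ok(moments: list[str]) -> bool:
    """Check the start moment and even parity over moments 2 to 10."""
    return (len(moments) == 10 and moments[0] == "L"
            and moments[1:].count("H") % 2 == 0)
```

For the byte 0x3B, five of the eight data bits are 1, so the parity moment is set to H and the frame reads L H H L H H H L L H. Flipping any single data moment makes `parity_ok` return False, which is how a receiver detects a corrupted character.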

The delay between consecutive characters sent in the same direction is bounded by two parameters. The character guard time (CGT) is the minimum delay from the beginning of one character to the beginning of the next one. CGT is at least 12 etu. The maximum delay between the beginning of two consecutive characters is called character waiting time (CWT). The receiving side can use this parameter as a timeout value to detect incomplete or incorrect transmissions.
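A receiver can apply both bounds to the start times of consecutive characters. The following Python sketch classifies each gap, measured in etu. The CGT defaults to its minimum of 12 etu (i.e. it assumes no additional guard time was negotiated), and the CWT value must be supplied by the caller; the function name is hypothetical.

```python
def check_character_spacing(start_times_etu: list[float], cwt_etu: float,
                            cgt_etu: float = 12) -> list[str]:
    """Classify gaps between the starts of consecutive characters sent in one direction."""
    verdicts = []
    for earlier, later in zip(start_times_etu, start_times_etu[1:]):
        gap = later - earlier
        if gap < cgt_etu:
            verdicts.append("too early")   # violates the character guard time
        elif gap > cwt_etu:
            verdicts.append("timeout")     # exceeds the character waiting time
        else:
            verdicts.append("ok")
    return verdicts
```

With character starts at 0, 12, 20, and 100 etu and an assumed CWT of 50 etu, the three gaps (12, 8, and 80 etu) are classified as "ok", "too early", and "timeout".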

2.1.5 Card Operating Procedures

An ICC shall support one or several consecutive classes of operating conditions. Three classes exist, labeled A, B, and C. Each one is characterized by a different high voltage level that may be applied to the card’s contacts. Voltages exceeding this maximum might damage the card’s IC component. Therefore, it is common practice for an IDF to apply the classes in reverse order, from lowest to highest voltage, when dealing with an ICC whose supported classes are not known. Table 2.2 lists the class-specific voltages.

Figure 2.2: Transmission of characters (start moment, moments 1 to 9 carrying the byte, moment 10 carrying the parity, followed by a pause). Graphic taken from [23, Fig. 7]

Class State H Voltage Level

A 5.0 V

B 3.0 V

C 1.8 V

Table 2.2: Classes of operating conditions and high voltage levels
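The lowest-voltage-first practice can be sketched in Python as follows. The voltages are those of Table 2.2. The probe function `card_accepts` is a hypothetical placeholder for however the terminal decides that a class works (for example, by receiving a valid ATR after activation); that decision procedure is not modeled here.

```python
from typing import Callable, Optional

VOLTAGE_BY_CLASS = {"A": 5.0, "B": 3.0, "C": 1.8}  # state H levels in volts


def choose_operating_class(card_accepts: Callable[[str], bool]) -> Optional[str]:
    """Try the classes from lowest to highest voltage and return the first that works."""
    for cls in sorted(VOLTAGE_BY_CLASS, key=VOLTAGE_BY_CLASS.get):
        if card_accepts(cls):   # hypothetical probe of the card under this class
            return cls
    return None                 # the card responded under none of the classes
```

Trying class C first means that a card which only tolerates 1.8 V is never exposed to 3.0 V or 5.0 V, at the cost of a few failed activation attempts for cards that only support higher voltages.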

After selecting a class of operating conditions, the terminal activates the card by executing the following steps in the given order:

1. Put RST to state L.

2. Provide power to VCC.

3. Enable the pull-up resistor on I/O (ignoring any activity on the line for a specified period).

4. Provide a stable clock signal on CLK.

Following activation, the interface device is ready to receive the first message sent by the card.

The IDF triggers this transmission, called the answer to reset (ATR), by sending the reset signal, i.e. by putting RST to state H. The ATR must begin between 400 and 40,000 clock cycles thereafter.
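Since the ATR deadline is specified in clock cycles, its duration in seconds depends on the clock frequency f chosen by the terminal. A small Python sketch with hypothetical function names:

```python
def atr_start_window(f_hz: float) -> tuple[float, float]:
    """Earliest and latest ATR start in seconds after RST is put to state H."""
    if f_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return 400 / f_hz, 40_000 / f_hz


def atr_on_time(delay_s: float, f_hz: float) -> bool:
    """True if an ATR starting delay_s seconds after the reset signal is compliant."""
    earliest, latest = atr_start_window(f_hz)
    return earliest <= delay_s <= latest
```

At f = 4 MHz, for instance, the ATR must start between 100 µs and 10 ms after the reset signal; a terminal that sees no activity on I/O after the upper bound can treat the card as unresponsive.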

The answer to reset informs the terminal about the connection parameters and card capabilities. It may range in size from 2 to 32 characters, depending on how many default values are declared implicitly by omission.

Unless the ATR forbids it, the IDF may send one protocol and parameter selection (PPS) request to suggest a different transport protocol as well as other values for F, D, and the maximum clock frequency. The card returns its binding decision in the PPS response. Afterwards, both sides may start exchanging APDUs using the selected protocol and parameters.

The card is deactivated by putting all interface lines back to state L. The specification dictates that these steps are to be carried out in a particular order, but deviating from that sequence should not lead to negative consequences.

2.1.6 Transport Protocols

Transmission of APDUs requires the use of a transport protocol. The technical standard defines two options: protocols T=0 and T=1¹. Figure 2.3 shows their role in the layered communication architecture.

¹ The protocols are named after the ATR field and the respective value used for declaring their usage.

Figure 2.3: Smart card communication layers and the smallest units of information exchanged on each layer. Both the interface device and the smart card implement an application layer (APDU), a transport layer (T=0: TPDU, T=1: block), a data link layer (character), and a physical layer (moment).


Figure 2.4: Block fields. Prologue field (mandatory): NAD (1 byte), PCB (1 byte), LEN (1 byte). Information field (optional): INF (0 to 254 bytes). Epilogue field (mandatory): LRC (1 byte) or CRC (2 bytes). Graphic taken from [23, Fig. 17].

Both protocols use a request-response pattern for message exchange, i.e. the terminal and the card take turns sending transport layer messages, starting with a message from the interface device.

Protocol T=0

Protocol T=0 is described as a half-duplex protocol for the transmission of characters. APDUs from the application layer are mapped to messages referred to as transport protocol data units (TPDUs) for transmission over the transport layer. The mapping algorithm has to consider multiple factors, such as the presence of command data, the possibility of response data, maximum field lengths, and context information.

A command TPDU is usually almost identical to the corresponding command APDU. If either command data is present or response data is expected, then an additional length field is included in the TPDU header. Response APDUs are generally mapped to response TPDUs without any change. The transmission of data field bytes must be delayed until the receiving side has indicated its readiness to receive them by sending a so-called procedure byte. This applies to transmissions in either direction.

If an instruction is accompanied by both command and response data, then the APDUs are mapped to at least two command-response TPDU pairs, one for transmitting the command data to the card, the other one for fetching the response data. Similarly, large APDUs may have to be split up and transmitted in a sequence of chained TPDU pairs.

Protocol T=0 uses a low-level approach to error detection and correction. Transmission errors are detected by comparing the received and computed parity bit values of an incoming character. If the values differ, then the received byte is assumed to have been corrupted, necessitating a retransmission.

If the terminal detects an error this way, it finishes receiving the current response TPDU based on the number of expected bytes and a CWT timeout. Afterwards it retransmits the failed command.

The card uses a different error correction method. When detecting a parity error for a received character, it signals the terminal by putting I/O to state L for 1 to 2 etu, starting at moment 10.5±0.2 of the character frame. The transmitter checks the line state at moment 11±0.2 and, if it is low, retransmits the last character following a delay of at least 2 etu.

Protocol T=1

The standard describes protocol T=1 as a half-duplex protocol for the transmission of TPDUs called blocks. Blocks are transmitted as continuous streams of characters. Three types of blocks are distinguished: information blocks (I-blocks), receive ready blocks (R-blocks), and supervisory blocks (S-blocks). All of them adhere to the common structure shown in Figure 2.4.

The node address byte (NAD) is used in I-blocks to specify a logical channel for which the data in the information field (INF) is intended. (Each logical channel specifies a particular ongoing session involving the terminal and the card.) The size of the information field is encoded in the length byte (LEN). The protocol control byte (PCB) identifies the block type and contains type-specific fields. The epilogue field holds a checksum value computed over the prologue and information fields; the card determines whether a longitudinal redundancy check (LRC) or a cyclic redundancy check (CRC) algorithm is used. The receiver of a block can check its integrity by comparing the included checksum with the expected value. The character parity bits are ignored.
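The block layout and the integrity check can be sketched in Python. The sketch assumes that the card has selected the one-byte LRC epilogue, computed as the exclusive-OR of all prologue and information field bytes; the two-byte CRC variant is omitted, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor

MAX_INF = 254  # limited by the one-byte LEN field


def lrc(data: bytes) -> int:
    """Longitudinal redundancy check: exclusive-OR of all bytes."""
    return reduce(xor, data, 0)


def build_block(nad: int, pcb: int, inf: bytes = b"") -> bytes:
    """Assemble prologue (NAD, PCB, LEN), information field, and LRC epilogue."""
    if len(inf) > MAX_INF:
        raise ValueError("information field too large")
    prologue = bytes([nad, pcb, len(inf)])
    return prologue + inf + bytes([lrc(prologue + inf)])


def block_is_intact(block: bytes) -> bool:
    """Recompute the LRC over prologue and INF and compare it with the epilogue."""
    if len(block) < 4 or block[2] != len(block) - 4:
        return False
    return lrc(block[:-1]) == block[-1]
```

For example, an I-block with NAD 00 and PCB 00 carrying the four bytes `00 A4 04 00` is transmitted as `00 00 04 00 A4 04 00 A4`. Flipping any single bit of this block changes the recomputed LRC, so `block_is_intact` rejects it.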


The maximum information field size in bytes that a receiver can handle is called IFS. Different values may apply for each direction. IFSC and IFSD are the IFS parameters of the card and the IDF respectively. Their maximum value is 254 bytes, due to the range of the LEN field. Only blocks with a sufficiently small information field may be transmitted.

The terminal can identify an unresponsive card based on the block waiting time (BWT), the maximum permissible delay between a block sent by the IDF and the block sent in response by the ICC (plus 10 etu).

I-blocks are used for transmitting command and response APDUs. To guarantee the reliable transmission of application layer data, I-blocks have a one-bit send-sequence number field denoted N(S). Each side has its own N(S) counter, which is flipped each time the receiver acknowledges the successful reception of a transmitted I-block. The most common way to send such a positive acknowledgment is to reply with an I-block.

If the APDU’s length is less than or equal to the respective IFS, then the whole application layer message can be transmitted as INF of a single I-block. Otherwise the APDU is split into multiple chunks and sent in a sequence of chained I-blocks. All but the last I-block of a chain must be positively acknowledged by an R-block.
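The splitting of a long APDU into a chain of I-blocks can be sketched in Python. The sketch assumes the I-block PCB encoding of ISO/IEC 7816-3: the most significant bit is 0 (marking an I-block), the next bit carries N(S), and the bit after that is the more-data bit M, which is set on every block of the chain except the last. Only the sender's side is modeled; the R-blocks acknowledging each chained block are not shown, and the function name is hypothetical.

```python
def chain_i_blocks(apdu: bytes, ifs: int, ns: int = 0) -> list[tuple[int, bytes]]:
    """Split an APDU into (PCB, INF) pairs for a chain of I-blocks."""
    if not 1 <= ifs <= 254:
        raise ValueError("IFS must be between 1 and 254")
    chunks = [apdu[i:i + ifs] for i in range(0, len(apdu), ifs)] or [b""]
    blocks = []
    for index, chunk in enumerate(chunks):
        more = index < len(chunks) - 1        # M bit: another I-block follows
        pcb = (ns << 6) | (int(more) << 5)    # top bit 0 marks an I-block
        blocks.append((pcb, chunk))
        ns ^= 1                               # the sender's N(S) alternates
    return blocks
```

A 10-byte APDU sent with an IFS of 4 bytes thus becomes three I-blocks with PCB values 20, 60, and 00 (hexadecimal) and information fields of 4, 4, and 2 bytes; an APDU no longer than the IFS yields a single I-block without the M bit.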

R-blocks may also signal negative acknowledgment, indicating the reception of a corrupted or incomplete block, which usually leads to a retransmission. In some cases, they might be used only for returning the right to transmit to the receiver.

S-blocks are control messages. They can be used to reset the N(S) counters in order to recover from loss of synchronization, to update IFS values, to abort an incoming chain of I-blocks, and for negotiating a waiting time extension (WTX).

The card can send a WTX request to avoid a BWT timeout, for example when a received command is going to take a long time to process. The terminal shall acknowledge the request with a WTX response, which restarts the timeout countdown. The ICC may demand longer waiting time intervals and may issue an unlimited number of repeated WTX requests. Although there exists no option for rejecting those requests, the terminal cannot be kept waiting indefinitely. If a response APDU is delayed for too long, then the IDF may respond with a card reset due to non-compliance with response time requirements of the application layer.

2.2 3G Mobile Networks and UICCs

This section presents aspects of Universal Mobile Telecommunications System (UMTS) cellular networks and the role of UICCs, colloquially referred to as SIM cards.

Although the focus is on 3rd generation (3G) technologies, most of the concepts described here also apply to other generations, owing to the gradual evolution of mobile telecommunication systems. For a more thorough introduction to mobile networks, see [42].

2.2.1 UMTS Mobile Networks

A mobile (or cellular) network provides wireless communication services to untethered devices through a set of spatially distributed, interconnected, stationary antennas. Figure 2.5 outlines the architecture of UMTS mobile networks. The system consists of two parts: the UMTS terrestrial radio access network (UTRAN) and the core network (CN).

The stationary antennas belong to the UTRAN. Antenna stations are referred to as node Bs.

They can establish a radio connection to the user equipment (UE), which may be a mobile phone, a broadband adapter, or any other device that relies on the cellular network for communication.

The radio resources of node Bs are managed by radio network controllers (RNCs). According to [42], each RNC is usually responsible for several hundred node Bs.


Figure 2.5: Simplified UMTS network architecture (Release 99), showing the UE, the UTRAN (node B, RNC), and the core network (CN) with MSC, SGSN, VLRs, GMSC, GGSN, and HLR, connected to the PSTN and an IP network. Dashed lines represent signaling links. Solid lines represent links for both data and signaling. Graphic based on [42, Figure 3.1].

The services provided by the core network can be divided into two domains: circuit-switched (CS) and packet-switched (PS) services. CS means that the network allocates an exclusive, dedicated communication channel with a guaranteed bandwidth. Until that channel is explicitly closed again, the transmission resources claimed by the connection are unavailable to other parties. CS connections are used for voice calls and video telephony, i.e. applications with (mostly) constant bit rates. For applications with variable bit rate requirements, e.g. mobile Internet, PS connections are better suited. Instead of reserving transfer capacity exclusively for one connection, available bandwidth is shared by multiple connections. Since PS applications typically send data in short bursts separated by relatively long periods of inactivity, relying on circuit switching would lead to a lot of unused, wasted transmission capacity.

CN nodes involved in relaying CS data are the mobile switching center (MSC) and the gateway MSC (GMSC). The MSC is responsible for setting up calls to and from UEs. The GMSC connects the mobile network to the public switched telephone network (PSTN), the global telephony network. PS connections are managed by nodes belonging to the General Packet Radio Service (GPRS): the serving GPRS support node (SGSN) is the counterpart of the MSC in the PS domain, and the functions of the gateway GPRS support node (GGSN) mirror those of the GMSC, with the difference that the GGSN serves as an interface to the Internet or another Internet Protocol (IP) network. Two types of databases with subscriber account information, the home location register (HLR) and the visitor location register (VLR), are also part of the core network. Each mobile network operator (MNO) maintains an HLR for storing information about the accounts of its subscribers. That information includes the international mobile subscriber identity (IMSI), a permanent subscriber identifier, and the mobile subscriber ISDN number (MSISDN), the telephone number under which the subscriber may be reached. The HLR also stores more transient data, such as which MSC is currently responsible for forwarding incoming calls to a particular subscriber. That MSC then copies some of the subscriber-related information to its VLR, where it is kept until the subscriber is handed over to a different MSC.

Establishing a mobile data connection from the UE to the IP network is referred to as a packet data protocol (PDP) context activation or as making a PS call. During this procedure, the UE is temporarily assigned an IP address. That address is usually from a private address space, which requires the GGSN to perform network address translation (NAT). However, this also means that UEs do not compete for scarce IP addresses, which allows PDP contexts to be relatively long-lived.

In order to join a mobile network, the UE has to prove that it is acting on behalf of a particular subscriber. Therefore, the network may require the UE to perform an authentication procedure that involves the subscriber's UICC – a special smart card that must be plugged into the UE. Authentication may happen whenever the network desires; however, it usually coincides with billable events, such as outgoing or incoming calls, and with location updates (see below). Most MNOs also perform the authentication procedure when a subscriber connects to their network. The procedure is described in the next subsection.

Each antenna of a node B covers a specific region called a cell. Cells are uniquely identified by their cell ID. Normally, directional antennas are used, which allows each node B to serve multiple surrounding cells. Whenever a UE attaches to the mobile network, it registers with a cell that covers the UE's location to tell the network where the subscriber using the UE can be found. Having UEs repeat this location update procedure every time they enter a new cell would lead to a lot of signaling traffic. Therefore, adjoining cells are arranged in groups, and an idle UE has to perform a location update only when it moves into the area of a different group of cells or when the time since its last location update has exceeded a specific value. For the CS domain, the areas of grouped cells are called location areas (LAs). For the PS domain, cells are grouped into routing areas (RAs). The two types of update procedures are called LA update (LAU) and RA update (RAU), respectively. The cells of a particular RA all belong to the same LA.

According to [42], LAs typically contain several dozen cells, and RAs are usually identical to LAs, without further subdivision. If there is an incoming call to an idle UE (whose current cell is therefore unknown), the network first has to broadcast a paging request in all cells of the UE's last reported LA. After the UE responds, the call can be established in its present cell. While the call is ongoing or while the UE is in a state with similar requirements, the UE constantly monitors the signal strength of nearby network antennas and shares this information with the RNC responsible for the current node B. For better reception or load balancing, that RNC may instruct the UE to switch to a different cell. Since neighboring cells partially overlap, an almost seamless handover between them may be possible.
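The trade-off between location updates and paging can be illustrated with a small simulation. The sketch below is a simplification under stated assumptions: the classes `Network` and `IdleUE`, the periodic timer value, and the idea of calling `maybe_update` whenever the UE selects a cell or its timer fires are all hypothetical, and handover of active connections is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Network:
    # Hypothetical mapping: location area identity (LAI) -> IDs of its cells.
    cells_by_la: dict
    # Last LAI reported by each subscriber (keyed by a subscriber identifier).
    last_reported_la: dict = field(default_factory=dict)

    def location_update(self, subscriber: str, lai: str) -> None:
        self.last_reported_la[subscriber] = lai

    def paging_cells(self, subscriber: str) -> set:
        # The current cell of an idle UE is unknown, so the paging request
        # is broadcast in every cell of its last reported location area.
        return self.cells_by_la[self.last_reported_la[subscriber]]


@dataclass
class IdleUE:
    subscriber: str
    periodic_timer_s: float  # hypothetical periodic update interval in seconds
    registered_lai: Optional[str] = None
    last_update_time: float = float("-inf")

    def maybe_update(self, net: Network, cell_lai: str, now: float) -> bool:
        """Call when the UE selects a cell or its timer fires; True if an LAU happened."""
        entered_new_la = cell_lai != self.registered_lai
        timer_expired = now - self.last_update_time >= self.periodic_timer_s
        if entered_new_la or timer_expired:
            net.location_update(self.subscriber, cell_lai)
            self.registered_lai = cell_lai
            self.last_update_time = now
            return True
        return False  # moving between cells of the same LA causes no signaling


if __name__ == "__main__":
    net = Network({"LA1": {1, 2, 3}, "LA2": {4, 5}})
    ue = IdleUE("sub-A", periodic_timer_s=3600)
    print(ue.maybe_update(net, "LA1", now=0))    # True: initial registration
    print(ue.maybe_update(net, "LA1", now=60))   # False: new cell, same LA
    print(ue.maybe_update(net, "LA2", now=120))  # True: entered LA2
    print(net.paging_cells("sub-A"))             # {4, 5}
```

Larger LAs reduce the number of updates but enlarge the set returned by `paging_cells`, which is exactly the signaling trade-off described above.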

Within the core network, subscribers are identified by their IMSI. On the air interface, UEs and node Bs use temporary identities instead, to prevent third parties that may be listening to unencrypted control messages from identifying and tracking subscribers. The network assigns a random, frequently changing identifier called the temporary mobile subscriber identity (TMSI) to each subscriber. The TMSI changes with each LAU. For the PS domain, there exists an analogous transient identifier called packet TMSI (P-TMSI), which is renewed during each RAU. Confidentiality of the new identities is always ensured by encrypted transmission. The necessary encryption keys are generated as part of the subscriber authentication procedure. When identifying with a TMSI or P-TMSI outside of the LA or RA, respectively, in which it was assigned, the UE must also supply the original location or routing area identity (LAI, RAI). The MSC or SGSN of the new cell can then contact its counterpart responsible for that area and retrieve the IMSI to which the temporary identity is mapped. If the UE cannot provide a valid TMSI or P-TMSI, or if the subsequent lookup fails, then an LAU or RAU with the subscriber's IMSI has to be performed, during which a new temporary identity is assigned.
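The lookup of a temporary identity can be sketched as follows. This is a minimal illustration, assuming one register per location area (the hypothetical `VLR` class) and a callback `request_imsi` that stands in for the network asking the UE for its IMSI; encryption, the HLR, and real message formats are left out.

```python
import secrets
from typing import Callable, Dict, Optional, Tuple


class VLR:
    """Hypothetical per-location-area register mapping TMSIs to IMSIs."""

    def __init__(self, lai: str):
        self.lai = lai
        self.imsi_by_tmsi: Dict[int, str] = {}

    def allocate_tmsi(self, imsi: str) -> int:
        # Forget any previous TMSI of this subscriber, then draw a fresh,
        # currently unused random 32-bit value.
        self.imsi_by_tmsi = {t: i for t, i in self.imsi_by_tmsi.items() if i != imsi}
        tmsi = secrets.randbits(32)
        while tmsi in self.imsi_by_tmsi:
            tmsi = secrets.randbits(32)
        self.imsi_by_tmsi[tmsi] = imsi
        return tmsi


def identify(new_vlr: VLR, vlrs_by_lai: Dict[str, VLR], tmsi: int, old_lai: str,
             request_imsi: Callable[[], str]) -> Tuple[str, int]:
    """Resolve the (TMSI, LAI) pair supplied by a UE and assign a new TMSI."""
    old_vlr: Optional[VLR] = vlrs_by_lai.get(old_lai)
    # Ask the register of the old area; it no longer needs the mapping afterwards.
    imsi = old_vlr.imsi_by_tmsi.pop(tmsi, None) if old_vlr else None
    if imsi is None:
        # Lookup failed: fall back to an update with the permanent IMSI,
        # which the UE has to reveal.
        imsi = request_imsi()
    return imsi, new_vlr.allocate_tmsi(imsi)
```

Note that the fallback branch is the only path on which the permanent IMSI is requested from the UE, which is why a failing lookup is relevant for location privacy.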

Each UE has a unique permanent identifier, the international mobile station equipment identity (IMEI). It plays no role in the provided communication services and the network cannot verify its authenticity, but the IMEI is used for blocking stolen or disruptive devices. It also permits the MNO to monitor UEs for law enforcement purposes.

2.2.2 UICCs and Security-Related Procedures

A UICC is an integrated circuit card with one or more applications. The term is intended to be neither an abbreviation nor an acronym. The UICC is the technological successor to the SIM card, i.e. it stores the subscriber data necessary for participating in a mobile network. This thesis focuses on UICCs with a USIM application, which enables the subscriber to use a 3G mobile network. The properties of UICCs are defined in [18]. Figure 2.6 shows the four standardized UICC form factors and their pinout.

The USIM Application

The universal subscriber identity module (USIM) is a smart card application for UICCs that permits a UE to access a UMTS mobile network. The properties of the USIM are specified in [2].

The application stores both fixed and dynamic information about the subscriber and the network in a predefined file structure. Figure 2.7 shows select elementary files under the USIM ADF. These EFs represent less than two percent of the USIM application’s file tree.

For performance reasons, the UE may cache mutable parameters in its own memory instead of immediately writing changes to the card and always reading the current values from it. If the UE does not adhere to the initialization and termination procedures required by the application, or in case of an error in the network, the UICC, the UE, and the stationary network could end up in inconsistent states. For example, a sudden removal of the card from the terminal while the UE is attached to the mobile network's CS domain would lead to a loss of the current TMSI on the subscriber side, because the UE would not have time to save the current value on the UICC for later use. The cryptographic keys described below may be lost in the same way.

UMTS provides re-synchronization procedures for recovering from these situations.

Security Procedures

The security features of UMTS include the mutual authentication of subscriber and network as well as ensuring the confidentiality and integrity of the transmitted data. Authentication is based on both sides proving possession of a secret key K. Confidentiality is ensured by symmetric encryption of transmissions based on a cipher key CK. Data integrity is provided by message authentication codes computed with an integrity key IK, which allow the receiver to identify modified or altogether forged messages.

Mutual authentication and the establishment of CK and IK are achieved in a single procedure called authentication and key agreement (AKA). Figure 2.8 shows the sequence of events of a successful authentication in the CS domain. Except for the replacement of the MSC by the SGSN, the procedure would be identical in the PS domain. Note that each domain uses a different set of cipher and integrity keys. The authentication procedure begins with the serving MSC (or SGSN) selecting the first unused element from a subscriber-specific list of so-called authentication vectors, which are stored in the node's VLR. If the list is empty, new authentication vectors are retrieved from the subscriber's HLR, as shown in the upper part of Figure 2.8. Each vector contains a random challenge RAND and several derived values: the expected response XRES, the keys CK and IK, and an authentication token AUTN. Besides RAND and a set of generating functions labeled f1 to f5, the subscriber's secret key K is needed to compute the derived values. The subscriber's HLR and USIM implement the same generating functions, and they are the only entities with knowledge of K. AUTN encodes a sequence number that can be used to detect old authentication vectors.

After selecting the next authentication vector from its VLR, the MSC forwards RAND and AUTN to the UE, demanding the challenge response RES. Reading K off the UICC is supposed to be impossible². The UICC shall allow only indirect access to K via the AUTHENTICATE command.

This instruction takes two input parameters: RAND and AUTN. The USIM first verifies AUTN to make sure that the challenge originated from a trusted network with access to recent, unused authentication vectors. It then computes and returns RES, CK, and IK.
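The following sketch traces vector generation, the AUTHENTICATE command, and the final comparison of RES and XRES. It is not a real implementation: the functions f1 to f5 are replaced by HMAC-SHA256 with per-function labels (real networks use operator-chosen algorithms such as MILENAGE), the field sizes and the AMF value are assumptions for illustration, and the USIM's sequence number check is simplified to "strictly increasing". The names `HLR`, `USIM`, `AuthVector`, and `msc_check` are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import List, Tuple


def _f(label: bytes, k: bytes, data: bytes, n: int) -> bytes:
    # Stand-in for f1..f5: HMAC-SHA256 with a per-function label, truncated to n bytes.
    return hmac.new(k, label + data, hashlib.sha256).digest()[:n]


def f1(k: bytes, sqn: bytes, rand: bytes, amf: bytes) -> bytes:
    return _f(b"f1", k, sqn + rand + amf, 8)  # MAC contained in AUTN


def f2(k: bytes, rand: bytes) -> bytes:
    return _f(b"f2", k, rand, 8)  # RES / XRES


def f3(k: bytes, rand: bytes) -> bytes:
    return _f(b"f3", k, rand, 16)  # CK


def f4(k: bytes, rand: bytes) -> bytes:
    return _f(b"f4", k, rand, 16)  # IK


def f5(k: bytes, rand: bytes) -> bytes:
    return _f(b"f5", k, rand, 6)  # AK, conceals the sequence number in AUTN


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class AuthVector:
    rand: bytes
    xres: bytes
    ck: bytes
    ik: bytes
    autn: bytes


class HLR:
    def __init__(self, k: bytes):
        self.k, self.sqn = k, 0

    def generate(self, n: int) -> List[AuthVector]:
        vectors = []
        for _ in range(n):
            self.sqn += 1
            sqn = self.sqn.to_bytes(6, "big")
            rand, amf = secrets.token_bytes(16), b"\x00\x00"  # AMF value is arbitrary here
            autn = xor(sqn, f5(self.k, rand)) + amf + f1(self.k, sqn, rand, amf)
            vectors.append(AuthVector(rand, f2(self.k, rand), f3(self.k, rand),
                                      f4(self.k, rand), autn))
        return vectors


class USIM:
    def __init__(self, k: bytes):
        self._k, self.highest_sqn = k, 0  # K is only used internally, never returned

    def authenticate(self, rand: bytes, autn: bytes) -> Tuple[bytes, bytes, bytes]:
        """AUTHENTICATE: verify AUTN, then return (RES, CK, IK)."""
        sqn = xor(autn[:6], f5(self._k, rand))
        amf, mac = autn[6:8], autn[8:16]
        if not hmac.compare_digest(mac, f1(self._k, sqn, rand, amf)):
            raise ValueError("MAC failure: challenge not from a network that knows K")
        if int.from_bytes(sqn, "big") <= self.highest_sqn:
            raise ValueError("SQN failure: old or replayed authentication vector")
        self.highest_sqn = int.from_bytes(sqn, "big")
        return f2(self._k, rand), f3(self._k, rand), f4(self._k, rand)


def msc_check(av: AuthVector, res: bytes) -> bool:
    # The serving MSC/SGSN authenticates the subscriber by comparing RES with XRES.
    return hmac.compare_digest(res, av.xres)
```

Replaying a vector that the USIM has already accepted triggers the sequence number error, and a challenge computed with a different K fails the MAC check, mirroring the two purposes of AUTN mentioned above. Because the simplified check only accepts increasing sequence numbers, the VLR must consume vectors in order.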

² The general goal is to make extracting the secret key at least very difficult. Yet, several successful attack strategies have been reported, such as probing data bus wires [51], exploiting software vulnerabilities of the UICC [39], and differential power analysis [57].


[Diagram (a) UICC form factors: ID-1 UICC, plug-in UICC, mini-UICC, and 4FF, each with contacts C1 to C8.]

(b) UICC pinout:
C1: VCC
C2: RST
C3: CLK
C4: not used
C5: GND
C6: not used
C7: I/O
C8: not used

Figure 2.6: UICC form factors and pinout

[Diagram: UICC and USIM file tree]
MF '3F00': master file
  EFDIR '2F00': list of ADF names
  EFICCID '2FE2': UICC serial number
  EFPL '2F05': language settings
  EFARR '2F06': file access rules
  DFTELECOM '7F10', DFGSM '7F20': contain files for GSM network access by UEs that only support SIM cards
  ADFUSIM: USIM application dedicated file (referenced by name)
    EFIMSI '6F07': IMSI
    EFMSISDN '6F40': MSISDN
    EFLOCI '6F7E': TMSI and LAI
    EFPSLOCI '6F73': P-TMSI and RAI
    EFKeys '6F08': ciphering and integrity keys for the CS domain
    EFKeysPS '6F09': ciphering and integrity keys for the PS domain

Figure 2.7: UICC and USIM file structure. Only select files are shown, including their names and hexadecimal file identifiers. Graphic based on [2, Figures 4.1 and 4.2].


[Diagram: message sequence between UICC/USIM, UE, MSC/VLR, and HLR]
Distribution of authentication vectors from HLR to VLR:
1. MSC/VLR → HLR: authentication data request.
2. HLR generates authentication vectors AV(1...n).
3. HLR → MSC/VLR: authentication data response AV(1...n); the VLR stores the authentication vectors.
Authentication and key agreement:
4. MSC/VLR selects authentication vector AV(i) = (RAND(i), XRES(i), CK(i), IK(i), AUTN(i)).
5. MSC/VLR → UE: user authentication request RAND(i), AUTN(i).
6. UE → UICC/USIM: AUTHENTICATE command RAND(i), AUTN(i).
7. USIM verifies AUTN(i) and computes RES(i) ← f2(K, RAND(i)), CK(i) ← f3(K, RAND(i)), IK(i) ← f4(K, RAND(i)).
8. UICC/USIM → UE: AUTHENTICATE response RES(i), CK(i), IK(i).
9. UE → MSC/VLR: user authentication response RES(i).
10. MSC/VLR compares RES(i) and XRES(i); UE and MSC/VLR select CK(i) and IK(i).

Figure 2.8: Authentication and key agreement (AKA) in the CS domain with preceding retrieval of fresh authentication vectors from the HLR. Graphic based on [3, Figure 5].

The UE passes RES on to the MSC for comparison. If RES and XRES match, the subscriber is successfully authenticated and both sides prepare to switch to CK and IK. To actually start ciphering and integrity protection for the CS domain or to update the keys in use, the MSC has to perform another exchange with the UE, as shown in Figure 2.9. Afterwards, the path between the UE and its serving RNC is protected (assuming that neither side insists on weak or no security).

RAND, AUTN, and RES may be sent as cleartext over the air interface. But since user data is only ever sent after ciphering and integrity protection have begun, the user data is considered safe, even against an interposed attacker, provided that the UE does not fall back to 2G networks, as pointed out by [35].

The UE shall send the user authentication response less than 1 s after receiving the user authentication request. Processing of the AUTHENTICATE command by the UICC shall take less than 500 ms.

Card Application Toolkit

Card application toolkit (CAT) is an optional but typically implemented set of mechanisms in the UICC and the UE that allow applications on the card not only to respond to incoming instructions, but also to issue certain commands themselves. CAT is defined in [19]. UICCs that support it are referred to as proactive. A proactive UICC that is plugged into a compatible UE may trigger a wide range of activities, such as making phone calls, sending short messages, drawing responsive menus on the UE's screen, and communicating with other UICCs connected to the same UE. The terminal side may support only some of the associated commands; the UE sends a list of available instructions, the so-called terminal profile, to the UICC during the USIM initialization procedure.

[Diagram: message sequence between UE, RNC, and MSC/VLR]
1. MSC/VLR → RNC: security mode command (CK, IK); the RNC starts integrity protection.
2. RNC → UE: security mode command; the UE verifies the received message, starts integrity protection, and starts ciphering and deciphering.
3. UE → RNC: security mode complete; the RNC verifies the received message and starts ciphering and deciphering.
4. RNC → MSC/VLR: security mode complete.

Figure 2.9: Start of integrity protection and ciphering in the CS domain following authentication and key agreement. Graphic based on [3, Figure 14].

Figure 2.10 shows the position of the CAT layer within the UICC communication layer stack.

Proactive UICCs are constrained by the half-duplex, turn-taking transmission protocols used by smart cards. They can only issue a command after having finished processing a terminal command without errors. Instead of returning the status value that indicates normal completion of the terminal's command, a UICC with a queued proactive command sends a special status value, which signals both the successful termination of the terminal's last instruction and the length of the data describing the pending proactive command. The terminal then fetches that data, executes the proactive command, and finally reports the outcome to the UICC using special terminal commands.
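The fetch cycle described above can be sketched as a terminal-side loop. The status word '91 XX' and the instruction codes '12' (FETCH) and '14' (TERMINAL RESPONSE) follow ETSI TS 102 221 to the best of my knowledge and should be verified against the specification; `send`, `FakeUICC`, and the `execute` callback are hypothetical, and error status words are not handled.

```python
from typing import Callable, List, Tuple

Response = Tuple[bytes, int, int]  # (response data, SW1, SW2)

SW1_PROACTIVE_PENDING = 0x91  # '91 XX': success, proactive command of XX bytes waiting


def fetch_apdu(length: int) -> bytes:
    # FETCH: CLA '80', INS '12', P1 P2 '00 00', Le = announced length
    return bytes([0x80, 0x12, 0x00, 0x00, length])


def terminal_response_apdu(result: bytes) -> bytes:
    # TERMINAL RESPONSE: CLA '80', INS '14', P1 P2 '00 00', Lc, result data
    return bytes([0x80, 0x14, 0x00, 0x00, len(result)]) + result


def send(transmit: Callable[[bytes], Response], apdu: bytes,
         execute: Callable[[bytes], bytes]) -> bytes:
    """Send a terminal command, then serve every proactive command the card queues."""
    data, sw1, sw2 = transmit(apdu)
    while sw1 == SW1_PROACTIVE_PENDING:
        command, _, _ = transmit(fetch_apdu(sw2))
        result = execute(command)  # e.g. send a short message, draw a menu
        _, sw1, sw2 = transmit(terminal_response_apdu(result))
    return data  # response data of the original terminal command


class FakeUICC:
    """Hypothetical card that announces each queued proactive command via '91 XX'."""

    def __init__(self, pending: List[bytes]):
        self.pending: List[bytes] = list(pending)
        self.log: List[bytes] = []

    def transmit(self, apdu: bytes) -> Response:
        self.log.append(apdu)
        if apdu[1] == 0x12:  # FETCH: hand out the next queued command
            return self.pending.pop(0), 0x90, 0x00
        if self.pending:
            return b"", SW1_PROACTIVE_PENDING, len(self.pending[0])
        return b"", 0x90, 0x00
```

With two queued commands in `FakeUICC`, a single terminal command triggers two FETCH and TERMINAL RESPONSE rounds before the card finally answers '90 00', which shows how the card can only "speak" when it is its turn.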

[Diagram: terminal and UICC each implement the standard smart card layers (application layer: APDU; transport layer: TPDU for T=0, block for T=1; data link layer: character; physical layer: moment) plus a CAT layer.]

Figure 2.10: Position of the CAT layer in the UICC communication layer stack. The standard smart card communication layers shown previously in Figure 2.3 are colored gray.


OTA Programming of UICCs – an Excursion

From time to time, an MNO might want to reconfigure issued UICCs. To avoid costly recalls and replacements, a UICC may support over-the-air (OTA) updates, i.e. changes of the data stored on the card via commands sent by the MNO over the mobile network. For example, the network operator might use OTA programming to distribute a new UICC application.

A fairly recent application of OTA updates is reprogramming UICCs to switch to a different provider without having to replace the UICC in the phone. Apple was the first company to introduce cards with this feature. According to [8], their proprietary Apple SIM solution is likely to be replaced by the embedded SIM (eSIM) [49] industry standard, which pursues the same goal.


Chapter 3

Related Work

This chapter reviews existing scientific literature relevant to this thesis. Section 3.1 introduces fundamental concepts and presents various approaches for protecting the location privacy of individuals. Section 3.2 presents findings from research aimed at characterizing human mobility.

3.1 Location Privacy

The existing literature offers two noteworthy definitions of location privacy, which are both more practical than the rather subjective definition given in Chapter 1. According to Beresford and Stajano, location privacy is

“. . . the ability to prevent other parties from learning one’s current or past location.” [12]

These authors consider location privacy a special type of information privacy, which, according to Banisar and Davies [6], is one of the four facets of privacy, the other three being bodily privacy, privacy of communications, and territorial privacy.

The second definition of location privacy comes from Duckham and Kulik, who narrow down the definition of privacy by Westin – a pioneering author on the subject – to apply exclusively to the location privacy domain. Thereby, Duckham and Kulik effectively define location privacy as

“. . . the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent [location] information about them is communicated to others.” [17], adapted from [56].

Compared to the first explanation of the term, this one accounts for more fine-grained control of individuals over the nature of shared location information and over the circumstances under which that sharing takes place.

Duckham and Kulik use their definition in [17] to put forward a comprehensive classification of approaches for location privacy management. They distinguish four classes of protection strategies:

regulatory approaches, privacy policies, anonymity, and obfuscation. These classes are not mutually exclusive, i.e. a system may feature strategies from any combination of classes. Regulatory strategies are legal rules for collecting and handling location information. Privacy policies refer to automatic ways for users to prohibit certain uses of their personal data. For example, when using a location-based service (LBS) to find nearby abortion clinics, a user might specify that information about the query shall not be shared with third parties. Anonymity strategies attempt to hide individuals' identities, either by using pseudonyms in place of true identities or by withholding identity information altogether. Lastly, obfuscation strategies are aimed at implementing the need-to-know principle with regard to users' location information, i.e. degrading the quality of released spatial and temporal information to what is absolutely necessary for a requested service.


Assessing the effectiveness of anonymity strategies requires an objective definition of the term anonymity. Pfitzmann and Hansen define it as follows:

Anonymity of a subject means that the subject is not identifiable within a set of subjects, the anonymity set. [41]

The minimum anonymity set size k is a common metric for evaluating anonymity strategies. The term k-anonymity is often used interchangeably. Gruteser and Grunwald define this concept with regard to location privacy as follows:

“. . . a subject [is] k-anonymous with respect to location information, if and only if the location information presented is indistinguishable from the location information of at least k−1 other subjects.” [22]

User-specific location information is usually modeled as a set of discrete events. The generic properties of each event can be represented as an event tuple (u, x, y, t), which specifies the 2D coordinates x and y of a user u at time t.
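To make the event tuple and the k-anonymity definition concrete, the sketch below checks whether a released, cloaked tuple is k-anonymous. It assumes, as a simplification, that the adversary knows the exact events of all users, so every user with an event inside the released area and time interval belongs to the anonymity set; `CloakedTuple`, `anonymity_set`, and `is_k_anonymous` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Event:
    u: str    # user
    x: float  # 2D coordinates
    y: float
    t: float  # time


@dataclass(frozen=True)
class CloakedTuple:
    """Released location information: an area and a time interval instead of exact values."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_min: float
    t_max: float

    def matches(self, e: Event) -> bool:
        return (self.x_min <= e.x <= self.x_max and self.y_min <= e.y <= self.y_max
                and self.t_min <= e.t <= self.t_max)


def anonymity_set(released: CloakedTuple, events: Iterable[Event]) -> Set[str]:
    # Every user with an event consistent with the released tuple is a candidate.
    return {e.u for e in events if released.matches(e)}


def is_k_anonymous(released: CloakedTuple, events: Iterable[Event], k: int) -> bool:
    # The subject plus at least k-1 others must be indistinguishable.
    return len(anonymity_set(released, events)) >= k
```

For example, with events of users a at (1, 1, t=0), b at (2, 2, t=1), and c at (9, 9, t=0), the released area [0, 5] × [0, 5] with interval [0, 2] yields the anonymity set {a, b}, so it is 2-anonymous but not 3-anonymous.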

In [22], Gruteser and Grunwald propose a system for protecting anonymous users of an LBS against identification and location tracking by guaranteeing k-anonymity for an arbitrary k. Their system relies on the obfuscation strategies of spatial and temporal cloaking. Cloaking algorithms reduce the accuracy of a numerical parameter by replacing its actual value with an enclosing interval. Thus, spatial cloaking replaces 2D coordinates with a surrounding area, and temporal cloaking degrades moments to time periods. The obfuscation is performed by a middleware system, which, after cloaking, forwards location-based requests from each mobile node to the desired LBS server.

The default obfuscation algorithm uses the known locations of the requester and other nodes in a technique based on quadtrees, see [20], to compute an area that contains at least k nodes at the time of the request. If higher spatial accuracy is required, i.e. there exists an upper bound for the area's size, and delays are acceptable, temporal cloaking can be added to the process. This means that a request at time t will be delayed until time t₂, when additional requests by k−1 other nodes from a suitable area have occurred. The forwarded request contains the surrounding area and the time interval [t₁, t₂], with t₁ < t. As pointed out by the authors, requests with overlapping areas may leak information about the locations of some active nodes, so the effective sizes of their respective anonymity sets might be lower than the chosen parameter k. The authors also acknowledge that an attacker may use fake requests to counteract the cloaking algorithm, and they suggest that the middleware authenticate nodes with difficult-to-obtain authentication keys as a solution. However, this would not only force nodes to disclose more information about themselves; I also believe that it could not reliably stop a dedicated attacker with sufficient resources.
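A minimal sketch of quadtree-based spatial cloaking in the spirit of the algorithm described above: the area is repeatedly split into four quadrants, and subdivision stops as soon as the requester's next quadrant would contain fewer than k nodes. Temporal cloaking, the upper bound on the area's size, and the middleware are omitted; the function name and the `max_depth` guard (which ends the loop when nodes share identical coordinates) are assumptions of this sketch, not details taken from [22].

```python
from typing import List, Tuple

Point = Tuple[float, float]
Area = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def quadtree_cloak(requester: Point, nodes: List[Point], area: Area,
                   k: int, max_depth: int = 32) -> Tuple[Area, int]:
    """Return the smallest quadtree cell around `requester` that holds >= k nodes.

    `nodes` lists the known positions inside `area`, including the requester.
    """
    x0, y0, x1, y1 = area
    inside = list(nodes)
    for _ in range(max_depth):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        west, south = requester[0] < mx, requester[1] < my
        # Nodes that share the requester's quadrant at the next level.
        quadrant = [(x, y) for x, y in inside if (x < mx) == west and (y < my) == south]
        if len(quadrant) < k:
            break  # subdividing further would drop below k nodes
        inside = quadrant
        x0, x1 = (x0, mx) if west else (mx, x1)
        y0, y1 = (y0, my) if south else (my, y1)
    return (x0, y0, x1, y1), len(inside)
```

For example, with nodes at (10, 10) (the requester), (20, 20), (30, 30), (60, 60), and (80, 10) in the area (0, 0, 100, 100), k = 3 yields the quadrant (0, 0, 50, 50) with three nodes, while k = 2 narrows it to (0, 0, 25, 25). If the whole area holds fewer than k nodes, the function returns the whole area with the smaller count, and the caller has to decide whether to delay the request, which is where temporal cloaking would come in.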

Since guaranteeing a specific minimum level of location privacy is often incompatible with practical requirements, most privacy-enhancing systems only pursue a best-effort approach. The following systems presented in this section all belong to that category.

The system of Gruteser and Grunwald just described is obviously only compatible with location-based services that do not have to distinguish between different users. In [12], Beresford and Stajano present a framework that supports services requiring users to provide identity information. In the investigated scenario, each LBS defines an application zone in which it offers its service. These areas are assumed to be of limited size, for example the inside and surroundings of a brick-and-mortar business that operates the location-based application to reach potential customers. Users have to register for each application at a common middleware system and then keep that system apprised of their current location. At regular intervals, the middleware evaluates users' positions, notifies concerned applications about changed user locations, and forwards buffered messages. A user's location information is only shared with registered applications and only while the user is within the respective application zone. Despite the limited size of these areas, colluding LBS operators could combine their knowledge to determine a user's past and current locations, even if a different, but static, pseudonym is used for each application. Thus, the middleware system assigns a new, unused pseudonym to a user (discarding the old one) whenever (and only when) that user enters a so-called mix zone. The authors define this type of area as follows:

Given a group of users, a mix zone is “a connected spatial region of maximum size in which none of these users has registered any application callback.” [12]

This means that applications cannot directly observe users' pseudonym switches. However, the authors show that the size of the anonymity set, i.e. the number of users present in a mix zone while a particular user passes through it, is not an accurate indicator of that user's gain in location privacy. That is because not all matchings of old to new pseudonyms are equally likely, or even possible.

Therefore, the authors suggest using Shannon's entropy³ [44] instead of the average anonymity set size for measuring a mix zone's effectiveness.
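The entropy from footnote 3 can be computed directly from a matrix of transition probabilities. The sketch below assumes that the p_ij form a joint distribution over all pairs of zones, i.e. that they sum to 1; the function name is hypothetical.

```python
import math
from typing import Sequence


def mix_zone_entropy(p: Sequence[Sequence[float]]) -> float:
    """h(z_0) = -sum_i sum_j p_ij * log2(p_ij), in bits.

    p[i][j] is the probability that a user moved from zone z_i to zone z_j
    between two update periods (indices 0..n, where z_0 is the mix zone).
    """
    if not math.isclose(sum(map(sum, p)), 1.0):
        raise ValueError("the p_ij must sum to 1")
    # Terms with p_ij = 0 contribute nothing, since p * log2(p) -> 0 as p -> 0.
    return sum(-pij * math.log2(pij) for row in p for pij in row if pij > 0)
```

If all four combinations of two entry zones and two exit zones are equally likely (each p_ij = 0.25), the mix zone provides 2 bits of entropy; if only a single transition is possible, the entropy is 0, no matter how many users were present in the zone. This is precisely why entropy is a better measure than the anonymity set size.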

In [30], Li et al. describe two pseudonym switching schemes – Swing and Swap – for a mobile network consisting of mobile nodes that connect to stationary access points for some kind of service. Adversaries are assumed to know the exact locations of transmitting nodes. Pseudonyms and associated authentication keys for the nodes have a limited lifetime and are obtained in bulk from a trusted third party reachable via the network. In Swing, nodes that want to increase their degree of anonymity wait until they are about to change their current direction of movement or speed. Then they inform their nearby peers about their intention with a broadcast message, enter a random silent period, i.e. they make no transmissions for a random length of time, and then switch to the new identity. The broadcast may cause nearby receiving nodes to also perform the update procedure. Since silent periods lead to a transient disruption of service, frequent pseudonym updates are avoided. Swap is similar to Swing, but pseudonyms are exchanged and then used by the new holder instead of being discarded, so the adversary might be unable to notice the updates. Additionally, nodes that are close by, but not participating in the exchange, also enter a random silent period to further increase the size of the anonymity set. Using an entropy-based anonymity metric in simulations with a random way-point mobility model [38], Swing performed better than randomly triggered update procedures, but worse than Swap. The schemes of Li et al. permit nodes to change pseudonyms anywhere, not just in designated mix zones, but the requirement that nodes predict their own movement greatly limits their applicability in situations with human-generated node movement. Even if nodes could successfully anticipate their changes in direction and speed, one should assume that the adversary has similar capabilities and can use them to discover updates and match pseudonyms to users. Since the authors chose a very unrealistic model of node mobility for their evaluations, the obtained results might also be of low significance.

Anonymization of location traces is not always performed in real time, i.e. while the movement data is being generated; there is also research into providing location privacy for completed datasets. Due to legal and ethical requirements, user identities and information that might allow the reidentification of individuals often must be removed before passing a dataset on to third parties. Compared to real-time methods, deferred algorithms should be able to produce better results, because, when processing a particular event, they can take all subsequent events into account, not just those that happened before. In [32], Mano et al. develop such an algorithm for retroactive anonymization of movement data. First, the algorithm computes a simplified graph representation of users' movement paths. The graph contains one vertex for each user's start and end location, which are assumed to be identical, representing a home used exclusively by that user.

³ Given a single mix zone $z_0$ and multiple adjacent application zones $z_1, \ldots, z_n$, let $p_{ij}$ denote the probability that any particular user moved from $z_i$ to $z_j$ from one update period to the next. Then the entropy $h$ of the mix zone is defined as $h(z_0) = -\sum_{i=0}^{n} \sum_{j=0}^{n} p_{ij} \log_2 p_{ij}$.
