Simulation of SI Switching Based on Direct Encounters By Users of Public Transport

A System for Protecting Location Privacy in Mobile Networks

6.3 Simulation of SI Switching Based on Direct Encounters By Users of Public Transport

This section investigates the potential location privacy benefit of SI switching based on direct encounters for users traveling by public transport, more specifically by subway. Results were obtained with custom simulation software that incorporated origin-destination (O/D) ridership data of the Washington D.C. metro system [34].

6.3.1 Preprocessing and Characterization of the Input Dataset

The O/D dataset specifies the number of subway riders in the month of October 2014, aggregated by departure period (quarter-hour time slots), entry and exit station, and type of day (weekday, Saturday, or Sunday). All passengers were required to swipe in at the station where they entered the metro system and to swipe out at the station where they exited. The system had six lines serving a total of 91 stations, as depicted in Figure 6.13. From Monday to Thursday, stations were open between 5 a.m. and midnight. On Fridays, that period lasted until 3 a.m. of the following day. For the simulation, only weekday rides between 5 a.m. and midnight were considered. Travel times between stations belonging to the same line were determined based on timetables and information from the metro trip planning website. Linear distances between stations were computed with Vincenty’s formulae [54] based on geographic coordinates obtained from Google Maps.
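The distance computation can be illustrated with a minimal sketch of Vincenty's inverse formula. The WGS-84 ellipsoid, the function name `vincenty_distance`, and the convergence settings are assumptions made for this illustration; the text does not state which ellipsoid or implementation was used.

```python
import math

# WGS-84 ellipsoid (an assumption; the source does not name the ellipsoid).
A_AXIS = 6378137.0                      # semi-major axis in meters
FLATTENING = 1 / 298.257223563
B_AXIS = A_AXIS * (1 - FLATTENING)      # semi-minor axis in meters


def vincenty_distance(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
    """Geodesic distance in meters between two points given in degrees."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    f = FLATTENING
    # Reduced latitudes of both points.
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)
    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos(2*sigma_m) is undefined for equatorial lines; use 0 there.
        cos_2sm = (cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha
                   if cos2_alpha != 0.0 else 0.0)
        c = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_new = big_l + (1 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        raise ValueError("Vincenty iteration did not converge")
    u_sq = cos2_alpha * (A_AXIS ** 2 - B_AXIS ** 2) / B_AXIS ** 2
    big_a = 1 + u_sq / 16384 * (
        4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2)
        * (-3 + 4 * cos_2sm ** 2)))
    return B_AXIS * big_a * (sigma - delta_sigma)
```

For station pairs within one metropolitan area the iteration converges after a few steps; the non-convergence branch only matters for nearly antipodal points.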

6.3.2 Simulation

Since the ridership dataset only provides aggregated flow information, it lacks microscopic movement data that would be necessary for determining short-range encounters between users. Therefore, SI switching by the travelers was evaluated in a partially abstract way. Instead of simulating encounters between pairs of individual mobile nodes, contacts between nodes and rider flows as well as distances between the respective destinations were studied.

The subway system was modeled as a directed multigraph, with stations as vertices and one edge for each metro line connecting two adjacent stations. Time was discretized into minutes.

Paths in the network were chosen based on shortest travel time. In case of multiple shortest routes between an origin and destination, all alternatives were equally likely. Since no comprehensive timetables were available, trip durations were calculated as follows. Riders were assumed to spend 5 minutes in their station of entry and in each station where they switched to a different line. Two minutes were allotted for leaving the exit station. The time spent in transit was the sum of the travel times associated with each visited edge. All trips that would have ended after the end of the simulation period were discarded.
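The trip duration rule can be written down as a short sketch. The representation of a path as a list of `(line, travel_minutes)` tuples and the name `trip_duration` are assumptions made for this illustration; only the waiting times are taken from the text.

```python
ENTRY_WAIT_MIN = 5      # minutes spent in the station of entry
TRANSFER_WAIT_MIN = 5   # minutes spent in each station where the line changes
EXIT_MIN = 2            # minutes allotted for leaving the exit station


def trip_duration(path):
    """Total trip duration in minutes.

    path: list of (line, travel_minutes) tuples, one per traversed edge
    (a hypothetical representation chosen for this sketch).
    """
    if not path:
        raise ValueError("a trip needs at least one edge")
    # A transfer happens wherever two consecutive edges belong to different lines.
    transfers = sum(1 for prev, cur in zip(path, path[1:]) if prev[0] != cur[0])
    in_transit = sum(minutes for _, minutes in path)
    return ENTRY_WAIT_MIN + TRANSFER_WAIT_MIN * transfers + in_transit + EXIT_MIN


# Two edges on one line, then one edge on another line:
# 5 (entry) + 5 (one transfer) + 9 (transit) + 2 (exit) = 21 minutes.
example = trip_duration([("red", 3), ("red", 2), ("blue", 4)])
```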

The simulation comprised two phases. In the first phase, flows were generated. Given an entry in the preprocessed O/D dataset with x riders departing from station o with destination d during departure period t, x distinct fastest paths from o to d were computed for each minute of t.

For all vertices and edges of the network, the number of riders passing through at any particular minute and their respective destinations were saved as flow information.

In the second phase of the simulation, one trip for each passenger mentioned in the preprocessed O/D dataset was simulated, with a random starting time in the specified departure period.

The distance in meters and in network hops between each rider’s exit station and the destinations of encountered flow units were computed as the main outcome of the simulation.
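The two phases can be sketched as follows. This is an illustration under stated assumptions, not the simulation software used for the results: the data layout (a mapping from `(location, minute)` to destination counts), the function names, and the choice to weight each encountered flow unit equally are all hypothetical, and the sketch does not exclude the rider's own flow unit.

```python
from collections import Counter, defaultdict


def record_flow(flows, timed_path, destination, riders=1):
    """Phase 1: note which destinations pass each location at each minute."""
    for location, minute in timed_path:
        flows[(location, minute)][destination] += riders


def mean_distance_error(flows, timed_path, exit_station, distance):
    """Phase 2: mean distance between a rider's exit station and the
    destinations of all flow units met along the rider's timed path."""
    total = 0.0
    count = 0
    for location, minute in timed_path:
        for dest, n in flows.get((location, minute), {}).items():
            total += n * distance[(exit_station, dest)]
            count += n
    return total / count if count else None


# Hypothetical toy data: two flows that overlap at station B in minute 3.
flows = defaultdict(Counter)
record_flow(flows, [("A", 0), ("B", 3)], destination="C", riders=2)
record_flow(flows, [("B", 3), ("D", 6)], destination="D", riders=1)
distance = {("C", "C"): 0.0, ("C", "D"): 1500.0}  # meters, made-up values
error = mean_distance_error(flows, [("A", 0), ("B", 3)], "C", distance)
```

In the toy data, the rider meets five flow units in total, one of which is bound for a station 1500 m away from the rider's own exit, so the mean distance error is 300 m.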

Each simulated ride was classified as one of the following three types: towards center, through center, or away from center, depending on the change of the closeness centrality values of the vertices along the path. Based on the duration d of the fastest route between any two elements from the set of stations V, the closeness C of a station v was defined as:

$$C(v) = \frac{1}{\sum_{w \in V} d(v, w)}$$

Figure 6.13: Metro rail map showing stations and lines (scale bar: 10 km)

Rides for which the closeness was monotonically increasing from origin o to destination d and which satisfied C(d) > C(o) were classified as going towards the network center. Those with a monotonic decrease and with C(d) < C(o) were classified as going away from the center. All other rides were classified as the through center type.
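The classification rule can be sketched in a few lines. The function names, the nested-dictionary layout of the duration table, and the reading of "monotonically" as non-strict are assumptions made for this illustration.

```python
def closeness(durations, v):
    """Closeness of station v: reciprocal of the summed fastest-route
    durations from v to all other stations."""
    return 1.0 / sum(d for w, d in durations[v].items() if w != v)


def classify_ride(path, c):
    """Classify a ride by how closeness changes along its station path.

    path: list of stations from origin to destination
    c:    mapping station -> closeness value
    Monotonicity is treated as non-strict here (an assumption).
    """
    values = [c[s] for s in path]
    steps = list(zip(values, values[1:]))
    if values[-1] > values[0] and all(a <= b for a, b in steps):
        return "towards center"
    if values[-1] < values[0] and all(a >= b for a, b in steps):
        return "away from center"
    return "through center"


# Hypothetical five-station line A-B-C-D-E with one minute between neighbors.
stations = "ABCDE"
durations = {v: {w: abs(i - j) for j, w in enumerate(stations)}
             for i, v in enumerate(stations)}
c = {v: closeness(durations, v) for v in stations}
```

On this toy line, station C has the highest closeness, so a ride from A to C goes towards the center, a ride from C to E goes away from it, and a ride from A to E passes through it.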

6.3.3 Results

Figure 6.14 shows the number of riders by time of day and ride type. All ride types show a peak in ridership during the morning and evening rush hour periods. During the morning peak, there are more passengers traveling towards the network center than away from it. In the evening, that pattern is reversed.

Figure 6.15 displays the distributions of riders’ mean end distance error for each ride type. (The means were computed individually for each passenger.) The figure shows that riders who travel towards the network center are less likely to have a high mean distance error than those traveling away from the center.

Figure 6.16 shows the distribution of the distance error in hops. The hop distance of v, w ∈ V is the minimum number of graph edges that need to be traversed to get from v to w. The figure shows that the destinations of fellow passengers encountered by someone traveling towards the center tend to be closer to the person’s own exit station than for the other ride types.
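The hop distance defined above corresponds to an unweighted shortest path, which a breadth-first search finds directly. The following is a minimal sketch; the adjacency-list representation and the name `hop_distance` are assumptions made for this illustration.

```python
from collections import deque


def hop_distance(adjacency, v, w):
    """Minimum number of edges between stations v and w, or None if
    w cannot be reached from v.

    adjacency: mapping station -> iterable of neighboring stations
    """
    if v == w:
        return 0
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        station, hops = queue.popleft()
        for neighbor in adjacency.get(station, ()):
            if neighbor == w:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Hypothetical toy network: a line A-B-C-D with a branch B-E.
toy_network = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
    "X": [],  # isolated station
}
```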

Figure 6.17 shows the relationship between the mean metric distance error, the duration of a rider’s trip, and the ride type. Overall, riders with longer trips have larger mean distance errors. However, this does not hold for those traveling towards the center. For them, the range and mean of the duration-specific distribution both decrease as trips get longer. The positive linear correlation is greatest for the group traveling through the network center.

Figure 6.14: Number of riders by time of day and ride type (axis: time of day [h:min])

Figure 6.15: Distribution of rider-specific mean distance errors

Figure 6.16: Cumulative histogram of distance error in hops

Figure 6.17: Relationship between mean distance error, ride duration, and ride type (panels: any, towards center, through center, away from center; axes: mean distance error [m] and ride duration [min]; color scale: count)

6.4 Discussion

The simulation of pairwise SI switching based on cell-level encounters by users of private transport identified sample size as the only relevant independent variable. Neither trip duration, nor distance traveled, nor the switching partner selection strategy had any systematic effect on the studied performance parameters. So, the rate of adoption (the number of users relative to the total population) would have a strong influence on the degree of location privacy offered by the system.

For sample size n = 3 000, approximately 3.4 % of the node population, the end distance error of 9 out of 10 sampled nodes exceeds 550 m. This limit converges towards roughly 950 m for larger n. At least for inner city regions, this amount of distance error should usually translate to incorrect destination cell guesses by the adversary. Since the location traces represent early morning traffic, the studied nodes probably tend to go towards the city center, i.e. the distance errors are smaller than they would have been with traces covering the whole day.

For n = 2 000, about 2.3 % of the node population, half of the sampled nodes were involved in 10 or more encounters and the adversary was able to correctly identify approximately 9.9 % of the nodes at the end of their trip. For larger sample sizes, the identification ratio converges towards zero.

With some additional effort, the adversary might be able to correctly match old to new SIs with a probability above chance level. Some matchings may be ruled out completely because the resulting cell transitions would require unrealistic node velocities. The adversary could also use relative probabilities for more accurate matchings. For example, given the directions and velocities of nodes before an encounter, the matching with the smallest changes to those parameters might be considered the most likely. This would require a model of node mobility from which the probabilities of particular outcomes can be derived. Alternatively, the adversary could compute conditional probabilities of a subscriber’s cell transitions, given one or more previously visited cells, similar to [12]. But since users might have more than one encounter in each visited cell, finding maximum likelihood matchings would be non-trivial. Even if the adversary were better than chance (but not perfect) at correctly matching old to new SIs, due to the high number of encounters per node, the mean identification ratio would probably still be relatively low.
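The first of these adversary strategies, ruling out matchings that would require unrealistic velocities, can be sketched as a simple filter. Everything in this sketch is hypothetical: the planar coordinates, the sighting format, the speed threshold, and the function name are chosen for illustration and are not taken from the simulation.

```python
import math


def feasible_matchings(old_sightings, new_sightings, max_speed_mps):
    """Map each old SI to the new SIs it could plausibly have turned into.

    old_sightings / new_sightings: mapping SI -> (x_m, y_m, t_s), the last
    known position of an old SI and the first position of a new SI
    (planar coordinates in meters, time in seconds; assumed layout).
    """
    result = {}
    for old_si, (x0, y0, t0) in old_sightings.items():
        candidates = []
        for new_si, (x1, y1, t1) in new_sightings.items():
            elapsed = t1 - t0
            if elapsed <= 0:
                continue  # the new SI cannot appear before the old one vanished
            speed = math.hypot(x1 - x0, y1 - y0) / elapsed
            if speed <= max_speed_mps:
                candidates.append(new_si)
        result[old_si] = candidates
    return result


# Made-up example: two nodes 5 km apart, re-sighted one minute later.
old = {"a": (0.0, 0.0, 0.0), "b": (5000.0, 0.0, 0.0)}
new = {"x": (600.0, 0.0, 60.0), "y": (5500.0, 0.0, 60.0)}
plausible = feasible_matchings(old, new, max_speed_mps=40.0)
```

In the made-up example only one matching survives per node, but when switching partners are in the same cell at the time of the encounter, a velocity bound alone would rarely be this selective.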

However, these results might not be fully applicable to the real world. Some cells in the periphery of the simulation area appear unusually large, even when assuming that the outer cells do not extend to the map border. Since cell towers are usually positioned to achieve a uniform number of subscribers per cell and because the population densities do not vary that much between the downtown area and the suburbs⁸, these differences in size suggest that the used cell tower location dataset is incomplete. Therefore, the number of cells is underestimated by the simulation, which has different implications depending on sample size. For small n, the larger cells lead to a tendency to overestimate the number of encounters, because the probability that a newly-entered cell contains a switching partner increases noticeably. For very large n, cells are more likely to be occupied either way, irrespective of cell size. But if the cells are bigger, the number of crossed cell boundaries is reduced. This contributes to a tendency to underestimate the number of encounters if the node density is very high. Concerning this matter, the simulated sample sizes probably do not qualify as very large.

With the cell-based SI switching trigger, encounters can only occur when a node enters a new cell. Thus, the number of cell transitions would be an upper bound for the total number of encounters in the real world. Since the simulation also treats each node’s initial appearance in its first cell as a transition, the maximum number of encounters in the simulation is equal to the sum of the number of cell transitions and the sample size. This also contributes to a higher node identification ratio in the simulation.

⁸ According to [26], population densities of the downtown district and Cologne’s other districts differ by a factor of less than 6.5.

Due to filtering of partial trips at the beginning and the end of the simulation period, the simulated number of encounters per node is reduced compared to the full 24 hour dataset.

Since the simulation treated inactive nodes as absent, encounters with other nodes only happen while a particular node is traveling. The average trip duration is about 10 minutes, one tenth of the simulation period. This concentration of encounters means that the adversary’s probability of correctly re-identifying simulated users at the end of their trip is significantly decreased, but at the expense of the currently inactive nodes, which could be re-identified easily during their periods of inactivity due to their fixed subscriber identities. In reality, nearby, immobile users would be potential switching partners, too. Under the fair switching partner selection strategy, inactive users would be more likely to be chosen as partners, because their moving peers have at least one encounter for each of their personal cell transitions, which means that the time since a particular user’s last encounter would tend to be shorter for traveling individuals.

Not simulating inactive nodes is also the likely reason why there were no apparent differences between the outcomes for the random and fair switching partner selection strategies.

If a greater population is considered, including users who do not travel during the simulation period, the encounters triggered by the sampled users would be distributed across an even larger group of people. Hence, the average number of encounters per user in the simulation period would be lower still, resulting in a lower node identification ratio.

The described SI switching strategy, which, in the real world, triggers at most one pairwise encounter for each cell transition, might not generate a sufficient number of encounters to successfully hide the identities of all users and thereby protect their location privacy. Hence, a different switching strategy would probably be needed, which allows for more than two clients per encounter, uses a different SI switching trigger (possibly not depending on movement), or both.

Although pairwise SI switching based on direct encounters was not simulated as part of the private transport scenario, the number of SI switches one could expect in that case would be much lower than for the cell-level strategy due to the considerably smaller encounter radius. Consequently, performance measured with the other two error metrics (end distance error and node identification ratio) would be significantly worse, too. Switching based on direct encounters would be much better suited for settings featuring shared spaces with high user densities, as in the simulated subway system.

That simulation did not actually study any of the SI switching strategies proposed in Section 4.3. Instead, it ascertained the distribution of the distances between the destinations of particular riders and those of all fellow passengers they could have met during their trip. Although the durations of the simulated trips for a given entry and exit station pair were shorter than the mean values from the O/D ridership dataset due to usually shorter, constant-length waiting times in the simulation, these differences probably were of little consequence.

Not surprisingly, the results suggest that riders who mingle with travelers going in many different directions, e.g. at stations in the network center, can expect larger distance errors than riders who follow the same route as the majority of people they come into contact with. Overall, only about 10.3 % of the fellow travelers met by any particular rider had the same exit station. So, when used by travelers in a subway system that is comparable to the simulated one, an SI switching strategy based on direct encounters would probably lead to high distance errors for most users.

The rush hour peaks in ridership were to be expected. But due to the probably higher train frequencies and more carriages per train during peak periods, a rise in the total number of travelers might not be accompanied by an identical increase in train occupancy rates. Nonetheless, the person density would be increased during rush hour, so the number of direct encounters between users of the SI switching system should be higher as well.

Since the mobile network studied in Section 6.1 tolerates delays of user authentication responses of up to five seconds without any negative consequences, the functioning of the card service system would not be thwarted by normal network delays between the cellphone’s UICC terminal and the remote card. Since this delay tolerance is probably the result of practical considerations that apply to all carriers, other cellular networks are likely to be equally forgiving.

Chapter 7

Conclusion

This chapter concludes the thesis by summarizing the key findings, drawing conclusions about their impact, pointing out the major contributions and identifying opportunities for future work.

From the document Location Privacy in Mobile Networks (pages 72-81)