Thursday, 26 March 2009

(2) Trace Me If You Can: Studying the Effectivity of Various Data Retention Schemes for Single-Hop Proxy Servers. 2009-03-24. Dominik Herrmann, University of Regensburg; Rolf Wendolsky, JonDos GmbH. Joint work; a preliminary study intended to bring concrete numbers into the discussion on data retention.

(3) DATA RETENTION DIRECTIVE. 2006: The EU issued the Data Retention Directive. Member states had to implement it within 18 months; Germany and Austria postponed implementation until 2009.

(4) TRACEABILITY. Data retention (DR) is all about traceability. We concentrate on Web traffic. Traceability means linking an offending HTTP request to the originating user via his IP address. Traceability is easy for direct requests, because ISPs are now required by law to store the IP-to-user mapping for at least 6 months. Law enforcement agencies (LEAs) take the source IP of the observed packet and request the contact information from the ISP.
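The direct-request case can be sketched as a lookup in the ISP's retained assignment records. This is a minimal illustration. It assumes hypothetical records that map an IP address and a validity interval to a subscriber (for example dynamic IP leases). The `Lease` type and the `lookup_subscriber` function are our own names and are not part of any real ISP system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical retained ISP record: `ip` was assigned to `subscriber` during [start, end)."""
    ip: str
    start: float  # Unix time, inclusive
    end: float    # Unix time, exclusive
    subscriber: str


def lookup_subscriber(leases: list[Lease], ip: str, t: float) -> Optional[str]:
    """Answer an LEA request: who held source address `ip` at time `t`?"""
    for lease in leases:
        if lease.ip == ip and lease.start <= t < lease.end:
            return lease.subscriber
    return None  # no retained record covers this address and time


leases = [
    Lease("198.51.100.7", 0.0, 3600.0, "subscriber-A"),
    Lease("198.51.100.7", 3600.0, 7200.0, "subscriber-B"),
]
print(lookup_subscriber(leases, "198.51.100.7", 4000.0))  # subscriber-B
```

When a proxy sits in between, the same lookup only leads to the proxy operator, because the observed source IP is then the proxy's own address.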

(5) Proxy Servers and Anonymisers: make traceability difficult; (some) are subject to data retention obligations. [Figure: requests from Src IP #1, Src IP #2, Src IP #1 and Src IP #3 all leave the proxy carrying the proxy's IP.] Tracing is much more difficult when the user goes through a proxy server. Like a funnel, a proxy replaces the original source IP of every request with its own, and some proxies are required by law to support backtracking users.

(6) How to Do Data Retention on Proxies? The law does not tell us! There are lots of ideas, and we can borrow from research on anonymisation services. Keywords are log file pseudonymisation, intersection attacks, and so on.

(7) Long-Term Research Question: find a data retention scheme for proxy servers that honours the privacy of users and allows for optimum traceability of offenders.

(8) Goal of Preliminary Empirical Study: assess the effectivity of 4 data retention schemes that use data already available to proxy providers today. No new technology is required (i.e., the schemes are cheap and easy to implement), and intersection attacks are neglected (for now). We wanted to find out whether the information already available to providers of proxies or anonymisers is sufficient for effective data retention.

(9) Simplistic Effectivity Metric: effectivity = (# of successfully traceable requests) / (# of all requests), i.e., the ratio of requests that could have been attributed unambiguously to their true source IP. For this preliminary study we concentrated on one simplistic metric to measure the effectivity of a data retention scheme. We are curious to learn about your ideas regarding metrics.
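The metric can be computed directly from the outcome of the simulated queries. This is a minimal sketch. It assumes that each query yields the set of candidate source IPs the retained data cannot tell apart. A request counts as traceable only if that set consists of exactly its true source IP. The function name `effectivity` is ours.

```python
from typing import Iterable


def effectivity(results: Iterable[tuple[str, set[str]]]) -> float:
    """Ratio of requests attributed unambiguously to their true source IP.

    `results` holds one (true_source_ip, candidate_source_ips) pair per request.
    """
    total = traceable = 0
    for true_ip, candidates in results:
        total += 1
        if candidates == {true_ip}:  # exactly one candidate, and it is the right one
            traceable += 1
    if total == 0:
        raise ValueError("no requests to evaluate")
    return traceable / total


outcomes = [
    ("10.0.0.1", {"10.0.0.1"}),              # unambiguous
    ("10.0.0.2", {"10.0.0.1", "10.0.0.2"}),  # ambiguous: two candidates
    ("10.0.0.3", {"10.0.0.3"}),              # unambiguous
    ("10.0.0.4", set()),                     # no matching log entry
]
print(effectivity(outcomes))  # 0.5
```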

(10) Characteristics of Sample: Squid log of a local school; 1,100 unique users; 6 months; 9 million requests; 126 source IPs; 33k destination IPs; 51k destination host names. For the study we pulled the proxy log files of a local school. To make sure that the sample was not too badly biased, we analysed it descriptively...
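Descriptive figures like these can be derived from a Squid access log. This is a minimal sketch. It assumes Squid's default native log format: timestamp, elapsed time, client address, result code/status, bytes, method, URL, ident, hierarchy/peer, content type. The field positions and the function name `describe_squid_log` are assumptions. The sketch does not count users, because several users can share one source IP (1,100 users vs. 126 source IPs here).

```python
from urllib.parse import urlsplit


def describe_squid_log(lines):
    """Count requests, source IPs, destination IPs and host names in a native-format Squid log."""
    requests = 0
    src_ips, dst_ips, dst_hosts = set(), set(), set()
    first = last = None
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank, truncated or otherwise malformed lines
        ts = float(fields[0])   # Unix time with milliseconds
        client = fields[2]      # source IP as seen by the proxy
        url = fields[6]
        hierarchy = fields[8]   # e.g. "DIRECT/93.184.216.34" or "NONE/-"
        requests += 1
        src_ips.add(client)
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
        peer = hierarchy.partition("/")[2]
        if peer and peer != "-":
            dst_ips.add(peer)
        if "://" in url:
            host = urlsplit(url).hostname  # urlsplit lower-cases the host name
        else:
            host = url.rsplit(":", 1)[0].lower()  # CONNECT requests log "host:port"
        if host:
            dst_hosts.add(host)
    return {
        "requests": requests,
        "source_ips": len(src_ips),
        "destination_ips": len(dst_ips),
        "destination_hosts": len(dst_hosts),
        "days_covered": (last - first) / 86400 if requests else 0.0,
    }


sample = [
    "1237999999.123    120 10.0.0.5 TCP_MISS/200 5120 GET http://www.example.org/index.html - DIRECT/93.184.216.34 text/html",
    "1238000000.000    300 10.0.0.5 TCP_MISS/200 4000 CONNECT mail.example.net:443 - DIRECT/203.0.113.9 -",
    "1238086399.123     85 10.0.0.6 TCP_HIT/200 2048 GET http://WWW.example.org/logo.png - NONE/- image/png",
]
print(describe_squid_log(sample))
```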

(11) Popularity of Requested Sites Follows a Power Function (Zipf-like). [Figure: relative frequency (log scale, 1e-05 to 0.1) plotted against rank of site (log scale, 1 to 100,000).] ...and found that the access frequencies of sites ranked by popularity show the expected Zipf-like distribution, i.e., users in our group have, to a certain degree, shared interests.
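A rough way to check for a Zipf-like shape is to rank sites by request count and fit a straight line to the rank-frequency data on log-log axes. The negated slope then estimates the exponent. This is a minimal sketch using an ordinary least-squares fit, which is a crude estimator; maximum-likelihood fits are generally preferred. The function name `zipf_exponent` is ours, and the slides do not say how the distribution was assessed.

```python
import math
from collections import Counter


def zipf_exponent(hosts):
    """Estimate s in freq(rank) ~ rank**(-s) by least squares on log-log rank-frequency data."""
    counts = sorted(Counter(hosts).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct sites")
    total = sum(counts)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c / total) for c in counts]  # log of relative frequency
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope of the fitted line, negated


# Synthetic requests whose counts are exactly proportional to 1/rank (s = 1).
requested = [f"site{r}.example" for r in range(1, 11) for _ in range(2520 // r)]
print(round(zipf_exponent(requested), 6))  # 1.0
```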

(12) Heterogeneous Usage Intensity. [Figure: relative frequency of requests (0 to 0.04) plotted against rank of source IP (0 to 140).] To get a better feeling for our users, we looked at the number of requests from the various source IPs and found that activity varies wildly.

(13) Biased Sample! YMMV (calls for a follow-up study). We are well aware that our sample is biased due to the environment we pulled it from. Therefore, the results are only valid for our user base.

(14) Evaluation Methodology: for each request in the sample, simulate a typical law enforcement agency query and calculate the Simplistic Effectivity Metric. Query: "From which source IP address did the request at <TIMESTAMP> to <URL> using your IP address <IP> originate?"
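The evaluation loop can be sketched as follows. This is a minimal sketch. It assumes that each data retention scheme is modelled as a function that answers the LEA query (timestamp, URL, proxy IP) with a set of candidate source IPs. The `Request` type, `evaluate` and the toy scheme `within_one_second` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    ts: float       # Unix time of the request
    src_ip: str     # true source IP, known only inside the simulation
    proxy_ip: str   # proxy address observed at the destination server
    url: str


# A scheme answers "who sent <url> at <ts> via <proxy_ip>?" with candidate source IPs.
Scheme = Callable[[float, str, str], set[str]]


def evaluate(sample: list[Request], scheme: Scheme) -> float:
    """Simulate one LEA query per request and return the Simplistic Effectivity Metric."""
    if not sample:
        raise ValueError("empty sample")
    traceable = 0
    for r in sample:
        # The query uses only what the LEA observed; src_ip is used solely for scoring.
        candidates = scheme(r.ts, r.url, r.proxy_ip)
        if candidates == {r.src_ip}:
            traceable += 1
    return traceable / len(sample)


sample = [
    Request(0.0, "10.0.0.1", "192.0.2.1", "http://www.example.org/a"),
    Request(0.5, "10.0.0.2", "192.0.2.1", "http://www.example.net/b"),
    Request(10.0, "10.0.0.1", "192.0.2.1", "http://www.example.org/c"),
]


def within_one_second(ts: float, url: str, proxy_ip: str) -> set[str]:
    """Toy scheme: every source IP that used the proxy within 1 s of the queried time."""
    return {r.src_ip for r in sample if r.proxy_ip == proxy_ip and abs(r.ts - ts) <= 1.0}


print(evaluate(sample, within_one_second))  # 0.3333333333333333
```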

(15) Evaluation of 4 Data Retention Schemes: Sessions; Requests; + DST IP; + DST Host.

(16) Session-Based Logging (scheme "Sessions"), available for VPNs, anonymisation services, etc. Logged fields: timestamp of start of user session, timestamp of end of user session, source IP, proxy IP.
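Under session-based logging, the query can only be answered with all users whose session on the given proxy IP was open at the queried time. The more sessions overlap at that moment, the larger the candidate set, and the request stays ambiguous. This is a minimal sketch. It assumes sessions are stored as (start, end, source IP, proxy IP) records, matching the fields listed above. `Session` and `session_candidates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    start: float    # timestamp of start of user session (Unix time)
    end: float      # timestamp of end of user session (Unix time)
    src_ip: str
    proxy_ip: str


def session_candidates(log: list[Session], ts: float, proxy_ip: str) -> set[str]:
    """Source IPs of all sessions via `proxy_ip` that were open at time `ts`.

    The URL of the queried request is not logged, so it cannot narrow the result.
    """
    return {s.src_ip for s in log
            if s.proxy_ip == proxy_ip and s.start <= ts <= s.end}


log = [
    Session(0.0, 300.0, "10.0.0.1", "192.0.2.1"),
    Session(200.0, 500.0, "10.0.0.2", "192.0.2.1"),
    Session(0.0, 600.0, "10.0.0.3", "192.0.2.9"),
]
print(sorted(session_candidates(log, 250.0, "192.0.2.1")))  # ['10.0.0.1', '10.0.0.2']
```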

(17) Session-Based Logging (Sessions). [Figure: ratio of unambiguously identifiable sites (0 to 1) plotted against session duration in seconds (0 to 600).]

(18) Request-Based Logging (scheme "Requests"), available for HTTP proxy servers, etc. Logged fields: timestamp of request, source IP, proxy IP.
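With request-based logging, the LEA's timestamp has to be matched against logged timestamps that are only known up to some accuracy. This is a minimal sketch. It assumes accuracy is modelled as a tolerance of ± `accuracy` seconds around the queried time; the study may have modelled it differently. `LogEntry` and `RequestLog` are hypothetical names. Sorting once and using `bisect` means each query costs a logarithmic search plus a scan of the matching window, which matters for logs with millions of requests.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    ts: float       # timestamp of request (Unix time)
    src_ip: str
    proxy_ip: str


class RequestLog:
    def __init__(self, entries: list[LogEntry]):
        self.entries = sorted(entries, key=lambda e: e.ts)
        self.times = [e.ts for e in self.entries]  # parallel sorted list for bisect

    def candidates(self, ts: float, proxy_ip: str, accuracy: float) -> set[str]:
        """Source IPs of requests via `proxy_ip` logged within ±accuracy seconds of `ts`."""
        lo = bisect.bisect_left(self.times, ts - accuracy)
        hi = bisect.bisect_right(self.times, ts + accuracy)
        return {e.src_ip for e in self.entries[lo:hi] if e.proxy_ip == proxy_ip}


log = RequestLog([
    LogEntry(100.0, "10.0.0.1", "192.0.2.1"),
    LogEntry(130.0, "10.0.0.2", "192.0.2.1"),
    LogEntry(100.5, "10.0.0.3", "192.0.2.2"),
])
print(sorted(log.candidates(100.0, "192.0.2.1", 1.0)))   # ['10.0.0.1']
print(sorted(log.candidates(100.0, "192.0.2.1", 60.0)))  # ['10.0.0.1', '10.0.0.2']
```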

(19) Request-Based Logging (Requests). [Figure: ratio of unambiguously identifiable sites (0 to 1) plotted against available timestamp accuracy in seconds (0 to 200).]

(20) Request-Based Logging + Storing Destination Address. Logged fields: timestamp of request, source IP, proxy IP, destination IP or host name.
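Storing the destination lets the query be restricted to requests for the same site. Under this model, concurrent users only collide if they also requested the same destination within the tolerance window. This is a minimal sketch. It assumes the same illustrative ± `accuracy` tolerance and an index keyed by (proxy IP, destination). `DestinationLog` is a hypothetical name, and `dst` can hold either the destination IP or the host name, depending on the variant.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    ts: float       # timestamp of request (Unix time)
    src_ip: str
    proxy_ip: str
    dst: str        # destination IP or host name, depending on the scheme


class DestinationLog:
    def __init__(self, entries: list[LogEntry]):
        buckets = defaultdict(list)
        for e in entries:
            buckets[(e.proxy_ip, e.dst)].append((e.ts, e.src_ip))
        # For each (proxy, destination) pair keep parallel lists sorted by time.
        self._index = {}
        for key, rows in buckets.items():
            rows.sort()
            self._index[key] = ([t for t, _ in rows], [ip for _, ip in rows])

    def candidates(self, ts: float, proxy_ip: str, dst: str, accuracy: float) -> set[str]:
        """Source IPs that requested `dst` via `proxy_ip` within ±accuracy seconds of `ts`."""
        times, ips = self._index.get((proxy_ip, dst), ([], []))
        lo = bisect.bisect_left(times, ts - accuracy)
        hi = bisect.bisect_right(times, ts + accuracy)
        return set(ips[lo:hi])


log = DestinationLog([
    LogEntry(100.0, "10.0.0.1", "192.0.2.1", "www.example.org"),
    LogEntry(110.0, "10.0.0.2", "192.0.2.1", "www.example.net"),
    LogEntry(140.0, "10.0.0.3", "192.0.2.1", "www.example.org"),
])
print(sorted(log.candidates(100.0, "192.0.2.1", "www.example.org", 60.0)))  # ['10.0.0.1', '10.0.0.3']
```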

(21) Request-Based Logging + Storing Destination IP (+ DST IP). [Figure: ratio of unambiguously identifiable sites (0.9 to 1) plotted against available timestamp accuracy in seconds (0 to 200).]

(22) Request-Based Logging + Storing Destination Hostname (+ DST Host). [Figure: ratio of unambiguously identifiable sites (0.9 to 1) plotted against available timestamp accuracy in seconds (0 to 200).]

(23) Results Overview.
Sessions: 5% traceable (sessions of 300 s); privacy: good.
Requests: 8% traceable (timestamp accuracy 60 s), 39% (accuracy 1 s); privacy: okay.
+ DST IP: 95.8% traceable (accuracy 60 s); privacy: poor.
+ DST Host: 96.3% traceable (accuracy 60 s); privacy: poor.

(24) Open Questions: How does the homogeneity of users influence effectivity? What timestamp accuracy is achievable in the real world? How effective are intersection attacks in the real world? How would privacy benefit if proxies used huge IPv6 ranges? What about advanced schemes, e.g., embedding dedicated data retention tags in HTTP headers or using TCP source ports?

(25) Traceability. The real challenge is to find a data retention scheme that combines traceability and privacy.

(26) Privacy. This challenge cannot be solved by the 4 schemes we evaluated. The search goes on...

(27) Trace Me If You Can: we studied 4 data retention schemes based on already available data, using log files from a small proxy server. The results indicate that the schemes based on session-based and request-based logging offer no satisfactory traceability. Traceability will improve significantly if destination IPs are stored, which comes at the cost of users' privacy. Dominik Herrmann, dominik.herrmann@wiwi.uni-regensburg.de, http://www-sec.uni-regensburg.de/herrmann/

