Not the First Digit!: Using Benford's Law to Detect Fraudulent Scientific Data


ETH Zurich Research Collection (ETH Library). Other Research Data.

Author(s): Diekmann, Andreas
Publication Date: 2007-05-16
Permanent Link: https://doi.org/10.3929/ethz-b-000310246
Rights / License: In Copyright - Non-Commercial Use Permitted

Appendix 1: Relative Frequencies of First and Higher Order Digits for 14 Subjects (Experiment 3)

Falsified Regression Coefficients: First Digit

[Figure: one panel per subject plus one panel for all subjects. x-axis: Digit (1 to 9); y-axis: Frequency (0 to .4). Series: Fabricated, Benford.]

Subject 1: Chi2=18.49, n=100
Subject 2: Chi2=14.14, n=100
Subject 3: Chi2=9.08, n=100
Subject 4: Chi2=7.90, n=100
Subject 5: Chi2=5.60, n=26
Subject 6: Chi2=9.19, n=20
Subject 7: Chi2=3.12, n=24
Subject 8: Chi2=34.03, n=45
Subject 9: Chi2=7.85, n=68
Subject 10: Chi2=6.42, n=60
Subject 11: Chi2=9.13, n=63
Subject 12: Chi2=13.64, n=46
Subject 13: Chi2=19.39, n=50
Subject 14: Chi2=5.49, n=80
All Subjects: Chi2=12.26, n=882
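The Chi2 values in the first-digit panels compare each subject's observed digit frequencies with the Benford distribution. The following is a minimal sketch of such a comparison, assuming a standard Pearson goodness-of-fit statistic with expected counts taken from Benford's law, P(d) = log10(1 + 1/d); the exact procedure used for the figures is not shown in this appendix. The function names and the sample digits are hypothetical and are not data from the experiment. With nine digit categories the statistic would have 8 degrees of freedom under this setup, for which the 5% critical value is about 15.51.

```python
import math
from collections import Counter

def benford_first_digit_probs():
    """Benford probabilities P(d) = log10(1 + 1/d) for first digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def chi_square_first_digit(first_digits):
    """Pearson chi-square of observed first digits against Benford's law.

    Assumption: expected counts are n * P(d); this is an illustrative
    sketch, not necessarily the computation behind the published figures.
    """
    n = len(first_digits)
    observed = Counter(first_digits)
    chi2 = 0.0
    for d, p in benford_first_digit_probs().items():
        expected = n * p
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2

# Hypothetical first digits of 100 coefficients (made up for illustration).
sample = ([1] * 30 + [2] * 18 + [3] * 12 + [4] * 10 + [5] * 8
          + [6] * 7 + [7] * 6 + [8] * 5 + [9] * 4)
print(round(chi_square_first_digit(sample), 2))
```

A sample whose digit counts track the Benford proportions closely yields a small statistic, while a sample concentrated on a few digits yields a large one.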

Falsified Regression Coefficients: Second Digit

[Figure: one panel per subject plus one panel for all subjects. x-axis: Digit (0 to 9); y-axis: Frequency (0 to .4). Series: Fabricated, Benford.]

Subjects 1 to 5 (source order, not matched to panels): Chi2=30.11, n=100; Chi2=23.88, n=100; Chi2=12.58, n=100; Chi2=30.15, n=99; Chi2=14.75, n=26
Subject 6: Chi2=9.44, n=20
Subject 7: Chi2=17.16, n=24
Subject 8: Chi2=17.09, n=45
Subject 9: Chi2=42.69, n=68
Subject 10: Chi2=17.26, n=60
Subject 11: Chi2=40.88, n=63
Subject 12: Chi2=22.91, n=46
Subject 13: Chi2=23.79, n=49
Subject 14: Chi2=13.40, n=80
All Subjects: Chi2=31.83, n=880

Falsified Regression Coefficients: Third Digit

[Figure: one panel per subject plus one panel for all subjects. x-axis: Digit (0 to 9); y-axis: Frequency (0 to .5). Series: Fabricated, Benford.]

Subject 1: Chi2=32.35, n=99
Subject 2: Chi2=25.49, n=100
Subject 3: Chi2=19.59, n=100
Subject 4: Chi2=33.70, n=93
Subject 5: Chi2=11.72, n=26
Subject 6: Chi2=24.61, n=20
Subject 7: Chi2=57.31, n=24
Subject 8: Chi2=17.69, n=45
Subject 9: Chi2=40.91, n=68
Subject 10: Chi2=44.47, n=60
Subject 11: Chi2=113.22, n=62
Subject 12: Chi2=19.64, n=46
Subject 13: Chi2=8.33, n=47
Subject 14: Chi2=22.48, n=79
All Subjects: Chi2=59.90, n=869

Falsified Regression Coefficients: Fourth Digit

[Figure: one panel per subject plus one panel for all subjects. x-axis: Digit (0 to 9); y-axis: Frequency (0 to .5). Series: Fabricated, Benford.]

Subject 1: Chi2=28.28, n=98
Subject 2: Chi2=19.29, n=100
Subject 3: Chi2=33.04, n=100
Subject 4: Chi2=35.83, n=85
Subject 5: Chi2=12.44, n=26
Subject 6: Chi2=18.05, n=20
Subject 7: Chi2=22.60, n=23
Subject 8: Chi2=38.68, n=45
Subject 9: Chi2=25.36, n=67
Subject 10: Chi2=103.07, n=56
Subject 11: Chi2=162.69, n=52
Subject 12: Chi2=40.46, n=44
Subject 13: Chi2=29.32, n=42
Subject 14: Chi2=27.34, n=75
All Subjects: Chi2=112.74, n=833
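The second-, third- and fourth-digit panels use digits 0 to 9 and a Benford reference that flattens toward the uniform distribution at higher positions. As a hedged sketch of where such reference frequencies can come from, the code below evaluates the generalized Benford law, in which the probability of digit d at position k is the sum of log10(1 + 1/(10j + d)) over all possible leading blocks j of k - 1 digits. The function name `benford_digit_probs` is hypothetical, and the appendix does not state that the figures were produced this way.

```python
import math

def benford_digit_probs(position):
    """Benford probabilities for the digit at a 1-based significant position.

    position=1 covers digits 1..9; higher positions cover digits 0..9.
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    # Sum over every possible block of (position - 1) leading digits.
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * j + d)) for j in range(lo, hi))
        for d in range(10)
    }

for pos in (1, 2, 3, 4):
    probs = benford_digit_probs(pos)
    print(pos, [round(p, 4) for p in probs.values()])
```

Under this law the fourth-digit probabilities differ from 0.1 by less than a tenth of a percentage point, so a test on the fourth digit is close to a test against a uniform distribution.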

Appendix 2: Questionnaire for the Fabrication Experiments 1 and 2*

Name:
Major:
Semester:

Your task is to construct a table of (unstandardized) regression coefficients (for a multiple linear regression) that support the following hypothesis: “The higher the unemployment benefits, the longer unemployment will last.” The values should be plausible and they should seem to you to have been produced by actual data analysis.

A few more things to consider:

1. Keep in mind that a coefficient can be meaningfully interpreted only for a certain scale. If, for example, unemployment benefits are measured in Swiss francs, then you will have to select different coefficients depending on whether one unit of the unemployment benefits variable is equal to 100 francs or 1,000 francs. You should take the units of all the other variables into account in a similar way. First select a scale (by placing an x next to the option you choose) and then fill in the table with coefficients that you think would produce realistic results.

2. Be sure to put down a standard error as well as a coefficient. As you know, a coefficient with a probability of error of alpha = .05 is significant if the value of the coefficient is more than twice as large as the value of the standard error. Please denote significant coefficients with an asterisk.

3. As you also know, the regression coefficient for a dichotomous (0/1 coded) variable denotes the amount by which the dependent variable changes when the independent variable is equal to 1 versus when it is equal to 0. For example, the coefficient for a variable that takes on the values of 1 for a city and 2 for a town or a rural area might be 3.642. If the length of the unemployment spell is measured in weeks, then the unemployment spell is 3.642 weeks shorter in a city than in a town or a rural area.

4. Be sure to note the coefficients and standard errors to four digits, not including the zeroes before the first digit. For example, the numbers 0.001438 or 91.24 would both fulfill this condition.

*A slightly modified version of this questionnaire was used in experiment 3.
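Items 2 and 4 of the questionnaire describe two mechanical rules: the rule of thumb that a coefficient is significant when it exceeds twice its standard error, and the convention of counting four digits from the first nonzero digit. The sketch below illustrates both. The helper names `significant_digits` and `is_significant` are hypothetical, and the use of the absolute value for negative coefficients is an assumption that the questionnaire does not spell out.

```python
from decimal import Decimal

def significant_digits(value, count=4):
    """Return the first `count` significant digits of a decimal string.

    Leading zeroes and the sign are ignored, so "0.001438" and "91.24"
    each yield four digits.
    """
    number = Decimal(value)
    if number == 0:
        return []  # zero has no significant digits to analyze
    return list(number.as_tuple().digits[:count])

def is_significant(coefficient, standard_error):
    """Rule of thumb from item 2: coefficient more than twice its standard error.

    Assumption: the absolute value is used so negative coefficients qualify.
    """
    return abs(coefficient) > 2 * standard_error

print(significant_digits("0.001438"))  # [1, 4, 3, 8]
print(significant_digits("91.24"))     # [9, 1, 2, 4]
print(is_significant(3.642, 1.5))      # True
```

Passing the values as strings keeps the digits exactly as written, which avoids binary floating-point representation changing the trailing digits.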

So, let’s get started:

First, select a scale for the length of the unemployment spell:
Days: ...........   Weeks: ...........   Months: ............

Table: Determinants of the length of unemployment: Estimates from a multiple regression (standard errors in parentheses)

Independent Variables                                             Regression Coefficients (Standard Errors)

Unemployment benefits, in units of:
  CHF 1 ......  CHF 100 ......  CHF 1000 ......                   ...... (......)
Years of education                                                ...... (......)
Years of job experience                                           ...... (......)
Mother’s years of education                                       ...... (......)
Father’s years of education                                       ...... (......)
Sex (Female = 1)                                                  ...... (......)
Marital status (married = 1, otherwise 0)                         ...... (......)
Last position was in the service sector
  (service sector = 1, otherwise 0)                               ...... (......)
Monthly income for the last job held, in units of:
  CHF 1 ......  CHF 100 ......  CHF 1000 .......                  ...... (......)
Distance between residence and place of business, in units of:
  1 km ......  10 km .......                                      ...... (......)

Adjusted multiple R-squared                                       .......
Number of cases (N)                                               .......
