Wolf-Tilo Balke
Jan-Christoph Kalo
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Relational Database Systems 2
12. Security
12.1 Security in Databases 12.2 Access Control
12.3 SQL Injections 12.4 Anonymization
12 Security
• Database security comprises a set of measures, policies, and mechanisms
– To provide secrecy, integrity, and availability of data – To combat threats to the system, both malicious and
accidental
• Secrecy (or confidentiality)
– Protection of data from unauthorized disclosure
• Integrity
– Only authorized users should be allowed to modify data
• Availability
– Making data available to authorized users and application programs
12.1 Security in Databases
– “[…] the many public and painful disclosures,
especially security breaches that have dramatically
affected brand image and the financial health of many public companies. IT risk, specifically data security, has truly become a board-level discussion.”
• AMR Research: “Governance, Risk and
Compliance Spending Report 2008-2009”, 2008
– “21% of enterprises are worried about a decline in stock price [resulting from a security breach]”
• Forrester Research: “Aligning Data Protection Priorities With Risks”, April 2006
12.1 Security in Databases
• Database design must consider
– The possible attacks and vulnerabilities – The risks to which the data is exposed
• The protection, which security gives, is usually directed against two classes of users
– Stop users without database access from having any form of access
– Stop users with database access from performing actions on the database which are not required to perform their duties
12.1 Security in Databases
– “The most prevalent attack style, responsible for 39%
of data thefts, was authorized users exploiting their privileges.“
• Forrester Research: “Aligning Data Protection Priorities With Risks”, April 2006
– “According to the 2007 Annual Study: Cost of a Data Breach: Data breach incidents cost companies $197
per compromised customer record in 2007, compared to $182 in 2006.“
• Ponemon Press: “Ponemon Study Shows Data Breach Costs Continue to Rise”, November 2007
12.1 Security in Databases
• "You have zero privacy anyway.
Get over it!”
– Scott McNealy (Jan, 1999)
Chairman and Co-Founder Sun Microsystems, Inc
12.1 What to do?
• Social Security Number Data Theft at University of Texas, Austin
– Chronology
• Mar 02, 2003: Initial observation of
high-volume database access from off-campus
• Mar 03, 2003: Law enforcement contacted
• Mar 04, 2003: Evidence points to UT student
• Mar 05, 2003: Two residences searched: Austin, Houston
• Mar 05, 2003: Austin American-Statesman breaks story
• Mar 14, 2003: UT undergraduate student charged
• Sep 06, 2005: The student was sentenced to five years
probation and ordered to pay $170,056 restitution for accessing protected computers without authorization, and possession of stolen social security numbers (misuse of the numbers could not be proven)
12.1 What to do?
• April 2011: Sony’s PlayStation Network servers were attacked
• 77 million customers’ personal information were stolen
– Names, email-addresses, logins, passwords, birthdates,…
• Passwords in plaintext or weakly hashed
• 12.3 million credit card details were stolen
– Credit card number, name
– Fortunately, card security codes were not stolen
• Credit card numbers not encrypted
Sony - Make Believe
• Motive unclear
– Maybe because of Sony's legal pursuit of hacker George Hotz who had modified the firmware of a PlayStation 3 so that it could run the Linux OS
• Procedure
– Hackers anonymously rent a server, using stolen credit card information
– Exploited known security gaps of an old Apache version used by the login portal
Sony - Make Believe
• Consequences
– Playstation network was down for several weeks – Lost of business confidence
– Lawsuits against Sony
– Stock prices declined drastically
– Supposed financial damage: 24 Billion $
Sony - Make Believe
• Lessons learned
– Update software
• Don’t run outdated servers with known vulnerabilities
– Minimize damage for the case that data is stolen
• Hash passwords using strong algorithms
– Weak hashes easily reversible, even for SHA1 – Use strong hashes and salts
• Use encryption for sensible data
– Store encryption keys separately
• Statistical database security
Sony - Make Believe
• Restrict access to physical location of data
– Administrative and external control measures to prevent access to physical resources
• Rooms, storage facilities, terminals,…
– Does not prevent misuse by authorized personnel
• Access restrictions are very difficult to uphold in case of Web-accessible databases
12.1 Basic Measures
• Data encryption
– Often, it is hard to prevent people from copying the database and then hacking into the copy at another location
– It is easier to simply make copying the data a useless activity by encrypting the data
• Authentication
– Verify the user’s identity before allowing access by something the user is acquainted with or physical characteristics of the user
• Passwords, codes, fingerprints, signature,…
12.1 OS/DBMS-level Protection
• Audit trails
– If someone does penetrate the DBMS, it is useful to find out how they did it and what was accessed or
altered
– Audit trails can be set up selectively to minimize disk usage, identify system weaknesses, and finger malicious users
• Logging phase: all request and respective results are logged for each user
• Reporting phase: collected information in the log is checked to detect possible violations or attacks
• Trails can even detect violation attempts executed through sequences of queries
12.1 Auditing Mechanisms
• Access control (authorization) ensures that all direct accesses to database objects occur exclusively according to the modes and rules given by security policies
12.2 Access Control
access request
DBMS
authorization system
data
other DBMS components control
procedures access rules
security policies access
permitted/
denied
• Access control policies specify if and how users can access each database object
– In closed systems, only explicitly authorized accesses are allowed
– In open systems, all accesses that are not explicitly forbidden are allowed
– In multi-level protection, system access is defined using several classification levels to allow/limit access
• Data can e.g., be unclassified, confidential, secret, top secret, etc. and users are assigned a certain security clearance
• The policies also specify if and how access rights can be transferred
12.2 Access Control Policies
• Besides authentication, access control may also include access limitation
– Minimum privilege policy
• All users should access only the minimum quantity of information needed for their activity
• Sometimes this is hard to predict and overly restrictive
• ‘Need to know’ policy
– Maximum privilege policy
• All data of a certain type can be accessed, thus the sharing is maximized
• ‘Maximum availability’ policy
12.2 Access Control Policies
• The granularity in specifying access control in the database can be
– The entire database – A set of relations
– An individual relation
– A set of records in a relation – An individual record
– A set of attributes of all records
– An attribute of an individual record
12.2 Access Control Policies
• Restricting the granularity is usually
performed by creating specific views containing only the data that should be visible
– CREATE VIEW addresses AS SELECT name, address
FROM employee
WHERE department = ‘finance’
– Access to this view means vertically and horizontally restricted access on the
employee table
12.2 Access Control Policies
• Access can be granted to individual users, groups, application programs, etc.
• The administration of access control policies and access rights can either be
– Centralized, where all rights are controlled by the DBA – Decentralized, where different DBAs are responsible for
different database instances
– Cooperative, where a predefined group of users has to agree on granted access
– Based on ownership, where the creator of a database object as default owner can control the respective access rights
12.2 Access Control Policies
• In most commercial DBMS, there is a two-layer approach to naming relations
– The DBMS has several database instances, for which DBA has permission to create and delete databases, and to grant users access to databases – Each database is a flat name space: users with the
necessary permission can create tables and views in a database.
12.2 Access Control Policies
• Because it is a flat name space, all table names must be unique within a database
– The database login name is often taken as the username
– Table and view names are prepended with the name of the user, who created it
12.2 Access Control Policies
• Discretionary Access Control
– Grants privileges to users, including the capability to access specific data files, records, or fields in a specific mode
• Mandatory Access Control
– Classifies users and data into multiple levels of security, and then enforces appropriate rules
• Role-based Access Control
– Access privileges are associated with the role of the person in the organization
12.2 Types of Access Control
• Discretionary policies require that for each
user, authorization rules are defined specifying the privileges owned on the database objects
– Access requests are checked against the granted privileges
– Discretionary means that the possibility for users to grant/revoke rights exists (usually
based on ownership)
– Granted access privileges can be propagated through the system
12.2 Discretionary Access Control
• The SQL GRANT/REVOKE statement can be used to grant privileges to users
– GRANT privileges ON table(s)/column(s) TO grantees
[WITH GRANT OPTION ]
– REVOKE privileges ON table(s)/column(s) FROM grantees
• Possible privileges are:
– SELECT - user can retrieve data
– UPDATE - user can modify existing data – DELETE - user can remove data
– INSERT - user can insert new data
– REFERENCES - user can make references to the table
12.2 Discretionary Access Control
• The WITH GRANT OPTION permits the propagation of rights to other users
– Allows other users to look after permissions for certain tables
• E.g., allowing a manager to control access to a table for their subordinates
• The list of grantees does not need to be (a set of) usernames
– It is permitted to specify PUBLIC, which means that the privileges are granted to everyone
12.2 Discretionary Access Control
• Checking discretionary access control is often implemented by an authorization matrix
– The rows represent users – The columns represent the
database objects
– The fields contain the respective privileges
• Similar concept in file security
12.2 Discretionary Access Control
• The authorization matrix model can be extended by predicates that have to be satisfied in order to use the authorization
– Data-dependent: e.g., constraints on the values of the accessed data (access only employee records where salary
< 100,000)
– Time-dependent: authorized access only between 9:00 am and 5:00 pm
– Context-dependent: e.g., a user might have read rights on individual columns, but not on joins between them
– History-dependent: constraints dependent on previously performed accesses
12.2 Discretionary Access Control
• Problem: revocation of propagated privileges
– Access to data might be needed only for a limited period
• Solution: temporarily grant some privileges to a user
– In SQL, a REVOKE command is included to cancel privileges
– If a privilege is granted with GRANT option to an account, this account can also grant that privilege on the relation to other accounts
12.2 Problems
• Suppose that B is given the GRANT OPTION by A and that B then grants the privilege on R to a third account C, also with GRANT OPTION
12.2 Problems
R
… data owner
read R read R
A B
C
• Privileges on R can propagate to other accounts without the knowledge of the owner of R!
12.2 Problems
R
… data owner
read R read R
read R
• If the owner now revokes the privilege granted to B, all the propagated privileges should automatically be revoked by the system
12.2 Problems
R
… data owner
revoke read R
revoke read R
revoke read R
• If a user received a privilege from two or more sources, the user will continue to have the privilege until all the sources revoke the privilege
12.2 Problems
R
… data owner
read R revoke
read R read R
read R
• Problem: the flow of information from some database object into a less secure database object
– Discretionary access models do not impose any restriction on the usage once data has been obtained by a user
– The dissemination of data is not controlled
• Users with a read privilege can copy read data to their own table, on which they
have full rights
• Maliciousness within the system can occur via Trojan horses
12.2 Problems
• Consider a malicious user having only a privilege to create tables in a database
12.2 Problems
database corrupted
application R
…
R’
… grant
read on R
read R
read R
write to R’
create R’
and read
• A solution to this problems are so-called flow controls that regulate the distribution of
information among accessible objects
– A flow between two database objects A and B occurs when a statement reads from A and writes into B
– Flow controls check that information contained in some objects does not flow explicitly (by copy) or implicitly (via intermediate objects) into less
protected objects
• Otherwise a user might get something from the less
protected object that he/she would not have gotten from the original object
12.2 Problems
• Mandatory Access Control maps objects onto a classification of the respective sensitivity
– All system data must be classified, users are assigned a certain clearance level by some central authority – Access to data is determined by
a mandatory policy through the comparison of requester level
and item level
• Most prominent example is the Bell-LaPadula model (1973) to formalize the U.S. Department of Defense multilevel security policy
12.2 Mandatory Access Control
• Secrecy is expressed as a set of rules (axioms) that must always be satisfied
• The control is based on security levels for each database item (object) and clearances for users (subjects) consisting of
– A classification from an ordered set
• E.g., top secret, secret, confidential, unclassified
– A set of categories from a non-hierarchical set
• E.g., administration, finance, human resources, etc.
12.2 Mandatory Access Control
• The set of security levels thus forms a lattice
– The lattice is partially ordered according to a dominance relationship
– A security level (class
1, {cat
1,…,cat
n}) dominates a security level (class
2, {cat
1,…,cat
m}) if and only if
class
1≥ class
2and {cat
1,…,cat
n} {cat
1,…,cat
m} – E.g.,
12.2 Mandatory Access Control
secret, {finance, marketing}
confidential, {finance} confidential, {marketing}
unclassified, {finance}
top secret, {administration}
unclassified, {administration}
• Subjects are active elements of the system
– As in the discretionary case, object owners can grant/revoke privileges to/from subjects
• Privileges are stored in an access matrix
– Subjects can execute actions (read, write, update,…) only with respect to the subject‘s clearance and the object‘s security level
– When entering the system, each subject logs on with a certain
current level where always
current level ≤ clearance holds
12.2 Mandatory Access Control
• The secrecy is maintained if three axioms are satisfied
– Simple security property: Any subject may have
• read or write access to an object only if the clearance of the subject dominates the security level of the object
– *-property: An untrusted subject may have
• Append (insert) access on an object only if the security level of the object dominates the current level of the subject
• Write access on an object, only if the object’s security level and the subject’s current level are equal
• Read access on an object, only if the security level of the object is dominated by the current level of the subject
12.2 Mandatory Access Control
– Discretionary security property:
• Every current access must be present in the access matrix, i.e., a subject can only perform accesses it is actually authorized for
• Moreover, security classifications cannot simply be changed
– Tranquility principle:
• No subject can modify the
classification of an active object
12.2 Mandatory Access Control
• The secrecy is maintained by the simple security property (no read-up)
– An object with higher security level can neither be read nor modified (except for appending data)
• The (star) property (no write-down) enforces a simple flow control
– Although lower security objects can be read, their data cannot be written to any object that has a level lower than the current level
• This prevents Trojan horse attacks
– Information transfer only possible via trusted subjects
• Trusted subjects do not need to obey property
12.2 Mandatory Access Control
• The Bell-LaPadula model succeeds in achieving secrecy, but cannot protect a system from
unauthorized modifications of information
– A similar principle like for data secrecy can also be applied for data integrity (e.g., the Biba model (1977))
• There are also several models combining both data secrecy and integrity
– The Dion model (1981) basically combines the principles of controlling secrecy of the Bell-LaPadula model with the principles of strict integrity of the Biba model
– The SeaView security model (1990) adapted the policies specifically for use in relational databases
12.2 Mandatory Access Control
• The advantages of mandatory models derive from their suitability to environments, where user and objects can be classified
– ‘Mandatory’ implies that systems should be able to
enforce an access control policy that is mandated by some regulation that must be absolutely enforced
• e.g., in 1995, US President Bill Clinton signed Executive Order 12958 which created new standards for the process of classifying government documents
• However, they often are overly strict
12.2 Mandatory Access Control
• Organizations often rely on role-based access control
– Each role is created by the administrator
– The permissions to perform certain operations are assigned to specific roles
– Each user is granted/revoked roles
• Role-based access control differs from traditional access control systems
– It assigns permissions to specific operations with meaning in the organization, rather than to low- level data objects
12.2 Role-Based Access Control
• An example operation might be to create a 'credit account' transaction in a financial application and assign it to the role of ‘bank clerk’
– The assignment of permission to perform a particular operation is meaningful because the operations are fine- grained and have meaning within the application – In contrast, traditional access control is
used to grant or deny write access to a particular system file, but it cannot say in what ways that file could be changed
12.2 Role-Based Access Control
• SQL injection is a security vulnerability of an application using an SQL database
• Characteristic
– user input is directly embedded into an SQL statement without further checking – user can extend the SQL statement
or even inject completely new ones – thus, data may be corrupted,
deleted, or stolen
12.3 SQL Injection
• Example scenario
– A Web interface asking for
a username and a password.
– The following statement is used to authenticate the user:
– the application simply inserts the user input into the SQL string (using string concatenation)
– if there is the given username/password combination, the application proceeds to the protected member area
12.3 SQL Injection
String s = "SELECT * FROM users " +
"WHERE username = '" + user + "'" +
"AND password = '" + passwd + "';"
• Possible attacks
– authenticate as admin
• username = admin
• password = ' OR 1=1
– drop the user table
• username = admin
• password = '; DROP TABLE users; --
12.3 SQL Injection
SQL comment
SELECT * FROM users WHERE username = 'admin' AND password = '' OR 1=1;
SELECT * FROM users WHERE username = 'admin' AND password = ''; DROB TABLE users; --';
– even worse – capture the whole system!
• some DBMS systems provide stored procedures to access the underlying operating system itself (e.g., MS SQL)
• '; EXEC xp_cmdshell 'format c: /s';
12.3 SQL Injection
• What hackers usually do
– hackers usually don’t know the queries, tables, and inner workings of applications
• vulnerabilities need to be discovered
– start with entering information containing any SQL control characters (e.g., ')
• if this results into an error, the application is potentially prone to injection attacks
– inject SQL code in order to guess the structure of the tables and columns, and the security boundaries of the system
• observe the error codes to validate your guesses
– as soon as the extent of the vulnerability data schema is known, data can be freely manipulated or stolen
12.3 SQL Injection
• How to prevent injection attacks?
• Sanitize the input!
– restrict all user input to only safe characters (i.e., remove control characters)
– will also delete characters which might be needed in the input (e.g., ')
– won’t protect you in case of integer values
• … WHERE id = 17 OR 1 = 1
12.3 SQL Injection
• Quote and escape the input
– escape all control characters
• this might be quite tricky and often depends on the DBMS
– e.g. backslash is not a special character in DB2 but in MySQL it is used as default escape character
• most database APIs provide special functions for quoting and escaping
– e.g. mysql_real_escape_string() in PHP
– example:
input: \'; DROP TABLE users; -- escaped:
WHERE email = '\\\'; DROP TABLE users; --'
– Notice: for DB2 this would not work:
WHERE email = '\\\'; DROP TABLE users; --'
• dedicated escape procedures for each DBMS are needed
12.3 SQL Injection
• Use strongly typed parameters
– cast/parse each user input to its intended data type
• prevents e.g. integer input with injected code
• together with sanitized input or escaping and quoting, typing provides an acceptable minimum amount of protection
12.3 SQL Injection
String s = readUserInput();
try {
Integer.parseInt(s);
} catch (NumberFormatException ex) { // respond to invalid input
}
• Use prepared statements
– the structure of a prepared statement is fixed
• user input is just data and cannot change the predefined statement structure
– simplest and most secure way to sanitize your input
– besides the security benefit, prepared statements may also increase your query performance
– BEST SOLUTION – USE PREPARED STATEMENTS!
• If you do not use prepared statements in an application, have a good reason for that!
12.3 SQL Injection
σpassword=%1
equal
password %1
• Isolate your Web/DB server
– put your servers in a secure DMZ (DeMilitarized Zone)
• even if the attacker can completely capture the machine, he/she won’t be able to do much harm
• Restrict your error reporting
– many programming frameworks are by default configured into developer mode
– on failure, they report in detail what went wrong
• e.g. display the faulty query and excerpts from the call hierarchy or the DB schema
• this information is very helpful in finding security vulnerabilities, so don’t give it to your foes!
12.3 SQL Injection
• „Your data is safe with us” – The tale of the anonymous dataset
– Example: The life of AOL user #4417749
• Setting: AOL Search
– One of the major web search and content portals – AOL serves millions of searches per day
12.4 Anonymization
• AOL has a privacy policy promising they won’t publish your identify
• However, internally records are kept of all user searches
– Search records are very valuable for improving algorithms
• On 4
thAugust 2006, an
anonymous dataset was published for free use by the IR research community
– Contained searches of 650,000 users over a 3-month period
12.4 Anonymization
• Data set contained
– Anonymous user id
• Just an incrementing number
– Query text
• As the user typed it
– Query time and date – Result rank
• Rank of the result the user clicked on
– Result URL
• AOL acted on clear consciences to help out free search algorithm research
– But…
12.4 Anonymization
• The data set spread very fast
• Unfortunately, anonymizing data is not that easy
– New York Times, among others, reconstructed individual user’s identities and personal profiles – They cross-matched all records and combined
them with public available sources
• Phonebooks, Business Directories, Classified Ads, …
12.4 Anonymization
• Most prominent example: User #4417749
– Thelma Arnold, 62-year-old, widowed, lives in Lilburn, Georgia
– Is looking for a new partner in his 60s
– Has at least one dog randomly pissing on furniture – Has problem with trembling fingers and aches in her
back
– Is worried about the safety of her neighborhood
– Wonders about problems of the world, like hunger in Africa or children in war-torn Iraq
12.4 Anonymization
• AOL immediately removed the dataset
– But still around on various mirrors and databases
• “Browse others AOL data – hours of fun guaranteed”
• In September 2006, a class action lawsuit was filed
– Case still running as of autumn 2010
– Seeks at least $5,000 for each person involved
• 3.250 Billion Dollars!
• What to learn?
– Proper data anonymization IS very important!
12.4 Anonymization
• Anonymization: Typical (Bad) Cases
– Removal of personal identifiers – Safe?
12.4 Anonymization
Name Age Sex Zip
Karl 19 M 38114
Anna 21 F 30167
Otto 33 M 38005
Public Data
Age Sex Zip Disease Cure
19 M 38114 Hepatitis Yes
21 F 30167 Hepatitis Yes
33 M 38005 Aids No
“Anonymous” Hospital Data Real Identity – No matching should be possible
• Anonymization: Typical (Bad) Cases
– Removing data details – Safe??
12.4 Anonymization
Name Age Sex Zip
Karl 19 M 38114
Anna 21 F 30167
Otto 33 M 38005
Public Data
Age Sex Zip Disease Cure
18-20 M 381* Hepatitis Yes
21-25 F 301* Hepatitis Yes
30-35 M 380* Aids No
“Anonymous” Hospital Data
• How to protect private content, but preserve useful context?
– Compromise between encryption and plain data sharing – Algorithmic techniques to separate content & context
• With proliferation of data collection devices, privacy is disappearing
– Scale
– Insider vs. outsider protection
– Some data mining is useful, others are harmful – e.g., AOL search trace release
12.4 Anonymization
• Approaches for Privacy Preservation
• Fight Data Mining Approaches
– Modify data in such a way that certain rules cannot be inferred
• Cryptographic / probabilistic approaches
– Query responses just give probabilistic results
– Multiple public keys for a single user allow aggregation of data only in certain cases
• Statistical Approaches
12.4 Anonymization
• Example Idea: Slice data into tiny content blocks
– Statistic approach
– Reconstructing data computationally hard
• Data analysis still possible
– Frequency statistics
• Word frequencies of the UN Charta