DATA BASICS
12 NOV 2018 | SABINE SCHRÖDER, IEK-8
KINDS OF DATA
Sabine Schröder Kinds of data
OUTLINE
12 November 2018
1. Motivation
2. Data values and operations
3. Array data structures
4. Metadata
5. Digital data formats: human readable vs. binary
6. Example: geo-scientific data and coordinate systems
7. Summary/Outlook
MOTIVATION
DATA VALUES AND OPERATIONS
[Figure: example data values (1 2 3 4; 4 3 2 1; 3 31 17 19; further non-numeric symbols) with the labels quantitative, qualitative and ordinal]
Type: non-negative integer
Range: 0 .. ∞
Plausible operations: +1 (increment)

Type: integer
Range: −∞ .. ∞
Plausible operations: +, −, =, !=, <, >
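To make the link between a type and its plausible operations concrete, here is a minimal Python sketch. It assumes a hypothetical `Count` class (not from the slides) for the non-negative counting type: it offers only the increment, while Python's built-in `int` plays the role of the general integer type.

```python
class Count:
    """Hypothetical counter type: values 0, 1, 2, ...; the only operation is +1."""

    __slots__ = ("_value",)

    def __init__(self) -> None:
        self._value = 0          # lower end of the range 0 .. ∞

    def increment(self) -> None:
        self._value += 1         # the single plausible operation, +(1)

    @property
    def value(self) -> int:
        return self._value


visitors = Count()
visitors.increment()
visitors.increment()
print(visitors.value)  # 2

# A plain int also allows subtraction, which can leave the range 0 .. ∞:
print(0 - 1)           # -1, meaningless for a count
# Count defines no __sub__, so "visitors - 1" raises TypeError.
```

Restricting the interface, rather than checking values afterwards, is one way to make implausible operations impossible by construction.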
DIGITAL REPRESENTATION
Elementary/primitive data types:
• bit/logical/boolean
• byte
• int/integer (short, int, long, INTEGER*; unsigned variants)
• float/real (real, double, REAL*)
• complex
• char/character
• pointer
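The byte sizes behind several of these primitive types can be inspected with Python's standard `struct` module. The sketch below assumes struct's standard sizes (the `<` prefix); native C sizes such as `long` can differ between platforms and compilers. The helper `int_range` is illustrative, not part of any library.

```python
import struct

# With the "<" prefix, struct uses fixed standard sizes, independent of the platform.
sizes = {
    "bool": struct.calcsize("<?"),
    "signed byte": struct.calcsize("<b"),
    "short": struct.calcsize("<h"),
    "int": struct.calcsize("<i"),
    "long long": struct.calcsize("<q"),
    "float": struct.calcsize("<f"),
    "double": struct.calcsize("<d"),
}


def int_range(n_bytes: int, signed: bool = True) -> tuple[int, int]:
    """Smallest and largest value of an n-byte two's-complement or unsigned integer."""
    bits = 8 * n_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for name, n in sizes.items():
    print(f"{name:12s} {n} byte(s)")
print(int_range(2))                # (-32768, 32767)
print(int_range(2, signed=False))  # (0, 65535)
```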
Excursus: floating point numbers
• represented by sign, mantissa and exponent, e.g. +3.241592E-27
• not all real numbers are exactly representable (rounding)
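A short Python sketch of the excursus, assuming 64-bit IEEE 754 doubles (what CPython's `float` uses on common platforms): `math.frexp` splits a value into mantissa and exponent, the hypothetical helper `ieee754_fields` reads the raw bit fields, and the last line shows rounding in action.

```python
import math
import struct


def decompose(x: float) -> tuple[int, float, int]:
    """Sign, mantissa m and exponent e with x = sign * m * 2**e and 0.5 <= m < 1 (x != 0)."""
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    m, e = math.frexp(abs(x))
    return sign, m, e


def ieee754_fields(x: float) -> tuple[int, int, int]:
    """Raw sign bit, biased exponent (11 bits) and fraction (52 bits) of a 64-bit double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(decompose(-6.0))        # (-1, 0.75, 3): -6 = -0.75 * 2**3
print(ieee754_fields(1.0))    # (0, 1023, 0)
print(0.1 + 0.2 == 0.3)       # False: none of the three decimals is exact in binary
```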
Non-primitive data types:
• enumerations
• structures: composition/union of elements of different data types
• arrays: collection of elements of the same data type (special case: string)
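The three non-primitive kinds can be sketched in Python with `enum.Enum`, a `dataclass` and the standard `array` module. The names `Flag`, `Observation` and `station_A` are hypothetical examples, not taken from the slides.

```python
from array import array
from dataclasses import dataclass
from enum import Enum


class Flag(Enum):                 # enumeration: a fixed set of named values
    VALID = 0
    SUSPECT = 1
    MISSING = 2


@dataclass
class Observation:                # structure: elements of different data types
    station: str                  # string: a sequence of characters
    value: float
    flag: Flag


series = array("d", [12.5, 13.1, 12.9])   # array: elements of one data type (C double)

obs = Observation(station="station_A", value=series[0], flag=Flag.VALID)
print(obs)
print(series.itemsize, len(series))       # 8 3
# series.append("x") would raise TypeError: all elements must be doubles.
```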
ARRAY DATA STRUCTURES
[Figure: array data structures of increasing dimensionality, up to 5D]
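Address arithmetic (row-major storage):
$\mathrm{address}(A_{i,j}) = B + (i \cdot n_{col} + j) \cdot l$
B: base address of the array in memory
i: row index
j: column index
$n_{col}$: number of columns
l: size of one element in bytes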
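The address formula can be checked against real memory with Python's `ctypes`, whose nested arrays use the same contiguous row-major layout as C. This is a sketch with hypothetical dimensions (4 × 3) and a hypothetical helper `element_address`; note that Fortran stores arrays column-major, so there the first index varies fastest and the address becomes B + (i + j · nrow) · l.

```python
import ctypes


def element_address(base: int, i: int, j: int, ncol: int, elem_size: int) -> int:
    """address(A[i][j]) = B + (i*ncol + j) * l for a row-major array, 0-based indices."""
    return base + (i * ncol + j) * elem_size


nrow, ncol = 4, 3
A = ((ctypes.c_double * ncol) * nrow)()   # C layout: double A[4][3], one contiguous block
for i in range(nrow):
    for j in range(ncol):
        A[i][j] = float(10 * i + j)       # each value encodes its own indices

base = ctypes.addressof(A)                # B: base address of the array
elem_size = ctypes.sizeof(ctypes.c_double)  # l: element size in bytes

addr = element_address(base, 2, 1, ncol, elem_size)
print(ctypes.c_double.from_address(addr).value)  # 21.0
```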
[Figure: main memory and secondary memory; data are swapped out to and swapped in from secondary memory]
inefficient access order (jumps between rows):
1. A[0][0]
2. A[1][0]
3. A[2][0]
…
better (adjacent elements in memory):
1. A[0][0]
2. A[0][1]
3. A[0][2]
…
Example A(1000,1000,1000):
inefficient: 28.45 s
efficient: 2.85 s
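The run times above are the slide's measurements and are not reproduced here. What the sketch below shows, assuming the row-major address formula and 8-byte elements, is why they differ: the order of the two loops decides whether consecutive accesses are one element apart or a whole row apart. `visit_offsets` is a hypothetical helper.

```python
def visit_offsets(nrow: int, ncol: int, elem_size: int, inner_over_columns: bool) -> list[int]:
    """Byte offsets from the base address B, in the order a double loop touches them.

    Assumes row-major storage: offset(A[i][j]) = (i*ncol + j) * elem_size.
    """
    offsets = []
    if inner_over_columns:          # A[0][0], A[0][1], A[0][2], ...
        for i in range(nrow):
            for j in range(ncol):
                offsets.append((i * ncol + j) * elem_size)
    else:                           # A[0][0], A[1][0], A[2][0], ...
        for j in range(ncol):
            for i in range(nrow):
                offsets.append((i * ncol + j) * elem_size)
    return offsets


better = visit_offsets(3, 1000, 8, inner_over_columns=True)
inefficient = visit_offsets(3, 1000, 8, inner_over_columns=False)
print(better[:3])        # [0, 8, 16]        -> stride of one element
print(inefficient[:3])   # [0, 8000, 16000]  -> stride of one whole row
```

Both orders visit exactly the same elements; only the distance between consecutive accesses changes, and large strides defeat caching and can force swapping.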
ARRAY DATA STRUCTURES (EXCURSUS)
METADATA
information that describes other data
Types of metadata:
• descriptive metadata: standard_name, unit, parameter_measurement_method
• administrative metadata: creation_date, modification_date
• structural metadata: relationships
Metadata answer questions about data such as:
• What?
• When?
• Where?
• Who?
• How?
• Which?
• (Why?)
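Metadata usually travel with the data as name-value attributes. The sketch below assumes a hypothetical `Variable` container; the attribute names follow the slide, while `coordinates` (an attribute from the CF conventions) and all values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical container: data values plus metadata attributes."""
    name: str
    values: list[float]
    attrs: dict[str, str] = field(default_factory=dict)


ozone = Variable(
    name="o3",
    values=[31.2, 29.8, 35.0],                 # made-up values
    attrs={
        # descriptive metadata (What? How?)
        "standard_name": "mole_fraction_of_ozone_in_air",
        "unit": "nmol mol-1",
        "parameter_measurement_method": "UV absorption",
        # administrative metadata (When?)
        "creation_date": "2018-11-12",
        "modification_date": "2018-11-12",
        # structural metadata (relationship to other data)
        "coordinates": "time",
    },
)

print(ozone.attrs["standard_name"], ozone.attrs["unit"])
```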
DIGITAL DATA FORMATS
Human readable (text):
• readable
• intuitively usable
• rarely further information needed
• every system has a free, simple editor
• every character (digit) consumes one byte (disk space, bandwidth)
• can be compressed, but automatic compression usually leads to a binary format
• electronic processing needs a conversion (parser)
• for encryption: content should not be intuitively readable
Machine readable (binary):
• not understandable without aids
• not usable without tools or programming
• further information (at least about the format) needed
• tools might not be free or not available for the system
• compact
• fast processing (but: byte ordering)
• encryption is easy; NOT human readable
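The size, parsing and byte-order points can be tried out with Python's `struct` module. The sketch assumes 4-byte signed integers and uses a few pressure values in Pa; the wrong-byte-order read at the end shows why binary files need information about their format.

```python
import struct

values = [100369, 95898, 84952]          # pressure values in Pa

# Human readable: one byte per character (ASCII); a parser turns it back into numbers
text = " ".join(str(v) for v in values).encode("ascii")
parsed = [int(token) for token in text.split()]

# Binary: three 4-byte integers, little-endian ("<") or big-endian (">")
little = struct.pack("<3i", *values)
big = struct.pack(">3i", *values)

print(len(text), len(little))            # 18 12
print(struct.unpack("<3i", little))      # (100369, 95898, 84952)
print(struct.unpack(">3i", little))      # wrong numbers: byte order mismatch
```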
EXAMPLE: GEO-SCIENTIFIC DATA AND COORDINATE SYSTEMS
Horizontal coordinate systems:
nx=900, ny=451
longitudes(nx) = 0, 0.4, 0.8, 1.2, ...
358, 358.4, 358.8, 359.2, 359.6 [degrees_east]
latitudes(ny) = -90, -89.6, -89.2, -88.8, ...
88.4, 88.8, 89.2, 89.6, 90 [degrees_north]
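A sketch of the regular latitude-longitude grid in Python, assuming the 0.4° spacing implied by nx=900 and ny=451 (360/0.4 = 900 and 180/0.4 + 1 = 451). `regular_axis` and `nearest_index` are hypothetical helpers, not part of any library.

```python
def regular_axis(start: float, step: float, n: int) -> list[float]:
    """n equally spaced values; rounding hides float noise such as 359.59999999999997."""
    return [round(start + k * step, 6) for k in range(n)]


nx, ny = 900, 451
longitudes = regular_axis(0.0, 0.4, nx)    # degrees_east
latitudes = regular_axis(-90.0, 0.4, ny)   # degrees_north


def nearest_index(lon: float, lat: float, step: float = 0.4) -> tuple[int, int]:
    """Indices (i, j) of the grid point closest to (lon, lat)."""
    i = round((lon % 360.0) / step) % nx   # wrap e.g. 359.9 E back to index 0
    j = round((lat + 90.0) / step)
    return i, j


print(longitudes[:4], longitudes[-1])   # [0.0, 0.4, 0.8, 1.2] 359.6
print(latitudes[0], latitudes[-1])      # -90.0 90.0
print(nearest_index(6.4, 50.9))         # (16, 352)
```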
ncells=1310720
center_longitudes(ncells) = -3.141593, … 3.141593
[radian]
center_latitudes(ncells) = -1.568645, ...
1.568645 [radian]
nv=3
longitude_vertices(ncells, nv) = -1.570796, … 1.570796
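For the cell-based grid, the coordinates come in radian and the vertex array has the shape (ncells, nv). The sketch below assumes two toy cells with made-up values; it converts to degrees with `math.degrees` and addresses the (ncells, nv) array stored flat in row-major order, using the hypothetical helper `vertex`.

```python
import math

nv = 3  # vertices per cell, as in the example

# Hypothetical toy values for ncells = 2, in radian like the example's arrays
center_longitudes = [-3.141593, 0.785398]
center_latitudes = [-1.568645, 0.523599]
# (ncells, nv) array stored as one flat list, row-major: element (c, v) at c*nv + v
longitude_vertices = [-3.1, -3.0, -2.9, 0.7, 0.8, 0.9]


def vertex(flat: list[float], c: int, v: int, nv: int = nv) -> float:
    return flat[c * nv + v]


center_lon_deg = [math.degrees(x) for x in center_longitudes]
center_lat_deg = [math.degrees(x) for x in center_latitudes]
print([round(x, 3) for x in center_lon_deg])   # [-180.0, 45.0]
print(round(center_lat_deg[0], 3))             # approx. -89.877
print(vertex(longitude_vertices, 1, 2))        # 0.9
```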
Vertical coordinate systems:
nlevel=42
level(nlevel) = 1, 3, 14, 25, 36, 47, … 84952, 95898, 100369
[Pa]
Vertical coordinate systems:
$\sigma = p / p_s$
formula: sigma(n,k,j,i) = p(k)/ps(n,j,i)
nx=900 [index:i], ny=451 [index:j]
nlevel=42 [index:k]
ntimes=240 [index:n]
longitudes(nx) = 0, 0.4, 0.8, 1.2, ...
358, 358.4, 358.8, 359.2, 359.6 [degrees_east]
latitudes(ny) = -90, -89.6, -89.2, -88.8, ...
88.4, 88.8, 89.2, 89.6, 90 [degrees_north]
p(nlevel) = 100369, 95898,…
47, 36, 25, 14, 3, 1 [Pa]
times(ntimes) = 2018-03-18 12, 2018-03-19 12, … 2018-11-11 12, 2018-11-12 12
[datetime]
ps(ntimes,ny,nx) = … [Pa]
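A minimal sketch of the sigma formula with nested Python lists in the index order [n][k][j][i]. The surface pressures and the 50000 Pa level are hypothetical toy values; sigma > 1 means that the pressure level lies below the surface at that grid point.

```python
def sigma(p: list[float], ps: list[list[list[float]]]) -> list:
    """sigma(n,k,j,i) = p(k) / ps(n,j,i), nested as [n][k][j][i]."""
    return [
        [[[pk / ps_nji for ps_nji in row] for row in ps_n] for pk in p]
        for ps_n in ps
    ]


p = [100369.0, 95898.0, 50000.0]   # two levels from the example plus a hypothetical 50000 Pa
ps = [[[101325.0, 80000.0]]]       # hypothetical ps(ntimes=1, ny=1, nx=2) [Pa]

s = sigma(p, ps)
print(s[0][2][0][1])    # 0.625
print(s[0][0][0][1])    # > 1: level below the surface where ps = 80000 Pa
```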
Vertical coordinate systems:
$\sigma = p / p_s$
$p_i = A_i \times p_0 + B_i \times p_s$
formula: p(n,k,j,i) = a(k)*p0 + b(k)*ps(n,j,i)
nx=900 [index:i], ny=451 [index:j]
nlevel=42 [index:k]
ntimes=240 [index:n]
longitudes(nx) = 0, 0.4, 0.8, 1.2, ...
358, 358.4, 358.8, 359.2, 359.6 [degrees_east]
latitudes(ny) = -90, -89.6, -89.2, -88.8, ...
88.4, 88.8, 89.2, 89.6, 90 [degrees_north]
times(ntimes) = 2018-03-18 12, 2018-03-19 12, … 2018-11-11 12, 2018-11-12 12
[datetime]
ps(ntimes,ny,nx) = … [Pa]
p0 = 100000 [Pa]
a(nlevel) = 0, 3.68387e-05, 0.00037, 0.00138, ...
[-]
b(nlevel) = 0.99882, 0.99582, 0.99114, ...
[-]
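The hybrid formula can be sketched the same way. The sketch uses the first three a(k) and b(k) values listed above together with hypothetical surface pressures; `hybrid_pressure` is an illustrative helper.

```python
def hybrid_pressure(a, b, p0, ps):
    """p(n,k,j,i) = a(k)*p0 + b(k)*ps(n,j,i), nested as [n][k][j][i]."""
    return [
        [[[ak * p0 + bk * ps_nji for ps_nji in row] for row in ps_n]
         for ak, bk in zip(a, b)]
        for ps_n in ps
    ]


p0 = 100000.0                        # reference pressure [Pa]
a = [0.0, 3.68387e-05, 0.00037]      # first a(k) values from the example [-]
b = [0.99882, 0.99582, 0.99114]      # first b(k) values from the example [-]
ps = [[[101325.0, 80000.0]]]         # hypothetical ps(ntimes=1, ny=1, nx=2) [Pa]

p = hybrid_pressure(a, b, p0, ps)
print(p[0][0][0])   # lowest level at both grid points, close to ps
```

The two coefficient sets cover both limiting cases: where b(k) = 0 the level is a pure pressure level (p = a(k)·p0), and where a(k) = 0 it is a pure sigma level with sigma = b(k).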
SUMMARY/OUTLOOK
Data type defined by
- type of value
- type of mathematical, relational or logical operations
Choosing a data type depends on
- representation of relevant real-world aspects
- intended use of the digital data (access time and efficiency)
- available data space
Metadata
- information about other data:
  * descriptive
  * administrative
  * structural
Digital data
- human readable
- binary