Prof. Dr. Martin Werner www.martinwerner.de
Big Geospatial Data
Me...
● Studied math (algebraic topology) in Bonn
● Doctoral dissertation in computer science (indoor navigation) in Munich
… and you?
Overview ...
▪ Part I: Parallel Computing
- Parallel Programming
- Message Passing Interface
- Annotation-based Multiprocessing Using OpenMP
- GPU Computing Using NVIDIA CUDA
- MapReduce
- Apache Big Data Stack (Hadoop, Spark, …)
▪ Part II: Selected Algorithms for Spatial Big Data
- Points, Images, Street Networks (aka Spatial Data Types)
Overview ...
▪ Part III: Examples and Appendix
- Trajectory Clustering Using Traclus and OpenMP
- Word Counting in Various MapReduce Environments
- Trajectory Similarity Matrix Computations (aka ACM SIGSPATIAL GIS Cup 2017)
- Counting astronomic objects from Hubble Space Images
Your ideas? Just propose application areas or concrete
Lecture Documentation
I am doing the following to allow you to follow this lecture easily:
- Write a script (but, please, take your own notes during the lecture)
- Publish slides (of course)
- Provide sources via GitHub
You can do the following to help with this lecture:
- Translate sources to your favourite programming languages
- Write corrections (from typos to errors) for the script
- Add your own case studies to the script in the Appendix
GitHub - Why?
Using GitHub, we can
▪ easily share code
▪ easily discuss proposed changes
▪ keep track of contributions and activities
Basic Usage:
▪ Go to the repository and download all sources
Advanced Usage:
▪ Edit something and create a pull request to notify me where something is wrong or not working in your environment.
However
I am still living in Munich with my family.
Therefore, I want to ask you:
▪ Can we (partly) block this lecture into larger units?
This semester, the lecture consists of one hour of lecture and one hour of practice. This means that we should have 14 hours of lecture.
Teaching must be completed by July 15th.
I would be very happy if we could have 2 hours on Tuesday in the
Time Table
12.4. (today) Lecture (2h) Introduction
19.4. free
26.4. free
3.5. Lecture (4h) incl. 2h additional lecture
10.5. Exercise
17.5. free
24.5. Lecture (4h) incl. 2h additional lecture
31.5. free
7.6. Tutorial
14.6. free
21.6. Tutorial
28.6. Lecture (2h)
Programming Languages for Big Data
What is your favourite programming language?
A choice of languages for (Spatial) Big Data
- Python
- R
- MATLAB and Octave
- C++
- Java
- Scala
and some more specific languages depending on the actual context.
Python
Advantages
- Nice, modern scripting language
- Huge amount of software available
- C++ friendly (easy to extend towards high performance)
Drawbacks
- Difficult to read (what does this one bracket expression, which is good because it once seemed to work, actually do?)
- Software quality (especially of packages) varies
- Easy to break: virtual environments, versions, python2 vs. python3
R
Advantages
- Classical language, good documentation
- Uniform names for common actions (fit, model, predict, plot, ...)
- Extremely C++ friendly (easy to extend towards high performance)
- Very good plot defaults for scientific computing
- Nice IDE (RStudio)
- CRAN - Peer-Reviewed source code packages for almost everything in statistical computing
Drawbacks
- Not the easiest to start with
- Sometimes difficult to read due to complex statements
MATLAB / Octave
Advantages
- Matrix-centered multi-purpose programming
- Very good documentation, wide usage in the field
- Extensible
- High-quality toolboxes (however, expensive!) for MATLAB
Drawbacks
- Expensive
- Not open source
- The open-source alternative Octave is not fully equivalent
C++
Advantages
- High performance
- Extremely high-quality libraries (boost)
- Platform independence, even towards GPU and embedded systems
- Embeddable into Python, Java, R and MATLAB (almost anywhere)
- Full support for generic programming
- Very modern standard (C++17 is ready)
Drawbacks
- Compiler errors are difficult to read (especially when using generics)
- Some inconsistencies between compilers
Java
Advantages
- Good performance
- High-quality design and runtime
- Platform independence
- Easy to learn (very good error messages)
- Safe memory management
Drawbacks
- Unable to unlock some aspects of modern computers (GPUs, specific instructions)
- Overhead produced by memory management
Scala
Advantages
- A modern approach to functional programming
- Compatible with Java, running on top of the JVM
- Platform independence
Drawbacks
- Unable to unlock some aspects of modern computers (GPUs, specific instructions)
- Overhead produced by memory management
Wrap-Up
- Python: A useful scripting language with a high adoption rate, but sometimes easy to break
- R: A fully functional data science environment that feels like a classical imperative scripting language
- MATLAB and Octave: If you need matrices and matrix algebra, consider MATLAB and Octave.
- C++: You need to scale up to unlimited performance while still using a high-quality, nice language? C++ is here for you.
- Java: You need to scale out? Java is the way to go. Not the fastest, not the most efficient, but easy to use and not so
Motivating Example: OpenCV and Python for Face Recognition
Python: Simple Face Detection
Prepare your system…
- sudo apt-get install python-opencv # for Debian / Ubuntu
- git clone https://github.com/shantnu/FaceDetect/
Run the example...
bgd:~$ python face_detect.py abba.png haarcascade_frontalface_default.xml
face_detect.py
import sys

import cv2

# Get user supplied values
imagePath = sys.argv[1]
cascPath = sys.argv[2]

# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)

# Read the image and convert it to grayscale for the detector
image = cv2.imread(imagePath)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces in the image
faces = faceCascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30),
    # OpenCV 2.x constant; OpenCV 3 and later name it cv2.CASCADE_SCALE_IMAGE
    flags=cv2.cv.CV_HAAR_SCALE_IMAGE
)
print("Found {0} faces!".format(len(faces)))
# Draw a rectangle around the faces
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)

cv2.imshow("Faces found", image)
cv2.waitKey(0)
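To make the scaleFactor and minSize arguments of the detection call more concrete, here is a minimal sketch of the window sizes a multi-scale scan steps through. It is a simplification and not OpenCV's implementation: the function name window_sizes, the base window of 24 pixels, and the image side of 200 pixels are assumptions chosen for illustration only.

```python
def window_sizes(base=24, scale_factor=1.1, min_size=30, image_side=200):
    """List the square window sizes a multi-scale scan would try.

    Simplified illustration: a real detector typically rescales the image
    instead of the window and treats width and height separately.
    All default values are assumptions for this sketch.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    while True:
        side = int(round(base * factor))
        if side > image_side:
            break  # the window no longer fits into the image
        if side >= min_size:
            sizes.append(side)  # windows below min_size are skipped
        factor *= scale_factor
    return sizes


if __name__ == "__main__":
    # A coarse factor of 2.0 tries only a few sizes.
    print(window_sizes(scale_factor=2.0))  # [48, 96, 192]
    # The finer factor 1.1 tries many closely spaced sizes.
    print(len(window_sizes(scale_factor=1.1)))
```

A factor close to 1 tries many closely spaced sizes and is slower; a larger factor is faster but can skip the size at which a face would have matched.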
Face Detection in Python
The source code is
- easy to read
- easy to modify
- Complex algorithms made accessible for anyone
- Performance overhead can be ignored.
Motivating Example: Trajectory Clustering Using R
Movement and Tracking in Video Surveillance
Sports… (David Alaba)
Mapping and GIS
Personal Tracking (runtastic)
Biology (Orca Movement)
Trajectory Clustering in R
http://martinwerner.de/blog/traclus
Prepare your system
- Install a recent version of R (from CRAN)
- Install and compile libtrajcomp
https://github.com/mwernerds/trajcomp
Run the example...
R example: TRACLUS
This is a very typical R calling sequence using the keyword function() to define an inline function and setting a parameter.
Wrap-Up
Trajectory Clustering in R
- easy to use
- functional sorting and grouping proved useful (ddply)
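The R listing itself is not reproduced in this text. As a rough Python analogue of the split-apply-combine pattern that ddply provides, the sketch below groups records by a key and applies an inline function to each group. The helper group_apply and the sample segments are hypothetical and serve only as illustration; they are not taken from the lecture sources.

```python
from collections import defaultdict


def group_apply(rows, key, fn):
    """Split rows by key(row), apply fn to each group, return results sorted by key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {k: fn(group) for k, group in sorted(groups.items())}


# Hypothetical trajectory segments, each labelled with a cluster id.
segments = [
    {"cluster": 2, "length": 4.0},
    {"cluster": 1, "length": 1.5},
    {"cluster": 2, "length": 6.0},
]

# Inline (lambda) functions play the role of R's function() here.
mean_length = group_apply(
    segments,
    key=lambda s: s["cluster"],
    fn=lambda group: sum(s["length"] for s in group) / len(group),
)

if __name__ == "__main__":
    print(mean_length)  # {1: 1.5, 2: 5.0}
```

Passing the grouping key and the per-group computation as inline functions keeps the call site short, which is the same convenience the R calling sequence offers.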
Motivating Example: Working with Astronomical Images from C++
Prepare your system…
Find it in Stud-IP, not in our GitHub
- Install libTIFF for reading very huge images (many other tools will fail on this 980 MB file)
- Install libpng for exporting imagery
- Download the image
http://www.spacetelescope.org/images/heic1620a/
Run the example...
bgd:~$ make
g++ -std=c++11 -fopenmp -ltiff -lpng -o stars_omp stars_omp.cpp
bgd:~$ ./stars_omp ~/Downloads/heic1620a.tif
input file /home/martin/Downloads/heic1620a.tif
A part
.. and another part
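The listings from these slides are not reproduced in this text. As a rough illustration of the task (counting bright objects in an image), and not necessarily the approach taken in stars_omp.cpp, the following Python sketch thresholds a small grayscale grid and counts 4-connected bright regions with a flood fill. The function name count_bright_objects, the threshold, and the tiny sample image are assumptions made for this example.

```python
def count_bright_objects(image, threshold):
    """Count 4-connected regions of pixels with value >= threshold.

    image is a list of equally long rows of numbers (a tiny stand-in for
    a real grayscale image); this is an illustrative sketch only.
    """
    if not image:
        return 0
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Found an unvisited bright pixel: it starts a new object.
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:  # iterative flood fill over the bright region
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


# Hypothetical 4x4 "image" with four separate bright spots.
tiny = [
    [0, 9, 0, 0],
    [0, 9, 0, 8],
    [0, 0, 0, 0],
    [7, 0, 0, 9],
]

if __name__ == "__main__":
    print(count_bright_objects(tiny, threshold=5))  # 4
```

For an image of the size mentioned above, a pure Python loop like this would be far too slow, which is one reason the lecture example uses C++ with OpenMP.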