
Delegates’ Summit: Best Practice and Definitions of Data Sciences – Beyond Statistics

September 25, 2017

The Seventh Symposium on Advanced Computation and Information in Natural and Applied Sciences

The International Conference on Numerical Analysis and Applied Mathematics (ICNAAM 2017)

September 25 – 30, 2017, Thessaloniki, Greece

Dr. rer. nat. Claus-Peter Rückemann 1,2,3

1 Westfälische Wilhelms-Universität Münster (WWU), Münster, Germany

2 Leibniz Universität Hannover, Hannover, Germany

3 North-German Supercomputing Alliance (HLRN), Germany

ruckema(at)uni-muenster.de

[Figure: overview diagrams, (c) Rückemann 2012. Recoverable labels: knowledge resources (scientific resources, databases, containers, documentation), originary resources, services interfaces, and compute and storage resources; geoscientific services and disciplines on High Performance Computing, Grid, and Cloud resources; compute task (CEN) and storage task (OEN) element integration.]

Delegates and Contributors

Claus-Peter Rückemann (Moderator), Westfälische Wilhelms-Universität Münster (WWU) / Knowledge in Motion, DIMF / Leibniz Universität Hannover / North-German Supercomputing Alliance (HLRN), Germany

Oleg O. Iakushkin, Department of Computer Modelling and Multiprocessor Systems at the Faculty of Applied Mathematics and Control Processes, Saint-Petersburg State University, Russia

Lutz Schubert, IOMI, University of Ulm, Germany

Friedrich Hülsmann, Knowledge in Motion, DIMF, Germany

Birgit Gersbeck-Schierholz, Knowledge in Motion, DIMF, Germany

Olaf Lau, Knowledge in Motion, DIMF, Germany

The International Conference on Numerical Analysis and Applied Mathematics (ICNAAM 2017), The Seventh Symp. on Advanced Computation and Information in Natural and Applied Sciences

CfP: https://research.cs.wisc.edu/dbworld/messages/2017-05/1493741666.html

Program: http://icnaam.org/sites/default/files/Preliminary%20Program%20of%20ICNAAM%202017_ver_3.pdf

© 2017 Dr. rer. nat. Claus-Peter Rückemann


Recall: Last Years’ Post-Summit Results

In 80 Words Around The World.

Knowledge and Computing Definitions

(Delegates and other contributors)

“Knowledge is created from a subjective combination of different attainments as there are intuition, experience, information, education, decision, power of persuasion and so on, which are selected, compared and balanced against each other, which are transformed, interpreted, and used in reasoning, also to infer further knowledge. Therefore, not all the knowledge can be explicitly formalised. Knowledge and content are multi- and inter-disciplinary long-term targets and values. In practice, powerful and secure information technology can support knowledge-based works and values.”

“Computing means methodologies, technological means, and devices applicable for universal automatic manipulation and processing of data and information.

Computing is a practical tool and has well defined purposes and goals.”

Citation: Rückemann, C.-P., Skurowski, P., Staniszewski, M., Hülsmann, F., and Gersbeck-Schierholz, B. (2015): Post-Summit Results, Delegates’ Summit: Best Practice and Definitions of Knowledge and Computing; Sep. 23, 2015, The Fifth Symposium on Advanced Computation and Information in Natural and Applied Sciences (SACINAS), The 13th Internat. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM), Sep. 23–29, 2015, Rhodes, Greece. URL: http://www.user.uni-hannover.de/cpr/x/publ/2015/delegatessummit2015/rueckemann_icnaam2015_summit_summary.pdf

Delegates and contributors: Claus-Peter Rückemann, Friedrich Hülsmann, Birgit Gersbeck-Schierholz, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany; Przemysław Skurowski, Michał Staniszewski, Silesian University of Technology, Gliwice, Poland; International EULISP post-graduate participants, ISSC, European Legal Informatics Study Programme, Leibniz Universität Hannover, Germany


Recall: Last Years’ Post-Summit Results

In 80 Words Around The World.

Data-centric and Big Data Definitions

(Delegates and other contributors)

“The term data-centric refers to a focus, in which data is most relevant in context with a purpose. Data structuring, data shaping, and long-term aspects are important concerns.

Data-centricity concentrates on data-based content and is beneficial for information and knowledge and for emphasizing their value. Technical implementations need to consider distributed data, non-distributed data, and data locality and enable advanced data handling and analysis. Implementations should support separating data from technical implementations as far as possible.”

“The term Big Data refers to data of size and/or complexity at the upper limit of what is currently feasible to be handled with storage and computing installations. Big Data can be structured and unstructured. Data use with associated application scenarios can be categorised by volume, velocity, variability, vitality, veracity, value, etc. Driving forces in context with Big Data are advanced data analysis and insight. Disciplines have to define their ‘currency’ when advancing from Big Data to Value Data.”

Citation: Rückemann, C.-P., Kovacheva, Z., Schubert, L., Lishchuk, I., Gersbeck-Schierholz, B., and Hülsmann, F. (2016): Post-Summit Results, Delegates’ Summit: Best Practice and Definitions of Data-centric and Big Data – Science, Society, Law, Industry, and Engineering; Sep. 19, 2016, The Sixth Symposium on Advanced Computation and Information in Natural and Applied Sciences (SACINAS), The 14th Internat. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM), Sep. 19–25, 2016, Rhodes, Greece. URL: http://www.user.uni-hannover.de/cpr/x/publ/2016/delegatessummit2016/rueckemann_icnaam2016_summit_summary.pdf

Delegates and contributors: Claus-Peter Rückemann, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany; Zlatinka Kovacheva, Middle East College, Department of Mathematics and Applied Sciences, Muscat, Oman; Lutz Schubert, University of Ulm, Germany; Iryna Lishchuk, Leibniz Universität Hannover, Institut für Rechtsinformatik, Germany; Birgit Gersbeck-Schierholz, Friedrich Hülsmann, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany



Best Practice and Definitions

In 80 Words Around The World.

Statements on Data-Sciences (1/3)

(Delegates and other contributors)

“Data Science is directly related to the processing and cross-correlation of individual sources of information.

An example of such data are images and video sequences received by users of social networks. These data allow data scientists to analyze the selected paths, range of interests and the emotional response of users at the locations of interest.

Such research is directly related to the tasks of clustering and high-performance data processing which we are investigating at Saint Petersburg State University.”

Oleg O. Iakushkin, Department of Computer Modelling and Multiprocessor Systems at the Faculty of Applied Mathematics and Control Processes, Saint-Petersburg State University, Russia

Contact: o.yakushkin@spbu.ru,

Universitetskii prospekt 35, Petergof, Saint Petersburg, Russia 198504


Best Practice and Definitions

In 80 Words Around The World.

Statements on Data-Sciences (2/3)

(Delegates and other contributors)

“Data Science: Do we still understand data? Big data and artificial intelligence all boil down to statistical analysis with no understanding of the meaning of data - how reliable is the information we gain from this and what is needed to make data and derived information explicable?”

Lutz Schubert, IOMI, University of Ulm, Germany.

Contact: lutz.schubert@uni-ulm.de



Best Practice and Definitions

In 80 Words Around The World.

Statements on Data-Sciences (3/3)

(Delegates and other contributors)

“Qualified Data, especially for an enterprise, represents frozen knowledge or in other words frozen value. The ability to manage these data is what we call data science.

Hence, data science is by definition secondary to data. The essence of Data Science is to give qualified access to relevant data to the owners and users.

Hardware and software and their implementation represent the tertiary level of qualified and high level data.”

Data results from action!

Examples range from insurance companies to research-data-focussed disciplines.

Claus-Peter Rückemann, Friedrich Hülsmann, Birgit Gersbeck-Schierholz, Olaf Lau, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany.

Contact: ruckema@uni-muenster.de


Best Practice and Definitions

In 80 Words Around The World.

Case: Natural sciences & research

Source: Hülsmann, Rückemann, Gersbeck-Schierholz (KiM, DIMF)

Focussed on research insight

On-purpose components

Local workplaces (e.g., institute)

Research data management, e.g., in most cases only archive

Data exchange possible, long-term interest maybe limited

Rarely common standards with respect to data structures and documentation

Data used for new insight

Single data-sets can grow up to >1 TB

Long-term storage / archiving

Data Science: Knowledge focus, file type, data centricity, small percentage of statistics, specialised algorithms, different requirements regarding long-term availability, research purpose, insight / research



Best Practice and Definitions

In 80 Words Around The World.

Case: Insurance and business

Source: Lau (KiM, DIMF)

Focussed on preset purpose

Ancillary conditions implemented in the system

Distributed workplaces (e.g., different sites)

Databases (e.g., using SAP)

Limited data exchange

Weather data integration with case files (e.g., images from national weather service)

Data used for decision making

Case files per case up to >100 MB

Mid-term storage / archiving (esp. in accordance with European regulations, code of conduct)

Data Science: Documentation, database type, statistics, long-term availability, business purpose, legal and financial aspects


Best Practice and Definitions

In 80 Words Around The World.

Statements on Data Science / Data Sciences

(Delegates and other contributors)

How should Data Science / Data Sciences be defined?

Which Best Practice for Data Science / Data Sciences can be summarised?



Conclusions, Discussion, Networking

Data Sciences

(Delegates and other contributors)

Data results from action.

It is not reasonable to apply statistics without sufficient understanding of the data and originating context.

Deriving explicable information requires qualified data and understanding.

Data Science is directly related to processing of data, cross-correlation of individual sources of information, and data analysis.

Data, especially Qualified Data, can represent knowledge or, respectively, value.

Implementation of methods and algorithms can be seen as consecutive to the creation of data.

Implementation of hardware and software resources and services can be seen as consecutive to methods and algorithms.

Methods and algorithms have undergone a certain development over the last decades; nevertheless, a common basic understanding is still not widely present.

A ternary view of “Data-Algorithm-Implementation” can be suitable to characterise an individual scenario, e.g., via ternary diagrams. More complex views can be created when required with advancing constellations.
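As a minimal sketch of how a scenario could be placed in such a ternary diagram, the following Python function maps three non-negative weights to a point inside an equilateral triangle. The function name, the placement of the corners, and the example weights are assumptions made for illustration only; they are not part of the summit results.

```python
import math


def ternary_to_xy(data, algorithm, implementation):
    """Map three non-negative weights to a point in a ternary diagram.

    Assumed (hypothetical) corner layout:
      Data           -> (0, 0)
      Algorithm      -> (1, 0)
      Implementation -> (0.5, sqrt(3) / 2)
    """
    weights = (data, algorithm, implementation)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    # Normalise so that the three shares sum to 1 (barycentric coordinates).
    _, a, i = (w / total for w in weights)
    # Weighted sum of the corner coordinates; the Data corner is the origin.
    x = a + 0.5 * i
    y = (math.sqrt(3) / 2.0) * i
    return x, y


if __name__ == "__main__":
    # Hypothetical weights, chosen only to show the call.
    scenario = {"data": 0.5, "algorithm": 0.3, "implementation": 0.2}
    print(ternary_to_xy(**scenario))
```

A scenario dominated by one aspect lands close to the corresponding corner, and a balanced scenario lands at the centre of the triangle, so several scenarios can be compared by plotting their points in the same diagram.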


Networking and Outlook

Thank you for your attention!

Wish you an inspiring conference and a pleasant stay in Thessaloniki!

Looking forward to seeing you again next year for the Symposium on Advanced Computation and Information!



Post-Summit Results

In 80 Words Around The World.

Data Science Definition

(Delegates and other contributors)

“Qualified Data, especially for an enterprise, represents frozen knowledge or in other words frozen value.

The ability to understand and manage these data is what we call data science.

Data results from action, hence, data science can be defined secondary to data. The essence of Data Science is to give qualified access to relevant data to owners and users.

Hardware and software and their implementation represent the tertiary level of qualified and high level data.”

Citation: Rückemann, C.-P., Iakushkin, O. O., Gersbeck-Schierholz, B., Hülsmann, F., Schubert, L., and Lau, O. (2017): Post-Summit Results, Delegates’ Summit: Best Practice and Definitions of Data Sciences – Beyond Statistics; Sep. 25, 2017, The Seventh Symposium on Advanced Computation and Information in Natural and Applied Sciences (SACINAS), The 15th Internat. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM), Sep. 25–30, 2017, Thessaloniki, Greece. URL: http://www.user.uni-hannover.de/cpr/x/publ/2017/delegatessummit2017/rueckemann_icnaam2017_summit_summary.pdf

Delegates and contributors: Claus-Peter Rückemann, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany; Oleg O. Iakushkin, Department of Computer Modelling and Multiprocessor Systems at the Faculty of Applied Mathematics and Control Processes, Saint-Petersburg State University, Russia; Birgit Gersbeck-Schierholz, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany; Friedrich Hülsmann, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany; Lutz Schubert, IOMI, University of Ulm, Germany; Olaf Lau, Knowledge in Motion / Unabhängiges Deutsches Institut für Multi-disziplinäre Forschung (DIMF), Germany.
