Data Analysis in the Life Sciences

— Sparking Ideas —

Michael R. Berthold

ALTANA-Chair for Bioinformatics and Information Mining, Dept. of Computer and Information Science, Konstanz University, Germany

Michael.Berthold@uni-konstanz.de

Data from various areas of the life sciences have increasingly caught the attention of data mining and machine learning researchers. Not only is the amount of available data mind-boggling, but the diverse and heterogeneous nature of the information goes far beyond that of any other data analysis problem so far. In sharp contrast to classical data analysis scenarios, the life science area poses challenges of a rather different nature, for two main reasons. Firstly, the available data stem from heterogeneous information sources of varying degrees of reliability and quality and are not useful without the interactive, constant interpretation of a domain expert. Furthermore, predictive models are of only marginal interest to those users; instead, they hope for new insights into a complex biological system that is only partially represented within the data anyway. In this scenario, the data serve mainly to create new insights and generate new ideas that can be tested.

Secondly, the notion of feature space and the accompanying measures of similarity cannot be taken for granted. Similarity measures become context dependent, and within a single analysis task it is often the case that several different ways of describing the objects of interest, or of measuring similarity between them, matter.
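The context dependence of similarity can be illustrated with a minimal sketch. It assumes three invented objects, each described in two hypothetical descriptor spaces: a numeric vector compared via Euclidean distance and a set of features compared via Tanimoto (Jaccard) similarity. All names, values, and the choice of measures are assumptions made for illustration and are not taken from the cited work.

```python
from math import dist

# Hypothetical toy objects, each described in two descriptor spaces.
# All names and values are invented for illustration only.
OBJECTS = {
    "A": {"physchem": (1.0, 2.0), "fragments": {"ring", "OH", "NH2"}},
    "B": {"physchem": (1.1, 2.1), "fragments": {"chain", "Cl"}},
    "C": {"physchem": (5.0, 9.0), "fragments": {"ring", "OH", "COOH"}},
}


def euclidean_similarity(x, y):
    """Similarity in the numeric space: larger means closer."""
    return 1.0 / (1.0 + dist(x, y))


def tanimoto_similarity(s, t):
    """Tanimoto (Jaccard) similarity between two sets of features."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0


# One similarity measure per descriptor space (an assumption of this sketch).
MEASURES = {
    "physchem": euclidean_similarity,
    "fragments": tanimoto_similarity,
}


def nearest_neighbor(query, space):
    """Return the object most similar to `query` within one descriptor space."""
    measure = MEASURES[space]
    q = OBJECTS[query][space]
    candidates = (name for name in OBJECTS if name != query)
    return max(candidates, key=lambda name: measure(q, OBJECTS[name][space]))


# The nearest neighbor of "A" depends on the descriptor space used.
neighbors = {space: nearest_neighbor("A", space) for space in MEASURES}
print(neighbors)
```

In this toy setting, object A is closest to B in the numeric space but closest to C in the feature-set space, so the answer to "which object is most similar?" depends on the chosen description.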

Some more recently published work in the data analysis area has started to address some of these issues. Examples include data analysis in parallel universes [1], that is, the detection of patterns of interest in several different descriptor spaces at the same time, and the mining of frequent, discriminative fragments in large molecular databases [2]. In both cases, sheer numerical performance is not the focus; what matters is the discovery of interpretable pieces of evidence that spark new ideas in the user's mind. Future work on data analysis in the life sciences needs to keep this in mind: the goal is to trigger new ideas and stimulate interesting associations.

References

1. Berthold, M.R., Wiswedel, B., Patterson, D.E.: Interactive exploration of fuzzy clusters using neighborgrams. Fuzzy Sets and Systems 149 (2005) 21-37

2. Hofer, H., Borgelt, C., Berthold, M.R.: Large scale mining of molecular fragments with wildcards. Intelligent Data Analysis 8 (2004) 376-385

First publ. in: Machine Learning. 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005. Proceedings. Berlin: Springer, 2005, p.1

Konstanzer Online-Publikations-System (KOPS) URL: http://www.ub.uni-konstanz.de/kops/volltexte/2008/6777/

URN: http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-67770
