

4.1.1 MIAME

4.1. Standardization and Specification

interpretation, as can be seen, for example, from the recommendations for web technology. Individual interpretation can become an issue, in particular where the content of experimental annotation is concerned. As experimental settings and conditions cannot be known beforehand, new techniques or experiments not covered by the specification can arise. The following suggestions about the content of experimental annotations can be seen only as free-text guidelines for the required level of detail.

Nevertheless, a formal syntax or document structure to convey microarray-related content is required for an appropriate experimental annotation. It has to be flexible enough to be adapted easily to evolving research techniques and settings. On the other hand, the formalism needs to be defined precisely enough to be accessible to automated processing.
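The tension between flexibility and precise definition can be illustrated with a minimal sketch. It assumes a hypothetical record type, `ExperimentAnnotation`, whose field names are chosen here for illustration only and are not taken from any specification: a small set of fixed, typed core fields is combined with an open-ended `extensions` mapping for techniques the core does not cover, and the whole record is serialized to JSON so that it remains machine-readable.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class ExperimentAnnotation:
    """Hypothetical annotation record: fixed core fields plus open extensions."""

    experiment_id: str    # precisely defined core field
    experiment_type: str  # e.g. a term from a predefined vocabulary
    description: str      # free-text summary
    # Open-ended key/value pairs for techniques the core does not cover.
    extensions: Dict[str, str] = field(default_factory=dict)


def to_json(record: ExperimentAnnotation) -> str:
    """Serialize the record so that other programs can parse it."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only, not real data.
record = ExperimentAnnotation(
    experiment_id="EXP-001",
    experiment_type="time course",
    description="Illustrative record.",
    extensions={"new_labeling_technique": "described in free text"},
)
parsed = json.loads(to_json(record))
```

The core fields can be checked automatically, while the extension mapping absorbs new experimental settings without a change to the formal structure. The price is that the content of the extensions is again open to individual interpretation.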

providers, research groups employing microarrays on a large scale, and commercial providers of microarrays, reagents, and related hardware and software. The primary interest of MGED is to define standards for communicating microarray experiments.

The first major result of that effort was the publication of the MIAME (Minimum Information About a Microarray Experiment) recommendations (Brazma et al., 2001). The intention behind MIAME is to provide a guideline on the information that has to be recorded in order to interpret experimental results and to verify them independently. The MGED group also laid down a MIAME checklist, which authors and publishers of microarray-related publications can use to check whether the information is complete (see footnote 5). An open letter, sent to major journals by MGED members, suggested requiring the submission of MIAME-compliant data to a public repository prior to the publication of microarray-related articles. These recommendations were subsequently adopted by many journals, among them the renowned medical journal The Lancet, as well as Bioinformatics and Nucleic Acids Research.

MIAME-compliant annotation of microarray experiments can be divided into the following categories:

Array design information: Each individual array used for the experiment has to be annotated, as well as its design. At a minimum, an array needs a unique ID, by which it is referenced in the experiment table, and the name of the array design used, as there can be many arrays with different designs within a single experiment.

The array design describes the physical layout of a set of microarrays. For each design a name as well as contact information of the vendor are needed.

Technological information includes the microarray platform, the type and origin of the reporter molecules, the surface type, the number of features and physical dimensions, production protocol and date. For every individual element on the microarray, physical location, sequence, and sequence type, as well as many other properties of the corresponding biomolecules have to be reported.

Experimental design: The experiment can be seen as the main grouping unit for hybridizations. It groups all related hybridizations for the analysis of a scientific question. According to the MIAME checklist, an experimental annotation should include contact information for the experimenter or lab, a short free-text description of the experiment and its findings, bibliographic references, and a description of the type of experiment using predefined terms such as ‘normal vs. diseased comparison’, ‘time course’ or ‘dose response’. Experimental variables, defined as the quantities changing during the study, have to be specified. The grouping of all hybridized arrays and their connection to the experimental variables and the corresponding extracts have to be explained. Furthermore, quality control steps such as technical and biological replicates have to be specified.

5 http://www.mged.org/Workgroups/MIAME/miame_checklist.html

Samples and extract procedures: A sample is the biological material from which the RNA, DNA, or proteins used in the experiment are extracted, for example a cell culture or tissue. Further, the organism, cell type, and tissue have to be recorded. The method of RNA, DNA, or protein extraction from the sample material has to be described in detail, as well as the labeling procedure preceding a hybridization.

Hybridization information: For each hybridization a detailed protocol description has to be provided. This protocol should describe reagents and quantities of labeled extract used during hybridization and washing, if applicable. Also, the instruments used for hybridization, for example hybridization chambers, have to be described.

Measured data and procedures: The measured data from each scan of a hybridized microarray include the original images from the scanner and the data files produced by applying image analysis software to each image. Because of the large storage requirements, it is not clear whether the original images have to be included in a MIAME-compliant submission. Additionally, hardware and software (e.g. microarray scanner, scanner software, and image quantification software) have to be described; this is especially important for the scanner settings. Datasets that summarize measurements of related spots (e.g. by computing a mean value over replicates) and that are used to obtain the results of the study have to be provided together with a description of the method and parameters used for the transformation.

Data transformation procedures: As the choice of a normalization procedure may affect the outcome of the subsequent analysis, the normalization strategy, background correction, and further transformations have to be specified, along with the set of features on which the normalization method relies, e.g. housekeeping genes or all spots on the array. The normalization algorithm and its parameters have to be documented. The MIAME specification also requires stating which control elements are on the array and whether external controls have been added to the labeled extract.
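To illustrate why the feature set underlying a normalization has to be reported, the following sketch applies median centering to log ratios, either over all spots or over a chosen reference set such as housekeeping genes. Median centering is used here only as a simple stand-in and is not prescribed by MIAME; the function name `median_center`, the feature names, and the values are hypothetical. Alongside the normalized values, the function returns a small record of the method and parameters, which is the kind of information the annotation is expected to document.

```python
from statistics import median
from typing import Dict, Iterable, Optional, Tuple


def median_center(
    log_ratios: Dict[str, float],
    reference_features: Optional[Iterable[str]] = None,
) -> Tuple[Dict[str, float], Dict[str, object]]:
    """Shift all log ratios so that the median of the reference set is zero.

    If no reference set is given, all spots on the array are used.
    """
    if reference_features is None:
        features = sorted(log_ratios)
    else:
        features = sorted(reference_features)
    shift = median(log_ratios[name] for name in features)
    normalized = {name: value - shift for name, value in log_ratios.items()}
    # Record of what was done, to accompany the transformed data.
    record = {
        "method": "median centering",
        "reference_features": features,
        "shift": shift,
    }
    return normalized, record


# Illustrative values only, not real measurements.
ratios = {"g1": 1.0, "g2": 2.0, "g3": 4.0, "hk1": 0.5, "hk2": 1.5}
by_housekeeping, hk_record = median_center(ratios, ["hk1", "hk2"])
by_all_spots, all_record = median_center(ratios)
```

The two calls produce different normalized values from the same measurements, so a reader who does not know which reference set was used cannot reproduce the transformation.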

Although the MIAME recommendations aim at standardizing the information content needed to describe a microarray experiment, they leave room for subjective interpretation. For example, the provision of the original scanned images is optional because of the high storage requirements of image data and the resulting high costs for public repositories. On the other hand, re-evaluating the quality of image quantification and assessing the quality of the signals is impossible without the original images. The use of free-text descriptions of experimental protocols, substances, and procedures is another source of ambiguity, as the level of detail may vary between submitters. In addition, the availability of the original reporter sequences of custom arrays might be an issue.
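The limits of such a checklist can be made concrete with a small sketch. It assumes a submission represented as a plain mapping from section names to free text; the keys are hypothetical paraphrases of the six categories above, not an official format. A program can verify that every required section is present and can flag the optional raw images as missing, but it cannot judge whether a free-text entry is detailed enough.

```python
from typing import Dict, List

# Hypothetical section keys paraphrasing the annotation categories.
REQUIRED_SECTIONS = [
    "array_design",
    "experimental_design",
    "samples_and_extracts",
    "hybridization",
    "measured_data",
    "data_transformation",
]
# Items that may be omitted; their absence only produces a warning.
OPTIONAL_ITEMS = ["raw_images"]


def check_submission(submission: Dict[str, str]) -> Dict[str, List[str]]:
    """Report required sections that are empty and optional items not supplied."""
    missing = [s for s in REQUIRED_SECTIONS if not submission.get(s, "").strip()]
    warnings = [o for o in OPTIONAL_ITEMS if not submission.get(o, "").strip()]
    return {"missing": missing, "warnings": warnings}


# Illustrative submission: one required section is blank, raw images are absent.
example = {
    "array_design": "custom two-color array, see vendor sheet",
    "experimental_design": "time course",
    "samples_and_extracts": "cell culture, total RNA",
    "hybridization": "   ",
    "measured_data": "quantified spot intensities",
    "data_transformation": "ok",
}
report = check_submission(example)
```

In this example the entry "ok" for the data transformation passes the check although it conveys nothing, which mirrors the point that presence of a field can be tested mechanically while its level of detail cannot.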