Mechanisms of Ageing and Development 151 (2015) 26–30
The MARK-AGE phenotypic database: Structure and strategy
María Moreno-Villanueva a,∗,1, Tobias Kötter b,1, Thilo Sindlinger a, Jennifer Baur a, Sebastian Oehlke a, Alexander Bürkle a, Michael R. Berthold b
a Molecular Toxicology Group, Department of Biology, University of Konstanz, 78457 Konstanz, Germany
b Chair for Bioinformatics and Information Mining, University of Konstanz, Konstanz, Germany
Article info
Article history:
Available online 27 March 2015
Keywords:
Database
Data management
Abstract
In the context of the MARK-AGE study, anthropometric, clinical and social data as well as samples of venous blood, buccal mucosal cells and urine were systematically collected from 3337 volunteers. Information from about 500 standardised questions and about 500 analysed biomarkers needed to be documented per individual. On the one hand, handling such a vast amount of data necessitates the use of appropriate informatics tools and the establishment of a database. On the other hand, personal information on subjects obtained as a result of such studies has, of course, to be kept confidential, and therefore the investigators must ensure that the subjects’ anonymity will be maintained. Such a secrecy obligation implies a well-designed and secure system for data storage. In order to fulfil the demands of the MARK-AGE study we established a phenotypic database for storing information on the study subjects by using a doubly coded system.
© 2015 Published by Elsevier Ireland Ltd.
1. Introduction
In general, a database serves as a data storage device that allows data processing and analysis. Designing a detailed data model of a database can therefore be beneficial (Teorey et al., 2009). A database which retains demographic, medical and bioanalytical data obtained from clinical and/or observational studies is an important source of valuable information for further research. Therefore, results of human studies should be made machine-readable for data analysis and data mining and become available to other scientists, thus promoting transparency. Relational databases are a common method for storing repetitive data. A practical explanation of relational database design has been reported before (Wesley, 2000).
MARK-AGE was a population study that comprised 3337 subjects and was conducted to identify a set of biomarkers of ageing which would measure biological age better than any marker in isolation. Four groups of subjects were recruited, i.e., (1) randomly recruited age-stratified individuals from the general population [RASIG], (2) subjects born from a long-living parent belonging to a family with long-living sibling(s) already recruited in the framework of the EC-funded “Genetics of Healthy Ageing” (GEHA) project. For genetic reasons such individuals (“GEHA offspring”) are expected to age at a slower rate [GO], (3) spouses of GO [SGO], and (4) a small number of patients with progeroid syndromes (see Bürkle and co-workers, this issue). From all subjects enrolled, anthropometric, clinical and social data were collected in a standardised fashion by using questionnaires asking for demographic information (family composition, marital status, education, occupation, and housing conditions), lifestyle (food habits, use of tobacco or alcohol, daily activities), functional status (activities of daily living), cognitive status (STROOP test, 15-picture learning test), health status (present and past diseases, self-perceived health, number and type of prescribed drugs) and mood (ZUNG depression scale). Information on body mass index, waist and hip circumference, blood pressure, heart rate, lung capacity, near vision, chair standing test and hand grip strength was also documented. Additionally, MARK-AGE subjects were asked to donate blood after overnight fasting. A part of the whole blood was processed to obtain plasma, serum and peripheral blood mononuclear cells (PBMC); another part was sent for blood counts. Buccal mucosal cells and spot urine samples were also collected. Several hundred potential biomarkers of ageing targeting different cellular functions have been measured in MARK-AGE biological material (Table 1). In biological and medical terms the MARK-AGE database contains measurements of classical clinical chemistry parameters and biomarkers giving information about the immune system, DNA damage in the nuclear genome, accumulation of some covalently modified proteins and accumulation of oxidative damage in macromolecules. All these have been considered as causative of cellular ageing.
∗ Corresponding author. Tel.: +49 7531 884414; fax: +49 7531 884033.
E-mail address: maria.moreno-villanueva@uni-konstanz.de (M. Moreno-Villanueva).
1 These authors contributed equally to this work.
http://dx.doi.org/10.1016/j.mad.2015.03.005 0047-6374/© 2015 Published by Elsevier Ireland Ltd.
Published in: Mechanisms of Ageing and Development; 151 (2015), pp. 26–30. https://dx.doi.org/10.1016/j.mad.2015.03.005
Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-312506
Table 1
MARK-AGE meta table containing the necessary information for interpreting and analysing data.
Parameter description
Total of debris particles per millilitre
Number of viable cells per millilitre
% Viable cells
Anti poly(ADP-ribose) antibody stimulated PARP fluorescence intensity
Anti poly(ADP-ribose) antibody basal fluorescence intensity
Number of cells counted for FADU analysis
Poly(ADP-ribose) polymerase
Initial DNA integrity
% DNA repair
DNA integrity after 3.8 Gy
Normed concentration of carotene
DNA amount in sample as reference for norming concentrations
Normed concentration of glutathione
Normed concentration of vitamin C
Normed concentration of vitamin E
Total analysed spots
Telomere length
% Telomeres shorter than 3 Kb
IgG antibodies specific for influenza A
IgG antibodies specific for influenza B
IgG antibodies specific for measles
IgG antibodies specific for tetanus toxoid
Number of T cells producing INF-g after stimulation with influenza A
Number of T cells producing INF-g after stimulation with influenza B
Number of T cells producing INF-g after stimulation with measles
Number of T cells producing INF-g after stimulation with tetanus toxoid
Number of T cells producing INF-g after stimulation with CMV
Plasma copper
Copper to zinc ratio
Plasma copper eluting with retention time of ceruloplasmin (% of total copper peaks)
Plasma copper eluting with retention time of ceruloplasmin (absolute value in ppb)
Plasma copper not eluting with retention time of ceruloplasmin (% of total copper peaks)
Plasma copper eluting with retention time of ceruloplasmin (absolute value in ppb)
Plasma iron
Plasma iron eluting with retention time of albumin (% of total iron peaks)
Plasma iron eluting with retention time of albumin (absolute value in ppb)
Plasma iron eluting with retention time of transferrin (% of total iron peaks)
Plasma iron eluting with retention time of transferrin (absolute value in ppb)
Background fluorescence (MFI) of secondary antibody in lymphocytes induced by 50 uM Zn
Background fluorescence (MFI) of secondary antibody in monocytes induced by 50 uM Zn
Background fluorescence (MFI) of secondary antibody in PBMCs induced by 50 uM Zn
Metallothionein (MFI) in lymphocytes induced by 50 uM Zn
Metallothionein induction in lymphocytes normalized for blank signal from MFI data
Metallothionein (MFI) in monocytes induced by 50 uM Zn
Metallothionein induction in monocytes normalized for blank signal from MFI data
Metallothionein (MFI) in PBMCs induced by 50 uM Zn
Metallothionein induction in PBMC normalized for blank signal from MFI data
Metallothioneins (MFI) in lymphocytes induced by 24 h treatment with Zn 50 uM without subtraction of background
Metallothioneins (MFI) in monocytes induced by 24 h treatment with Zn 50 uM without subtraction of background
In order to maximise the value of the data collected within MARK-AGE we have established a comprehensive database, which facilitates data storage, sharing and analysis. The MARK-AGE database allows access to and use of the data by Consortium researchers and, later on, also by external scientists.
2. Material and methods
The MARK-AGE database was established according to European standards.
2.1. Volunteer privacy and confidentiality
Individual subject medical information obtained as a result of this study is considered confidential and any disclosure to third parties is prohibited. On all documents subjects must be identified only by a numerical code, and never by their names. Technicians in charge of processing the samples did not have access to the central database located in another city. By implementing this procedure the privacy and confidentiality of the uploaded data is guaranteed.
2.2. SQL database
The Structured Query Language (SQL) is a powerful tool for interacting with relational database systems. SQL enables users to perform complicated data analysis using simple syntax and structure (Jamison, 2003). The database consists of several management, meta and data tables. The data tables represent the various sections of the questionnaires as well as the uploaded analytical results. The management tables store the user information as well as an indication of which data have been entered for which subject. The meta tables contain all additional information on the bioanalytical parameters, e.g., short and long name of a given parameter, description, and membership of the Work Packages and MARK-AGE Consortium members (Supplementary information).
2.3. Hardware and software
Selected hardware and software components (see below) were suitable to achieve an electronic data processing system that meets the demands of security in data transfer and storage, usability for Consortium partners and high availability and reliability of services.
Fig. 1. Database system. The database system is a high availability (HA) cluster. The HA cluster consists of two running servers. At each point in time only one server acts as the master. In this figure, the server in building 1 is the master that supplies all necessary services for data processing and storage via the eth0 LAN adapter to and from the internet. All storage operations to the database are mirrored via the eth1 LAN adapter and a point-to-point connection to the inactive server (slave). In case of service failure, master and slave will swap roles to restore availability of service.
2.4. Hardware description
As the central hardware component the PRIMERGY TX150 S6 Server (Fujitsu–Siemens) was chosen. This is a middle-sized reliable hardware platform with a pair of internal mirrored hard disk drives (RAID 1). Two such servers are used as a high availability cluster (Fig. 1). In a high availability cluster each server monitors the availability of services such as the web server or file storage on the other servers. If the active server fails to supply a service, another server automatically takes over the active part and carries on supplying all services without delay. We selected two locations in two distant buildings on the campus of the University of Konstanz with sufficient network connectors in order to avoid running the servers in the same place. In cooperation with the University Computing Centre a dedicated LAN point-to-point connection was installed, based on an existing fibre-optic cable, for data exchange between the two servers in the two buildings. So even if one location were damaged severely, e.g., by fire, the second server would take over.
2.5. Software description
In order to guarantee excellent usability for the MARK-AGE Consortium members in transferring data, it was decided to use a standard web interface technique consisting of a web server and a scripting language on the server side. Therefore, a partner only needed a PC with internet connection and a web browser for data input and transfer. The servers were delivered with the operating system Debian Linux and standard packages. The following was installed or configured on both cluster servers:
• A distributed replicated block device (DRBD) system (to mirror the storage operations via the dedicated LAN point-to-point connection).
• The heartbeat system (to monitor the services between the servers).
• The relational database system PostgreSQL (as the storage layer).
• Apache web server system (as the input layer).
• Modules of the server-side scripting language PHP (as the processing layer).
3. MARK-AGE database features
3.1. Data coding system
The major task for the information systems within the MARK-AGE project was to establish services for entry, storage and retrieval of the phenotypic data from 3337 subjects. A general requirement in the project was to separate three types of data, i.e., subject-identifying data, biographical data and bioanalytical data. To achieve this it was decided to connect the first and the latter two types only by identifiers but to store them separately. We called these identifiers subject codes (SC). Fig. 2 provides a summary of SC and data flow.
Personal data and results from bioanalytical measurements were entered into the central database only using a subject code.
Fig. 2. Data flow and subject codes within MARK-AGE. Recruitment centres generated unique primary subject codes (PSC) for coding of both electronic questionnaire information and biological material. Biobank staff introduced the PSC into a database system service, which converted it into a randomly generated secondary subject code (SSC). Biobank staff then re-labelled any tubes containing biological material using the SSC. Bioanalytical laboratories uploaded their data by using the SSC.
Coding of recruited individuals was performed directly at each recruitment centre by assigning a unique alphanumeric code (primary subject code, PSC) to each subject. The PSC consisted of 7 digits; the first and the second digits identified the recruitment centre, the third and fourth identified recruitment phases, and the last three digits were running numbers, with the exception of “TRY-Phase” coding (see below). Biological samples were re-coded at the MARK-AGE Biobank by assigning a secondary subject code (SSC). This code was generated automatically at the Biobank and was passed on to the Coordinating Centre only. The SSC consisted of 5 digits. The first 4 digits were generated in a random but unique manner; the last digit was a checksum. Only SSC-coded biological material was distributed to MARK-AGE members for bioanalytical measurements. Subject-related data including biographical data were entered into the database by the recruitment centres, whereas bioanalytical data were entered only using a secondary code; both types of data were connected by the PSC to SSC coding table.
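The secondary code scheme can be illustrated with a minimal Python sketch. The checksum rule is not stated in the text, so the sketch assumes a plain digit sum modulo 10; the function names, the in-memory coding table and the example PSCs are likewise hypothetical and only serve to show four random but unique digits followed by one check digit.

```python
import random


def checksum_digit(body: str) -> str:
    """Assumed checksum: sum of the four digits modulo 10 (not confirmed by the source)."""
    return str(sum(int(ch) for ch in body) % 10)


def generate_ssc(used_bodies: set, rng: random.Random) -> str:
    """Draw an unused random 4-digit body and append the checksum digit."""
    if len(used_bodies) >= 10_000:
        raise RuntimeError("all 4-digit bodies are in use")
    while True:
        body = f"{rng.randrange(10_000):04d}"
        if body not in used_bodies:
            used_bodies.add(body)
            return body + checksum_digit(body)


def is_valid_ssc(code: str) -> bool:
    """Check length, ASCII digits only, and the trailing checksum digit."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        return False
    return checksum_digit(code[:4]) == code[4]


# Hypothetical PSC-to-SSC coding table, kept apart from the analytical data.
coding_table = {}
used = set()
rng = random.Random()
for psc in ("0101001", "0101002"):  # illustrative PSCs only
    coding_table[psc] = generate_ssc(used, rng)
```

With a check digit, a mistyped SSC on a tube label or in an upload file can often be rejected before it is matched against the coding table.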
3.2. From “TRY” to “REAL” phase
A detailed explanation and a practical demonstration of all the above procedures, including the processing of completed questionnaires, were provided to all researchers/staff involved in data entry, thus minimizing the risk of operator errors. Nevertheless, database management staff were given a time period of three months for getting familiar with the system by simulating data entry. During this “TRY” phase of the MARK-AGE project all activities foreseen were rehearsed, from recruitment to analysis. Each recruitment centre sampled 10 volunteers who did not belong to the MARK-AGE target populations in order to verify the reliable execution of all the steps in each of the standard operating procedures (SOPs) (see Moreno-Villanueva and Capri and co-workers, this issue), including data entry. Subjects of the “TRY” phase were identified by a different primary subject code to avoid any possible confusion with ‘real’ subjects to be examined afterwards. This special PSC had the following structure: xxTRYyy. The first two digits “xx” identified the recruitment centre and the last two digits “yy” identified the subject. This TRY phase was extremely useful as it enabled us to correct threshold values in the input and processing layer.
Fig. 3. Data input was based on three software layers on the MARK-AGE cluster server. The database system was contacted from the internet by a web browser via the secure hypertext transfer protocol (https). After filling out an HTML form and clicking the send button, the data were passed to the processing layer by the post command. Then PHP scripts could inspect the data and allow the input, or reject it if values were beyond thresholds. If allowed, the data were inserted into the database by SQL statements from the processing layer.
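The two PSC layouts described in Sections 3.1 and 3.2 can be told apart mechanically. The following Python sketch assumes that regular PSCs are purely numeric (two centre digits, two phase digits, three running digits) and that TRY codes follow xxTRYyy exactly; the type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PrimarySubjectCode(NamedTuple):
    centre: str    # first two digits: recruitment centre
    phase: str     # recruitment phase digits, or "TRY"
    subject: str   # running number (regular) or subject number (TRY)
    is_try: bool


# Assumed layouts: ccppnnn for regular subjects, ccTRYnn for the TRY phase.
_REGULAR = re.compile(r"([0-9]{2})([0-9]{2})([0-9]{3})")
_TRY = re.compile(r"([0-9]{2})TRY([0-9]{2})")


def parse_psc(code: str) -> Optional[PrimarySubjectCode]:
    """Split a PSC into its parts; return None if it matches neither layout."""
    match = _TRY.fullmatch(code)
    if match:
        return PrimarySubjectCode(match.group(1), "TRY", match.group(2), True)
    match = _REGULAR.fullmatch(code)
    if match:
        return PrimarySubjectCode(match.group(1), match.group(2), match.group(3), False)
    return None
```

Because both layouts are seven characters long, the literal "TRY" in positions three to five is what keeps rehearsal records from being confused with those of real subjects.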
The procedure of entering data in the database was tested and rehearsed repeatedly by the partners, in order to prevent any problems during the phase of active recruitment. With the onset of the “TRY” phase the data input started. Each recruitment centre processed 10 “TRY-subjects”, and at the end of the TRY phase the information of 80 “TRY-subjects” from 8 recruitment centres had been saved in the database (50 subjects in December, 20 subjects in January and 10 subjects in February). The entry of questionnaires from ‘real’ subjects started in March 2009.
Thanks to the close and very intense co-operation between all partners involved in recruitment, all possible sources of mistakes and inconsistencies associated with electronic data storage, performed via a secure internet link, could be identified and fixed.
Fig. 4. Forms for questionnaire data input for the recruitment centres. The PHP framework allowed login, introduction of the subject code and data input in a multiple-session manner. There were six questionnaire forms, which covered various parts of the interview. The interviewers used the PHP framework forms to transfer the information collected in the questionnaires to the database. After saving data in the database the same form could not be re-selected.
3.3. Data input
Two methods were available for uploading bioanalytical results to the database. The web form based data upload focused on flexibility and was based on the data upload framework that had been developed for the questionnaires. These web forms allowed the upload of analytical results per SSC and thus did not support the upload of mass data, whereas the second method focussed on mass data to support the reporting of hundreds of data points per secondary subject code (SSC) at once (Fig. 3). The data file constraints were as follows:
1. The file had to be provided as a text file (“.txt”).
2. Single values had to be separated by a tab or a semicolon.
3. The first row had to have a distinct column name.
4. No special characters (such as “[,],!,/,;” etc.) could be used in the column names with the exception of “”.
5. Column names had to start with an alphabetical character.
6. Empty columns had to be avoided.
7. One column had to contain the SSC.
8. The name of the column that contained the SSC had to be specified in the upload dialogue.
9. Non-numeric values such as “not tested”, “not applicable”, “n/a”, “nd” etc. were not supported within numerical columns. The field should be left blank.
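The constraints above lend themselves to an automatic pre-upload check. The following Python sketch is one possible reading of them, not the project's actual upload code: it assumes that the one permitted special character is the underscore (the character is not legible in the text), that every column other than the SSC column is numeric, and that the function name and error messages are free to choose.

```python
import re

# Assumption: only letters, digits and "_" are accepted in column names.
COLUMN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUMBER = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def validate_upload(filename, text, ssc_column):
    """Return a list of constraint violations; an empty list means the file passes."""
    errors = []
    if not filename.lower().endswith(".txt"):
        errors.append("file must be a .txt text file")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return errors + ["file is empty"]
    sep = "\t" if "\t" in lines[0] else ";"  # tab or semicolon separated
    header = [name.strip() for name in lines[0].split(sep)]
    if len(set(header)) != len(header):
        errors.append("column names must be distinct")
    for name in header:
        if not COLUMN_NAME.fullmatch(name):
            errors.append(f"invalid column name: {name!r}")
    if ssc_column not in header:
        errors.append(f"SSC column {ssc_column!r} not found")
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        cells = [cell.strip() for cell in line.split(sep)]
        if len(cells) == len(header):
            rows.append(cells)
        else:
            errors.append(f"row {number}: expected {len(header)} values")
    for index, name in enumerate(header):
        if name == ssc_column:
            continue
        values = [row[index] for row in rows]
        if rows and not any(values):
            errors.append(f"column {name!r} is empty")
        for value in values:
            # Assumption: every non-SSC column is numeric; blanks are allowed.
            if value and not NUMBER.fullmatch(value):
                errors.append(f"column {name!r}: non-numeric value {value!r}")
    return errors
```

A call such as `validate_upload("results.txt", file_text, "SSC")` would be run before any row is written, so that a file is either accepted as a whole or returned to the laboratory with the list of problems.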
3.4. Questionnaires
Each section of the questionnaires was represented by a web page that was built using a custom-programmed PHP framework (Fig. 4). The framework was developed to provide easy creation of the input forms, batch uploading of analytical results, as well as validation of the input data, e.g., the range of a numerical field or a mandatory input field, using regular expressions. Additional validation was implemented in the database by enforcing not null as well as unique key constraints where necessary. In addition, each set of data contained a timestamp recording the time the data set had been saved in the database.
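The combination of application-level range checks with database-level constraints can be sketched as follows. The project used PHP and PostgreSQL; this Python example substitutes the standard-library sqlite3 module so that it is self-contained, and the table name, the column names and the height threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE questionnaire_section (     -- hypothetical table name
        psc       TEXT NOT NULL UNIQUE,      -- one data set per subject
        height_cm REAL NOT NULL,             -- hypothetical mandatory field
        saved_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def save_section(psc, height_cm):
    """Range-check in the application layer, then insert; return True on success."""
    # Illustrative threshold only; the real limits are not given in the text.
    if height_cm is None or not 50 <= height_cm <= 250:
        return False
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO questionnaire_section (psc, height_cm) VALUES (?, ?)",
                (psc, height_cm),
            )
        return True
    except sqlite3.IntegrityError:  # NOT NULL or UNIQUE constraint violated
        return False
```

Keeping the not null and unique constraints in the database means that they still hold if a form-level check is bypassed or misconfigured.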
3.5. Bioanalytical data
Each data set, e.g., the results from the questionnaires or the bioanalytical results, could be entered only once per subject by the person entitled. In order to edit a set of data, a recruiter needed special permission from the administrator, which was necessary for enabling the reviewing and possibly re-entering of data for the requested subject into the database.
The web framework further ensured that users could enter and review only data they were entitled to, e.g., questionnaires or analytical data from their own laboratory. Therefore, each user had their own login, which was coupled with a certain role, e.g., recruitment centre, bioanalytical laboratory or Biobank.
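A role-coupled login of this kind amounts to a simple permission check. The Python sketch below is illustrative only: the three roles come from the text, whereas the permission mapping, the `unit` attribute and all names are assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping of roles to the kinds of data they may enter or review.
ROLE_PERMISSIONS = {
    "recruitment_centre": {"questionnaire"},
    "bioanalytical_laboratory": {"analytical_result"},
    "biobank": {"coding_table"},
}


@dataclass(frozen=True)
class User:
    login: str
    role: str
    unit: str  # the centre or laboratory the user belongs to


def may_access(user: User, data_kind: str, owner_unit: str) -> bool:
    """Allow access only to permitted data kinds owned by the user's own unit."""
    allowed = ROLE_PERMISSIONS.get(user.role, set())
    return data_kind in allowed and user.unit == owner_unit
```

For example, a laboratory user would pass the check for analytical results from their own laboratory but fail it for questionnaires or for another laboratory's results.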
3.6. Metadata
Several hundred potential biomarkers of ageing targeting different cellular functions have been measured in MARK-AGE biological material. In order to facilitate data extraction a metadata table (meta table) has been created. Metadata are used to describe digital data in order to provide relevant information about one or more aspects of the data. Metadata are often called data about data or information about information (Guenther and Radebaugh, 2004). The MARK-AGE meta table contains information about parameter description, parameter short name, parameter name, parameter unit, type of biological material, method used for measurement, calculations performed, number of analysed probands, count female, count male, count RASIG, count GO, count SGO, partner ID, WP number, DB table name and comments (Table 1).
4. Conclusion
Human studies are of very high relevance in biomedical research. Therefore, the design of a study, the procedures performed and results obtained need to be machine-readable in order to facilitate data analysis and data mining. Furthermore, there is a growing interest in sharing research data within the scientific community. However, effective sharing of data requires not only sharing results but also additional information collected in a meta table containing the necessary information for interpreting and analysing data. Within MARK-AGE we have established a phenotypic database able to accommodate all different types of data generated during the project. Aside from the necessary infrastructure, a double-code system and data transfer and data sharing strategies were implemented in accordance with the specifications of the MARK-AGE project.
Acknowledgements
We wish to thank the European Commission for financial support through the FP7 large-scale integrating project “European Study to Establish Biomarkers of Human Ageing” (MARK-AGE; grant agreement no.: 200880) and all MARK-AGE Consortium partners for the excellent collaboration.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.mad.2015.03.005.
References
Guenther, R., Radebaugh, J., 2004. Understanding Metadata. National Information Standards Organization. NISO Press, Bethesda MD.
Jamison, D.C., 2003. Structured query language (SQL) fundamentals. Curr. Protoc. Bioinf., Chapter 9: Unit 9.2.
Teorey, T.J., Lightstone, S.S., et al., 2009. Database Design: Know It All, 1st ed. Morgan Kaufmann Publishers, Burlington MA.
Wesley, D., 2000. Relational database design. J. Insur. Med. 32 (2), 63–70.