Bilevel Architecture for High-Throughput Computing

(Whole-issue priority publication) Online publication date: 2001-01-11
We have prototyped and analyzed the design of a novel approach to high-throughput computing, a core element of the emerging HENP computational grid. Independent event processing in HENP is well suited to parallel computing. The prototype facilitates the use of inexpensive mass-market components by providing fault-tolerant resilience (instead of expensive total system reliability) via highly scalable management components. The ability to handle both hardware and software failures on a large dedicated HENP facility limits the need for user intervention. Robust data management is especially important in HENP computing, since large data flows occur before and/or after each processing task. The architecture of our active object coordination schema implements a multi-level hierarchical agent model. It provides fault tolerance by splitting a large overall task into independent atomic processes, performed by lower-level agents that synchronize with each other via a local database. The necessary control functions, performed by higher-level agents, interact with the same database and thus manage distributed data production. The system has been tested in a production environment for simulations in the STAR experiment at RHIC. Our architectural prototype controlled processes on more than a hundred processors at a time and has run for extended periods. Twenty terabytes of simulated data have been produced. The generic nature of our two-level architectural solution for fault tolerance in distributed environments has been demonstrated by its successful test for grid file replication services between BNL and LBNL.
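The two-level coordination described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an in-memory SQLite database as a stand-in for the local database, and the table layout and all function names (`create_db`, `submit`, `claim`, `finish`, `supervise`, `run`) are hypothetical. Lower-level agents claim and complete atomic tasks; a higher-level agent inspects the same table and requeues failed tasks.

```python
import sqlite3


def create_db():
    """Create the shared task table (hypothetical schema, in-memory for the sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks ("
        " id INTEGER PRIMARY KEY,"
        " state TEXT NOT NULL DEFAULT 'pending',"  # pending | running | done | failed
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def submit(db, n_tasks):
    """Split the overall job into n_tasks independent atomic tasks."""
    with db:
        for _ in range(n_tasks):
            db.execute("INSERT INTO tasks DEFAULT VALUES")


def claim(db):
    """Lower-level agent: claim one pending task, or return None if there is none."""
    with db:
        row = db.execute(
            "SELECT id FROM tasks WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Conditional update: only succeeds if the task is still pending.
        cur = db.execute(
            "UPDATE tasks SET state = 'running', attempts = attempts + 1"
            " WHERE id = ? AND state = 'pending'",
            (row[0],),
        )
        return row[0] if cur.rowcount == 1 else None


def finish(db, task_id, ok):
    """Lower-level agent: record the outcome of a task in the shared table."""
    with db:
        db.execute(
            "UPDATE tasks SET state = ? WHERE id = ?",
            ("done" if ok else "failed", task_id),
        )


def supervise(db, max_attempts=3):
    """Higher-level agent: requeue failed tasks that still have retries left."""
    with db:
        cur = db.execute(
            "UPDATE tasks SET state = 'pending'"
            " WHERE state = 'failed' AND attempts < ?",
            (max_attempts,),
        )
        return cur.rowcount  # number of tasks put back in the queue


def run(db, process, max_attempts=3):
    """Drive workers and the supervisor until nothing is left to retry."""
    while True:
        task_id = claim(db)
        while task_id is not None:
            try:
                ok = process(task_id)
            except Exception:
                ok = False  # a crash in one task must not stop the others
            finish(db, task_id, ok)
            task_id = claim(db)
        if supervise(db, max_attempts) == 0:
            break
    rows = db.execute("SELECT state, COUNT(*) FROM tasks GROUP BY state").fetchall()
    return dict(rows)
```

In this sketch the agents never talk to each other directly; all coordination passes through the task table, so a failed task affects only its own row. A real multi-process deployment would need a database shared between processes and stricter transaction handling than this single-connection example provides.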