Abstract: This article is about a piece of middleware that converts a dumb tape-based Tertiary Storage System into a multi-petabyte random access device with thousands of channels. Using typical caching mechanisms, the software optimizes access to the underlying Storage System and makes better use of possibly expensive drives and robots, or allows cheap and slow devices to be integrated without introducing unacceptable performance degradation. In addition, using the standard NFS2 protocol, the dCache provides a unique view into the storage repository, hiding the physical location of the file data, whether cached or on tape only. Bulk data transfer is supported through the kerberized FTP protocol and a C-API providing POSIX file access semantics. Dataset staging and disk space management are performed invisibly to the data clients. The project is a joint DESY and Fermilab effort to overcome limitations in the usage of tertiary storage resources common to many HEP labs. The distributed cache nodes may range from high performance SGI machines to commodity CERN Linux-IDE like file server models. Different cache nodes are assumed to have different affinities to particular storage groups or file sets. Affinities may be defined manually or calculated by the dCache based on topology considerations. Cache nodes may have different disk space management policies to match the large variety of applications, from raw data to user analysis data pools.
Abstract: Most of the earlier work on clustering mainly focused on numeric data whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve data sets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm which is based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the 'cluster centers' of clustering objects described by mixed numeric and categorical attributes in the clustering process so as to minimise the clustering cost function. The clustering performance of the algorithm is demonstrated with two well-known data sets, namely the credit approval and abalone databases.
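The abstract above does not spell out its similarity measure or its center update, so the following is only a minimal sketch of one common way to combine the two attribute types: squared Euclidean distance on the numeric part plus a weighted count of categorical mismatches, with centers updated by per-attribute mean and mode. The function names and the weight `gamma` are assumptions made for illustration, not taken from the paper.

```python
from collections import Counter

def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Dissimilarity between an object and a cluster center.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatching attributes, weighted by gamma
    (gamma is a hypothetical tuning weight, not a value from the paper).
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches

def update_center(members_num, members_cat):
    """Recompute a center from the objects currently assigned to it.

    Numeric attributes take the mean, categorical attributes take the
    most frequent value (the mode) in the cluster.
    """
    n = len(members_num)
    c_num = [sum(col) / n for col in zip(*members_num)]
    c_cat = [Counter(col).most_common(1)[0][0] for col in zip(*members_cat)]
    return c_num, c_cat

# Usage: three objects with two numeric and two categorical attributes.
nums = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
cats = [["a", "x"], ["a", "y"], ["b", "y"]]
center_num, center_cat = update_center(nums, cats)
dist = mixed_distance([1.0, 2.0], ["a", "b"], [1.0, 0.0], ["a", "c"], gamma=0.5)
```

With no categorical attributes the mismatch term vanishes and the distance reduces to the ordinary k-means distance, which matches the abstract's remark that the algorithm is identical to k-means on purely numeric data.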
Abstract: Network traffic classification aims at identifying the application types of network packets. It is important for Internet service providers (ISPs) to manage bandwidth resources and ensure the quality of service for different network applications. However, most classification techniques using machine learning focus only on high flow accuracy and ignore byte accuracy. The classifier obtains low classification performance for elephant flows because of the imbalance between elephant flows and mice flows on the Internet. The elephant flows, however, consume much more bandwidth than mice flows. When the classifier is deployed for traffic policing, the network management system cannot penalize elephant flows and avoid network congestion effectively. This article first explores the factors related to low byte accuracy, and then presents a new traffic classification method that improves byte accuracy with the aid of data cleaning. Experiments are carried out on three groups of real-world traffic data sets, and the method is compared with existing work on improving byte accuracy. The experiments show that byte accuracy increased by about 22.31% on average. The method outperforms the existing one in most cases.
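To make the distinction between flow accuracy and byte accuracy in the abstract above concrete, here is a small sketch that computes both from labelled flows. The tuple layout `(true_label, predicted_label, byte_count)` and the sample numbers are assumptions chosen for illustration; they are not data from the article.

```python
def flow_and_byte_accuracy(flows):
    """Return (flow_accuracy, byte_accuracy).

    flows: iterable of (true_label, predicted_label, byte_count) tuples.
    Flow accuracy counts each flow once; byte accuracy weights each flow
    by the number of bytes it carries.
    """
    flows = list(flows)
    total_bytes = sum(size for _, _, size in flows)
    correct_flows = sum(1 for true, pred, _ in flows if true == pred)
    correct_bytes = sum(size for true, pred, size in flows if true == pred)
    return correct_flows / len(flows), correct_bytes / total_bytes

# Nine small "mice" flows classified correctly, one large "elephant"
# flow misclassified (hypothetical values).
sample = [("web", "web", 1_000)] * 9 + [("p2p", "web", 991_000)]
flow_acc, byte_acc = flow_and_byte_accuracy(sample)
print(f"flow accuracy = {flow_acc:.3f}, byte accuracy = {byte_acc:.3f}")
```

In this toy sample the classifier is right on 90% of the flows but on less than 1% of the bytes, which is the kind of gap the abstract attributes to the imbalance between elephant and mice flows.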
Abstract: Conditional independence is a basic assumption in many fields of statistics. How to test its validity is of great importance and has been extensively studied in the literature. Nevertheless, all of the existing methods focus on the case in which the data are fully observed, and none of them seems to have taken into account the scenario in which missing data are present. Motivated by this, this paper develops two test statistics to handle such a situation, relying on inverse probability weighted and augmented inverse probability weighted techniques. The asymptotic distributions of the proposed statistics are also derived under the null hypothesis. The simulation studies indicate that both test statistics perform well in terms of size and power.
Abstract: The ALICE detector at the LHC (CERN) will record raw data at a rate of 1.2 gigabytes per second. Trying to analyse all this data at CERN will not be feasible. As originally proposed by the MONARC project, data collected at CERN will be transferred to remote centres to use their computing infrastructure. The remote centres will reconstruct and analyse the events and make the results available. Therefore high-rate data transfer between computing centres (Tiers) will become of paramount importance. This paper will present several tests that have been made between CERN and remote centres in Padova (Italy), Torino (Italy), Catania (Italy), Lyon (France), Ohio (United States), Warsaw (Poland) and Calcutta (India). These tests consisted, in a first stage, of sending raw data from CERN to the remote centres and back, using an FTP method that allows connections of several streams at the same time. Thanks to these multiple streams, it is possible to increase the rate at which the data is transferred. While several "multiple stream FTP solutions" already exist, our method is based on a parallel socket implementation which allows, besides files, also objects (or any large message) to be sent in parallel. A prototype able to manage different transfers will be presented. This is the first step of a system to be implemented that will be able to take care of the connections with the remote centres to exchange data and monitor the status of the transfer.
Abstract: Distribution, interoperability, interactivity, and componentization are four main features of distributed GIS. Based on the principles of hypermap, hypermedia and distributed databases, the paper proposes a distributed spatial data model that is in accordance with those features of distributed GIS. The model takes the catalog service as the outline of spatial information globalization, and defines the data structure of hypermap nodes at different levels. Based on the model, it is feasible to manage and process distributed spatial information, and to integrate multi-source, heterogeneous spatial data into one framework. Traditionally, spatial data could be retrieved and accessed via the Internet only by theme or map name. With the concept of the model, it is possible to retrieve, load, and link spatial data through vector-based graphics on the Internet.
Abstract: Timely and cost-efficient multi-hop data delivery among vehicles is essential for vehicular ad-hoc networks (VANETs), and various routing protocols are envisioned for infrastructure-less vehicle-to-vehicle (V2V) communications. Generally, when a packet (or a duplicate) is delivered out of the routing path, it will be dropped. However, we observe that these packets (or duplicates) may also be delivered much faster than the packets delivered along the original routing path. In this paper, we propose a novel tree based routing scheme (TBRS) for utilizing the dropped packets in VANETs. In TBRS, the packet is delivered along a routing tree with the destination as its root. When the packet is delivered out of its routing tree, it is not dropped immediately and continues to be delivered for a while if it can arrive at another branch of the tree. We conduct extensive simulations to evaluate the performance of TBRS based on the road map of a real city collected from Google Earth. The simulation results show that TBRS can outperform the existing protocols, especially when the network resources are limited.
Abstract: ATLAS [1] has recently joined Gaudi, an open project to develop a data processing framework for HEP experiments [2]. The data model is one of the areas where ATLAS has most extended the original Gaudi design to meet the experiment's own requirements. This paper describes StoreGate, the first implementation of the ATLAS Data Model.
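Abstract: High compression ratio, high decoding performance, and progressive data transmission are the most important requirements of vector data compression algorithms for WebGIS. To meet these requirements, we present a new compression approach. The paper begins with the generation of multi-scale data by converting floating-point coordinates into integer coordinates. It is proved that the on-screen distance between a converted point and the original point is within 2 pixels, so our approach is suitable for the visualization of vector data on the client side. The integer coordinates are passed to an integer wavelet transformer, and the high-frequency coefficients produced by the transformer are encoded with canonical Huffman codes. Experimental results on river data and road data demonstrate the effectiveness of the proposed approach: the compression ratio can reach 10% for river data and 20% for road data. We conclude that more attention needs to be paid to the correlation between curves that contain only a few points.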
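The pipeline in the abstract above (quantize coordinates to integers, then apply an integer wavelet transform) can be sketched as follows. The abstract does not say which wavelet or grid is used, so this sketch assumes a uniform grid and the integer Haar transform (S-transform); `quantize`, `haar_forward`, `haar_inverse`, `origin` and `cell_size` are hypothetical names introduced here, and the Huffman coding stage is omitted.

```python
def quantize(values, origin, cell_size):
    """Snap floating-point coordinates to integer grid indices.

    origin and cell_size are assumed grid parameters; the rounding error
    per coordinate is at most half a cell.
    """
    return [round((v - origin) / cell_size) for v in values]

def haar_forward(samples):
    """One level of the integer Haar transform on an even-length list.

    Returns (approx, detail): floor-averages and differences of each
    consecutive pair. Everything stays an integer.
    """
    approx, detail = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        approx.append((a + b) // 2)  # low-frequency part
        detail.append(a - b)         # high-frequency part
    return approx, detail

def haar_inverse(approx, detail):
    """Exactly invert haar_forward (lossless reconstruction)."""
    out = []
    for s, d in zip(approx, detail):
        a = s + (d + 1) // 2
        out.extend([a, a - d])
    return out

# Usage: x-coordinates of a short polyline, already on an integer grid.
xs = [10, 12, 13, 13, 9, 4]
approx, detail = haar_forward(xs)
restored = haar_inverse(approx, detail)
```

Because neighbouring vertices of a curve tend to be close together, the detail coefficients are usually small integers, which is what makes them good candidates for entropy coding such as the canonical Huffman codes mentioned in the abstract.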
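Abstract: Only the observable part of the real world can be stored in data. For such incomplete and ill-structured data, data crystallization aims at presenting the hidden structure among events, including unobservable events. This is realized by inserting dummy items, which correspond to the potential existence of unobservable events, into the given data. These dummy items and their relations with observable events are visualized by applying KeyGraph to the data with dummy items, much like the crystallization of snow, in which dust is involved in the formation of crystals of water molecules. To tune the granularity level of the structure to be visualized, the data crystallization tool is integrated with the human process of understanding significant scenarios in the real world. This basic method is expected to be applicable to the various real-world domains in which previous methods of chance discovery have led people to successful decision making. In this paper, we apply data crystallization with human-interactive annealing (DCHA) to the design of products in a real company. The results show its effect on industrial decision making.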
Abstract: This research takes the view that the modelling of temporal data is a fundamental step towards capturing the semantics of time. The problems inherent in the modelling of time are not unique to database processing. The representation of temporal knowledge and temporal reasoning arise in a wide range of other disciplines. In this paper an account is given of a technique for modelling the semantics of temporal data and its associated normalization method. It discusses the techniques of processing temporal data by employing a Time Sequence (TS) data model. It shows a number of different strategies which are used to classify different properties of temporal data, and it goes on to develop the model of temporal data and to address issues of temporal data application design by introducing the concept of temporal data normalisation.
Abstract: Geographical Information Systems (GIS) are widely used in many fields. With the rapid development of computer networks, GIS users care more about data sharing over networks. In a traditional relational database, data consistency is controlled by a consistency control mechanism: when a data object is locked in sharing mode, other transactions can only read it but cannot update it. This is appropriate for traditional relational databases, which store attribute data and mainly deal with short transactions. In spatial databases, because of the vast amount of data and the complex topological relations, long transactions occur frequently. If the traditional consistency control method is still used, the system's concurrency will be badly affected. There are therefore many new requirements for consistency control in the field of GIS. There are many aspects of data consistency problems in spatial databases, such as the inconsistency between attribute and geometry data, and the inconsistency of topological relations after geometry objects have been modified. In this paper, two other cases of data consistency are discussed for multi-user Geographical Information Systems. In GIS there are many forms of data, such as geometry data, attribute data, image data, and DEM data. In this paper, we discuss only spatial geometry data.
Abstract: ARGO-YBJ, a Chinese-Italian collaboration, is about to finish the first step of the installation of its cosmic ray telescope, which consists of a single layer of RPCs placed at 4300 m elevation in Tibet. The detector will provide a detailed space-time picture of the shower front initiated by primaries with energies in the range 10 GeV to 500 TeV. Data taking will start at the beginning of 2002 with a fraction of the detector installed. The detector will be upgraded twice and completed at the end of 2003. In this paper we briefly describe the data flow, the trigger organization, the three operational steps in data taking, and the computing model used to process the data. The need for remote monitoring of the experiment will be touched upon. The processing power required for the raw data reconstruction and for the Monte Carlo simulation is reported.
Abstract: Receiver operating characteristic (ROC) curves are often used to study the two-sample problem in medical studies. However, most data in medical studies are censored. Usually a natural estimator is based on the Kaplan-Meier estimator. In this paper we propose a smoothed estimator based on kernel techniques for the ROC curve with censored data. The large sample properties of the smoothed estimator are established. Moreover, deficiency is considered in order to compare the proposed smoothed estimator of the ROC curve with the empirical one based on the Kaplan-Meier estimator. It is shown that the smoothed estimator outperforms the direct empirical estimator based on the Kaplan-Meier estimator under the criterion of deficiency. A simulation study is also conducted and a real data set is analyzed.
Abstract: The purpose of the present paper is to call attention to the following question: which of the (non-small) initial data admit global smooth solutions to the Cauchy problem for nonlinear wave equations? A few cases and examples are sketched, showing that the general answer to this question may be quite complicated.
Abstract: This paper presents a methodology to determine three data quality (DQ) risk characteristics: accuracy, comprehensiveness and nonmembership. The methodology provides a set of quantitative models to confirm the information quality risks for the database of a geographical information system (GIS). Four quantitative measures are introduced to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Selection, Projection, and...