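Abstract: Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to reduce data stall time. Memory locality and concurrency are the two essential factors influencing the performance of modern memory systems. However, because the impact of memory concurrency on overall memory system performance is not well understood, few existing studies on reducing data stall time focus on exploiting data access concurrency. In this study, a pair of novel data stall time models is presented: the L-C model, for the combined effect of locality and concurrency, and the P-M model, for the effect of pure misses on data stall time. The models provide a new understanding of data access delay and new directions for performance optimization. Based on these models, a summary table of advanced cache optimizations is presented. It has 38 entries contributed by data concurrency but only 21 entries contributed by data locality, which shows the value of data concurrency. The L-C and P-M models introduced in this study, together with their associated results and opportunities, are important and necessary for the future data-centric architecture and algorithm design of modern computing systems.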
Abstract: Spatial applications will grow in complexity as the volume of spatial data increases rapidly. A suitable data processing and computing infrastructure for spatial applications needs to be established. Over the past decade, the grid has become a powerful computing environment for data-intensive and computing-intensive applications. Integrating grid computing with spatial data processing technology, the authors designed a spatial data processing grid (called SDPG) to address the related problems. This paper examines the requirements of spatial applications and describes the architecture of SDPG. Key technologies for implementing SDPG are discussed with emphasis.
Abstract: With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to such a demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems as well. In conclusion, this article also lists some problems and challenges for further research.
Abstract: This research takes the view that the modelling of temporal data is a fundamental step towards capturing the semantics of time. The problems inherent in the modelling of time are not unique to database processing. The representation of temporal knowledge and temporal reasoning arises in a wide range of other disciplines. This paper gives an account of a technique for modelling the semantics of temporal data and its associated normalization method. It discusses techniques for processing temporal data by employing a Time Sequence (TS) data model. It shows a number of different strategies that are used to classify different properties of temporal data, goes on to develop the model of temporal data, and addresses issues of temporal data application design by introducing the concept of temporal data normalization.
Abstract: Page-based software DSM systems suffer from false sharing caused by the large sharing granularity, and only support one-dimensional Block or Cyclic-block data distribution schemes. Thus applications running on them suffer from poor data locality and can exploit parallelism only when using a large number of processors. In this paper, a way towards supporting flexible data distribution (FDD) on software DSM systems is presented. Small granularity-tunable blocks, the size of which can be set by the compiler or the programmer, are used to overlap the working data sets distributed among processors. FDD was implemented on a software DSM system called JIAJIA. Compared with the Block/Cyclic-block distribution schemes used by most DSM systems now, experiments show that the proposed flexible data distribution is more effective. The performance of the applications used in the experiments is significantly improved.
Abstract: This paper presents a new efficient algorithm for clustering categorical data, Squeezer, which can produce high-quality clustering results and at the same time achieve good scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning t to an existing cluster (initially none) or creating t as a new cluster, which is determined by the similarities between t and the clusters. Due to its characteristics, the proposed algorithm is extremely suitable for clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far, using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
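The one-pass procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the similarity measure (the summed fraction of cluster members that share each attribute value of the tuple), the `threshold` parameter, and the names `similarity` and `squeezer` are assumptions made for the example.

```python
from collections import Counter


def similarity(cluster, row):
    """Assumed measure: for each attribute, the fraction of cluster members
    that share the row's value, summed over all attributes."""
    size = cluster["size"]
    return sum(counts[value] / size for counts, value in zip(cluster["counts"], row))


def squeezer(rows, threshold):
    """Read each tuple once; join the most similar cluster or open a new one."""
    clusters = []  # each cluster keeps only per-attribute value counts and a size
    labels = []
    for row in rows:
        best_index, best_score = -1, -1.0
        for index, cluster in enumerate(clusters):
            score = similarity(cluster, row)
            if score > best_score:
                best_index, best_score = index, score
        if best_index < 0 or best_score < threshold:
            # No cluster is similar enough: the tuple starts a new cluster.
            clusters.append({"size": 0, "counts": [Counter() for _ in row]})
            best_index = len(clusters) - 1
        chosen = clusters[best_index]
        chosen["size"] += 1
        for counts, value in zip(chosen["counts"], row):
            counts[value] += 1
        labels.append(best_index)
    return labels


rows = [
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
    ("red", "large"),
    ("blue", "large"),
]
labels = squeezer(rows, threshold=1.5)
print(labels)
```

Because each cluster is summarized by value counts rather than by its member tuples, memory grows with the number of clusters and distinct values, which is the property that makes a single-pass scheme of this kind attractive for data streams.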
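Abstract: Data deduplication for file communication across a wide area network (WAN), in applications such as file synchronization and mirroring of cloud environments, usually achieves significant bandwidth savings at the cost of significant deduplication time overheads. The time overheads include the time required for data deduplication at the two geographically distributed nodes (e.g., the disk access bottleneck) and the duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across the WAN with metadata feedback and metadata utilization (MFMU), designed to harness the time overheads related to data deduplication. In the proposed MFMU system, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplication query/answer operations. In addition, to harness the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by the metadata feedback, a metadata utilization component based on a hysteresis hash re-chunking mechanism is introduced. Our experimental results show that MFMU achieves an average deduplication acceleration of 20% to 40%, with the bandwidth saving ratio not reduced by the metadata feedback, compared with the baseline content-defined chunking (CDC) used in LBFS (Low-Bandwidth Network File System) and with existing state-of-the-art data deduplication solutions based on Bimodal chunking algorithms.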
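For readers unfamiliar with the baseline named in this abstract, content-defined chunking can be sketched as below. This is a generic illustration under stated assumptions, not the LBFS or MFMU code: the polynomial rolling hash, the constants `BASE` and `MOD`, and the parameters `window`, `mask`, `min_size`, and `max_size` are all chosen for the example.

```python
import random

BASE = 257            # assumed polynomial base for the rolling hash
MOD = (1 << 31) - 1   # assumed modulus


def cdc_chunks(data, window=16, mask=0x3F, min_size=32, max_size=1024):
    """Split `data` where a rolling hash of the last `window` bytes matches a
    bit pattern, so that boundaries depend on content, not on byte offsets."""
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= window:
            # Drop the byte that slides out of the window.
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        at_boundary = length >= min_size and i + 1 >= window and (h & mask) == mask
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(5000))
chunks = cdc_chunks(data)
print(len(chunks), "chunks; largest is", max(len(c) for c in chunks), "bytes")
```

In a deduplication system each chunk would then be hashed and the hash compared against the receiver's index; every such comparison across the WAN is one of the query/answer operations whose round trips the abstract identifies as a time overhead.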
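Abstract: The currently popular systems, Hadoop and Spark, cannot achieve satisfactory performance when running iterative big data applications because of inefficient overlapping of computation and communication. The pipeline of computation, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload, and then propose an event-driven pipeline and an in-memory shuffle design with better overlapping of computation and communication, implemented as DataMPI-Iteration, an MPI-based library for iterative big data computing. Our performance evaluation shows that, for PageRank and K-means, DataMPI-Iteration can achieve a 9X to 21X speedup over Apache Hadoop and a 2X to 3X speedup over Apache Spark.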
Abstract: In this paper, ARMiner, a data mining tool based on association rules, is introduced. Beginning with the system architecture, the characteristics and functions are discussed in detail, including data transfer, concept hierarchy generalization, mining rules with negative items, and the re-development of the system. An example of the tool's application is also shown. Finally, some issues for future research are presented.
Abstract: A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses necessitated by the protocol as well as induced by false sharing. This paper introduces a dynamic prefetching method implemented in the JIAJIA software DSM to reduce the system overhead caused by remote accesses. The prefetching method records the interleaving string of INV (invalidation) and GETP (getting a remote page) operations for each cached page and analyzes the periodicity of the string when a page is invalidated on a lock or barrier. A prefetching request is issued after the lock or barrier if the periodicity analysis indicates that GETP will be the next operation in the string. Multiple prefetching requests are merged into the same message if they are to the same host. Performance evaluation with eight well-accepted benchmarks on a cluster of sixteen PowerPC workstations shows that the prefetching scheme can significantly reduce the page fault overhead and, as a result, achieves a performance increase of 15%-20% in three benchmarks and around 8%-10% in another three. The average extra traffic caused by useless prefetches is only 7%-13% in the evaluation.
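The periodicity check described in this abstract might look roughly like the sketch below. It is an illustration under assumptions, not the JIAJIA implementation: the encoding of operations as the characters "I" (INV) and "G" (GETP), the `min_repeats` confidence rule, and the function names are all hypothetical.

```python
def smallest_period(history):
    """Smallest p such that the string repeats with period p.
    Returns len(history) when no shorter repetition exists."""
    n = len(history)
    for p in range(1, n + 1):
        if all(history[i] == history[i - p] for i in range(p, n)):
            return p
    return n


def predict_next(history, min_repeats=2):
    """Predict the next operation, or None if the pattern has not yet been
    observed for at least `min_repeats` full periods (an assumed rule)."""
    n = len(history)
    if n == 0:
        return None
    p = smallest_period(history)
    if n < min_repeats * p:
        return None
    return history[n - p]  # the symbol one period before the next position


def should_prefetch(history):
    """Issue a prefetch only when GETP is predicted as the next operation."""
    return predict_next(history) == "G"


# A page that is fetched after every invalidation: prefetching is predicted to pay off.
print(should_prefetch("IGIGI"))
# A page invalidated twice between fetches, just after its first INV: no prefetch yet.
print(should_prefetch("IIGIIGI"))
```

A per-page history string keeps the bookkeeping small, and the decision is taken only at the moment of invalidation, which matches the point in the protocol where the abstract says the analysis runs.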
Abstract: Approximate query processing has emerged as an approach to dealing with the huge data volumes and complex queries found in data warehouse environments. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method builds a compressed (approximate) data cube with a clustering technique and uses this compressed data cube to answer queries directly, which improves query performance. We also provide the algorithm for the OLAP queries and the confidence intervals of the query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.
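Abstract: Because it inherits a data-driven communication pattern rather than a location-driven one, named data networking (NDN) offers better support for network-layer dataflow. However, application developers have to handle complex tasks, such as data segmentation, packet verification, and flow control, due to the lack of a proper transport-layer protocol above the network layer. In this study, we design a dataflow-oriented programming interface that provides transport strategies for NDN and greatly improves the efficiency of application development. The interface presents two application data unit (ADU) retrieval strategies for different data publishing patterns, and adopts an adaptive ADU pipelining algorithm to control the dataflow based on the current network status and the data generation rate. The interface also offers network measurement strategies to monitor many of the critical metrics that influence application performance. We verify the functionality and performance of our interface by implementing a video streaming application spanning 11 time zones on the worldwide NDN testbed. Our experiments show that the interface can efficiently support the development of high-performance, dataflow-driven NDN applications.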
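Abstract: Big data processing is becoming a prominent part of data center computation. However, recent research has shown that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing comes from the enormous number of cache misses and from stalls on dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with micro-benchmarks and the real-world benchmark HiBench. The results of our micro-benchmarks clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the additional HiBench results show an average speedup of 1.21X at the application level. Both results illustrate that careful hardware/software co-design will improve the memory efficiency of big data processing. Our work has already been integrated into the Intel distribution for Apache Hadoop.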
Abstract: Due to the dramatic increase in information published in social networks, privacy issues have given rise to public concern. Although differential privacy provides privacy protection with theoretical foundations, the trade-off between privacy and data utility still demands further improvement. Moreover, most existing studies do not consider the quantitative impact of the adversary when measuring data utility. In this paper, we first propose a personalized differential privacy method based on social distance. Then, we analyze the maximum data utility when users and adversaries are blind to each other's strategy sets. We formalize all the payoff functions in the differential privacy sense, which is followed by the establishment of a static Bayesian game. The trade-off is calculated by deriving the Bayesian Nash equilibrium with a modified reinforcement learning algorithm. The proposed method achieves fast convergence by reducing the cardinality from n to 2. In addition, the in-place trade-off can maximize the user's data utility if the action sets of the user and the adversary are public while the strategy sets are unrevealed. Our extensive experiments on a real-world dataset show that the proposed model is effective and feasible.
Abstract: This paper describes an immersive system, called 3DIVE, for interactive volume data visualization and exploration inside the CAVE virtual environment. Combining interactive volume rendering and virtual reality provides a natural immersive environment for volumetric data visualization. More advanced data exploration operations, such as object-level data manipulation, simulation, and analysis, are supported in 3DIVE by several new techniques. In particular, volume primitives and texture regions are used for the rendering, manipulation, and collision detection of volumetric objects, and the region-based rendering pipeline is integrated with 3D image filters to provide an image-based mechanism for interactive transfer function design. The system has recently been released as public domain software for CAVE/ImmersaDesk users, and is currently being actively used by various scientific and biomedical visualization projects.
Abstract: With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENSH and BGF. The predictions were compared at the nucleotide, exon, and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
Abstract: As users increasingly befriend others and interact online via their social media accounts, online social networks (OSNs) are expanding rapidly. Confronted with the big data generated by users, it is imperative that data storage be distributed, scalable, and cost-efficient. Yet one of the most significant challenges in this area is determining how to minimize the cost without degrading system performance. Although many storage systems use a distributed key-value store, it cannot be directly applied to OSN storage systems. Because users' data are highly correlated, hash storage leads to frequent inter-server communication, and the high inter-server traffic costs decrease the OSN storage system's scalability. Previous studies proposed conducting network partitioning and data replication based on social graphs. However, data replication increases storage costs and impacts traffic costs. Here, we consider how to minimize costs from the perspective of data storage by combining partitioning and replication. Our cost-efficient data storage approach supports scalable OSN storage systems. The proposed approach co-locates frequently interacting users by conducting partitioning and replication simultaneously while meeting load-balancing constraints. Extensive experiments are undertaken on two real-world traces, and the results show that our approach achieves lower cost compared with state-of-the-art approaches. Thus we conclude that our approach enables economical and scalable OSN data storage.