Where HPC & Big Data Intersect (HPC Data Analysis Software)
Bruce Hendrickson
Computational Sciences & Math Group, Sandia National Labs, Albuquerque
What is in Scope?
• What is "Big Data Analytics"?
– SQL Queries?
– Knowledge discovery?
– Human-in-the-loop?
• What is "HPC"?
– Map-Reduce?
– Shared-Memory?
– Trans-petascale machines?
Does Big Data Really Need HPC?
• Lots of talk about "convergence" between big compute and big data
– Comforting, self-serving conclusion
– Big compute generates and is needed to analyze big data
– Networking and memory performance are critical to both
– Etc.
• If this is true, why haven't we sold lots of supercomputers to support data analytics!?
The Search for El Dorado
• Why use expensive machines when cheap ones suffice?
– Answers must be very valuable
– Response times must be fast, OR
– Analysis is complex (== not amenable to map-reduce)
• Limited number of possible consumers
– Wall Street (quants & high-speed traders)
– National security community
• Limited number of possible applications
– Graph analytics? (see the sketch below)
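To make the map-reduce point concrete, here is a minimal, illustrative Python sketch (mine, not from the talk) of single-source BFS written as repeated map/shuffle/reduce rounds over a toy edge list. Each hop of the traversal costs a full pass over the edges plus a global shuffle and barrier, which is why deep, irregular graph traversals sit poorly in a pure map-reduce framework.

    from collections import defaultdict

    # Toy edge list; in a real setting this would be sharded across many workers.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

    def bfs_mapreduce(edges, source):
        """Single-source BFS expressed as repeated map + shuffle + reduce rounds.

        Every hop of the traversal is a full pass over the edge list followed
        by a global shuffle and synchronization -- the per-iteration cost that
        makes deep graph traversals awkward in pure map-reduce.
        """
        dist = {source: 0}
        frontier = {source}
        level = 0
        while frontier:
            # "map" phase: emit candidate distances for neighbors of the frontier
            emitted = defaultdict(list)
            for u, v in edges:
                if u in frontier:
                    emitted[v].append(level + 1)
            # "reduce" phase: keep the minimum distance per newly reached vertex
            frontier = set()
            for v, candidates in emitted.items():
                if v not in dist:
                    dist[v] = min(candidates)
                    frontier.add(v)
            level += 1
        return dist

    print(bfs_mapreduce(edges, 0))  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}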
Reasons to Avoid Using HPC
• Getting data onto an HPC platform is painful
– Must be able to amortize cost over many analyses
– Or must generate data on the machine
• HPC networks weren't designed for analysis tasks
– Need support for fast injection, small messages, many outstanding requests (see the sketch after this list)
• Software is hard (aka expensive)
– Ecosystem of HPC analysis software barely exists
– Is the need persistent enough to justify development costs?
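As an illustration of the network point above (my sketch, not from the talk), the following mpi4py fragment shows the communication pattern typical of irregular analytics: every rank posts non-blocking receives from all peers and fires one tiny message at each peer, so the limiting resources are injection rate and message rate rather than bandwidth. The (rank, 1) payload is a hypothetical stand-in for a single edge update.

    # Run with e.g.: mpiexec -n 4 python sketch.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Post a non-blocking receive from every peer: many outstanding requests.
    recv_reqs = [comm.irecv(source=src, tag=0) for src in range(size) if src != rank]

    # Fire one tiny "edge update" at every peer: small messages, so performance
    # is bounded by injection and message rate rather than bandwidth.
    send_reqs = [comm.isend((rank, 1), dest=dst, tag=0) for dst in range(size) if dst != rank]

    MPI.Request.waitall(send_reqs)
    updates = [req.wait() for req in recv_reqs]

    if rank == 0:
        print(f"rank 0 received {len(updates)} small updates")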
HPC Data Analysis Software
• Mostly non-existent
• Only real niche – analyzing data generated by HPC
– Even we wouldn't choose to do analysis in situ if we could avoid it, but given poor bandwidth, any alternative would be even worse!
Ask not what HPC can do for big data …
… but ask what big data can do for HPC!
Backup Slides (spoken to the next day)
What I *Really* Believe
Bruce Hendrickson
Computational Sciences & Math Group, Sandia National Labs, Albuquerque
HPC & Big Data Analytics
• Today’sHPCplatformsarenotcost-effectivefor most big data challenges
– Over-provisioned processors, under-provisioned I/O system – Network, programming model, usage model & software
ecosystem optimized for scientific workloads
• Butthisisthe*wrong*question!
• Needsatthecomponentlevelhavestrongsynergies
– Smarter memories
– Improved power efficiency & management
– Better networks
– More flexible & productive programming models
Future Opportunities
• Co-investment in solving common component problems
• Potential leverage of each other's software stacks
• Exchange of ideas and best practices
• Machines built out of common components
• Enriching HPC via new approaches to parallelism
