Analyzing 0.5TB (one billion vertices) on Amazon’s EMR
Today I’m analyzing the properties of a 0.5TB dataset (a graph with a billion vertices) using Pig/Hadoop on Amazon’s Elastic MapReduce service. I configured a cluster with the following nodes: 1 MASTER: c1.medium; 9 CORE: c1.xlarge x9 (High-CPU …