Today I’m analyzing the properties of a 0.5TB dataset (a billion vertices in a graph) using Pig/Hadoop on Amazon’s Elastic MapReduce service. I configured a cluster with the following nodes:
- 1 MASTER: c1.medium
- 9 CORE: c1.xlarge (High-CPU Instance)
- 10 SPOT: c1.xlarge (High-CPU Instance), bid $0.20
This cluster processed a 2.5-billion-record data file (0.5TB) in about 15 (!) minutes. Very impressive!
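To put that number in perspective, here is a rough back-of-the-envelope calculation of the throughput. It is only a sketch: it assumes 1 TB = 10^12 bytes, treats "about 15 minutes" as exactly 900 seconds, counts only the 19 CORE and SPOT nodes as workers, and assumes all of them were up for the whole run. The variable and function names are mine, not anything from Pig or EMR.

```python
# Back-of-the-envelope throughput for the run described above.
# Assumptions (not measured): 1 TB = 10**12 bytes, the job took exactly
# 15 minutes, and all 19 worker nodes were active for the whole run.

DATA_BYTES = 0.5 * 10**12      # 0.5 TB input file
RECORDS = 2.5 * 10**9          # 2.5 billion records
WORKER_NODES = 9 + 10          # 9 CORE + 10 SPOT (the MASTER is not counted)
ELAPSED_SECONDS = 15 * 60      # "about 15 minutes"


def throughput(total, seconds, nodes=1):
    """Average amount processed per second, optionally per node."""
    return total / seconds / nodes


cluster_mb_per_s = throughput(DATA_BYTES, ELAPSED_SECONDS) / 10**6
node_mb_per_s = throughput(DATA_BYTES, ELAPSED_SECONDS, WORKER_NODES) / 10**6
records_per_s = throughput(RECORDS, ELAPSED_SECONDS)
bytes_per_record = DATA_BYTES / RECORDS

print(f"cluster: {cluster_mb_per_s:.1f} MB/s")
print(f"per worker node: {node_mb_per_s:.1f} MB/s")
print(f"records: {records_per_s:,.0f} per second")
print(f"average record size: {bytes_per_record:.0f} bytes")
```

Under those assumptions the cluster averaged roughly 556 MB/s overall, or about 29 MB/s per worker node, which works out to around 2.8 million records per second at an average of 200 bytes per record.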