Facebook takes a deep dive into its big-data stack.

Originally published on Kapuno in the Technology community

🔗 Under the Hood: Scheduling MapReduce jobs more efficiently with Corona | Facebook (Article)

Memorable excerpt:

“Over half a petabyte of new data arrives in the warehouse every 24 hours, and ad-hoc queries, data pipelines, and custom MapReduce jobs process this raw data around the clock to generate more meaningful features and aggregations.”