Good Book: Hadoop: The Definitive Guide
Saturday, October 30, 2010
Applications and data are the yin and yang of information technology. For a long time, it seemed like not much new was happening in the data world, just more ways to use SQL and traditional relational databases.
But those days are over. Data and how we work with it are changing in all kinds of interesting ways. Much of this change (although not all of it) has been driven by the advent of cloud computing, which gives us a whole new way to create and work with really big data.
Part of the foundation for this was laid in two papers published by Google describing internal technologies: one on the Google File System (GFS) and another on a programming model for working with big data called MapReduce. Once this knowledge went public, people began building their own implementations. The most important of these is surely the open-source Hadoop, which includes a GFS-like technology called the Hadoop Distributed Filesystem (HDFS), an implementation of MapReduce, and lots more.
Understanding a chunk of new technology that solves lots of new problems isn't always so simple. Getting a handle on Hadoop is straightforward, though, because there's a great introductory book: Hadoop: The Definitive Guide, by Tom White. I liked this book's first edition, and the second is even better: clear, complete, and compelling.
It's a significant time in the data world, with new problems and new solutions. Hadoop is clearly part of this, and I'd encourage anybody interested in this area to read White's book. It's a great introduction.