New Distros Like GFS File System

© sermoa

Cloudera has released the third version of its open source Hadoop distro. Hadoop mimics GFS, Google's distributed file system, as well as MapReduce, Google's distributed number-crunching platform. Hadoop was open sourced at Apache and bootstrapped by Yahoo!, but its creator, Doug Cutting, now works for Cloudera. The distro tightly integrates several other Apache-licensed projects designed to run in tandem with the platform.

It includes the Hadoop distributed file system and Hadoop MapReduce, as well as Hive (an SQL-like query language developed at Facebook), Pig (a lower-level language developed by Yahoo!), HBase (a distributed database), Sqoop (a MySQL connector built by Cloudera), Flume (a data-loading infrastructure from Cloudera), Oozie (a workflow system), and Hue (a GUI). Together they provide a complete solution for running Hadoop within an organization.

Hadoop: The Best Bet for Efficient Data-Parallel Processing

© Matt McAlister

Hadoop is the answer to the challenge presented by the availability of almost unlimited storage space for data and the emergence of very complex data. It is the best way to process complex data stored in complex formats. Data-parallel processing has become much easier with Hadoop, and Hadoop is considerably more efficient than other similar systems.
Although Hadoop is the best bet for effective data-parallel processing, the cost of the initial investment and Hadoop's inability to eliminate every data processing problem are challenges that Hadoop must still overcome. While Hadoop has made it a hundred times simpler to access and manage huge chunks of data, the technology still has a long way to go.
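
To make the idea of data-parallel processing concrete, here is a minimal sketch of the map, shuffle, and reduce steps on a word-count task. It is plain Python and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and the chunks are processed one after another here, whereas a real cluster would hand each chunk to a different machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    """Run the three phases over a list of independent input chunks."""
    # Each chunk is mapped on its own, which is what makes the work parallelizable.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["hadoop stores data", "hadoop processes data in parallel"]
    print(word_count(chunks))
```

Because no chunk depends on another during the map step, adding more machines lets more chunks be processed at once, which is the property the article credits for Hadoop's efficiency.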

Hadoop

© s_w_ellis

Hadoop is an open-source framework for processing large amounts of data. Informatica is now part of a growing class of vendors working to support Hadoop. Version 9.1 of Informatica includes a connector to the Hadoop file system, which enables clients to move data in and out of Hadoop clusters. James Markarian, executive vice president of Informatica, said that "Though the project is being led by Yahoo, it has its roots in Web companies."
James Kobielus was of the opinion that it is still early days for Hadoop, and that Informatica's ability to load and retrieve data from Hadoop clusters is not much different from what a number of data warehousing vendors already have. In short, according to Kobielus, effective use of Hadoop is not about one tool.