© Cesar Rodas
The Apache Hadoop distributed computing platform is getting an upgrade intended to make the open source technology more user friendly.
Yahoo launched Hortonworks last month to build a support and training business around Hadoop. The upgrade will include improvements to availability, installation and data management. According to Eric Baldeschwieler, CEO of Hortonworks, the main focus will be on adding tools for monitoring, management and distribution, so that organizations find Hadoop much easier to use.
A beta of the upgrade is expected later this year, ahead of general availability, and it will probably be called Hadoop 0.23. It will feature the new HCatalog data management layer, which lets users store their data as tables so that the data keeps a consistent, transparent structure as it moves between tools. It will also include a reworked MapReduce framework (MapReduce 2.0, also known as YARN) that gives users more flexibility, including the ability to run processing models other than MapReduce.
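For readers new to the MapReduce model, the sketch below shows its three phases (map, shuffle, reduce) as a word count in plain Python. It is a minimal illustration under stated assumptions. It runs in a single process and does not use Hadoop's API. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. In a real cluster, the framework would run the map calls on many machines in parallel and handle the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value by its key.

    In Hadoop the framework performs this step between map and reduce;
    here it is a plain in-memory grouping (an illustrative assumption).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is independent, so a cluster could map lines in parallel;
    # this sketch simply maps them one after another.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(docs))
```

The split into small, independent map and reduce functions is what makes the model data-parallel. Because no map call depends on another, the work can be spread across as many machines as there are input splits.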
© sermoa
Cloudera has released the third version of its open source Hadoop distribution. Hadoop is modeled on Google's distributed GFS file system and on MapReduce, Google's distributed number-crunching platform. Hadoop was open sourced at Apache and bootstrapped by Yahoo!, but its creator, Doug Cutting, now works for Cloudera. The distribution tightly integrates several other Apache-licensed projects designed to run in tandem with the platform.
It includes the Hadoop Distributed File System (HDFS) and Hadoop MapReduce, as well as Hive (a data warehouse with an SQL-like query language, developed at Facebook), Pig (a lower-level dataflow language developed by Yahoo!), HBase (a distributed database), Sqoop (a tool built by Cloudera for moving data between Hadoop and relational databases such as MySQL), Flume (a data-loading infrastructure from Cloudera), Oozie (a workflow system) and Hue (a GUI). Together they provide a complete solution for running Hadoop within an organization.
© Matt McAlister
Hadoop is the answer to the challenges created by almost unlimited storage space for data and by the emergence of very complex data. It is the best way to process complex data stored in complex formats. Hadoop has made data-parallel processing much easier, and it is considerably more efficient than other similar systems.
Although Hadoop is the best bet for effective data-parallel processing, it still has challenges to overcome: the cost of the initial investment, and its inability to solve every data processing problem. Hadoop has made it far simpler to access and manage huge volumes of data, but the technology still has a long way to go.