Open Source Hadoop meets SQL

Ravi Kolhe | 12/21/2013
Hadoop is an effective medium for storing and analyzing enterprise data. Because it is a batch-oriented system, however, it can take several minutes to get data out of Hadoop. This makes data handling challenging: it is frustrating to wait fifteen minutes for the answer to a simple query, and most of the time you will want to give up on the query before the reply arrives. This led to the emergence of SQL on Hadoop. Plenty of tools can be used to enable SQL on Hadoop. Let us discuss these tools in brief.

Hive – Big data projects are often split across Hadoop clusters so that the information can be managed efficiently. The question then is how to access the information held in those clusters. Hive is designed specifically to run queries against massive amounts of data, using an SQL-like language (HiveQL) instead of hand-written MapReduce jobs. Information becomes much quicker to reach, and you don't have to wait for hours. However, Hive queries still run as batch jobs, so Hive is not suitable for real-time queries. For real-time queries there is a next-generation tool, Apache Drill, an open source project inspired by Google's Dremel.

Stinger – Stinger is an initiative led by Hortonworks to extend Hive's capabilities and to improve the performance of SQL queries running on the Hive framework. The stated goal of Stinger is to execute SQL queries up to 100 times faster than Hive does today. It will also bring better alignment with ANSI SQL and support for sub queries.
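A sub query is simply a query nested inside another query. The sketch below shows the idea using Python's built-in sqlite3 module as a stand-in engine. It is not Hive or Stinger, and the table, column names, and values are made up for illustration only.

```python
import sqlite3

# Stand-in engine: an in-memory SQLite database, NOT Hive or Stinger.
# The "orders" table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 40.0), ("carol", 200.0), ("dave", 80.0)],
)

# The inner SELECT is the sub query: it computes the average order amount.
# The outer query keeps only the orders above that average.
query = """
    SELECT customer, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY customer
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```

Without sub query support, the same result would need two separate steps: one query to compute the average, and a second query that uses that value as a filter.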

These tools are usually divided into two categories: SQL natively on Hadoop and database on Hadoop.

SQL natively on Hadoop – These tools can work with any file format, but they do not provide support for sub queries or close alignment with ANSI SQL. Hadoop developers have to learn the storage structure of the data to work with these tools. Example: Hive.

Database on Hadoop – These tools can work with specific file formats only, but they provide excellent support for sub queries and better alignment with ANSI SQL. Hadoop developers do not need to learn the storage structure of the data to use these tools. Example: Stinger.

In the past few months, more and more users have realized the importance of SQL-on-Hadoop solutions, and this is why big projects orbit around open source Hadoop.

About the Author:
Sandeep works as a technical architect with Aegisisc, and has exceptional technical writing capability. He shares his knowledge through blogging and is heading a team of Hadoop Developers in Germany for Aegisisc.
