HIVE

Query JDBC datasource Directly from HIVE


We see many many options to retrieve data from RDBMS and we validate the data every time if we receive it as dump. Say for a use case where the source can give you access to fetch the full data. one can use the JDBC Hive storage Handler to directly query the JDBC RDBMS data source.

Below is the URL for further Mind tweaking.

 

Official:

https://issues.apache.org/jira/browse/HIVE-1555

https://community.hortonworks.com/articles/4671/sparksql-jdbc-federation.html

 

Forks:

 

https://github.com/qubole/Hive-JDBC-Storage-Handler

 

https://github.com/myui/HiveJdbcStorageHandler

 

https://github.com/QubitProducts/hive-jdbc-storage-handler

 

Advertisements
HIVE

Basic HQL CheckList

We can start with simple one – these are general stuffs to check while writing HQL.

 

General HIVE Query:

  • Avoid providing inline queries in the where clause.
  • Make all the joins explicitly in the “From” clause instead of “where” clause.
  • FULL OUTER JOINS are very expensive in HIVE.
  • If possible generate statistics for each table before running the query.
  • Make the order of join from smaller to larger tables.
  • Any partitioned tables must have the partition range filtered before making any joins.
  • While joining smaller tables, of up to 1 GB data size, make a MAPJOIN hint for the smaller table.
  • While doing any join if we want to have certain filter as part of the join then it has to be mention as part of the from clause join statement instead of having the same on where clause.
  • While Joining Make use of the Key columns to make the Joins.
  • If a query is having multiple layers of inner queries and used in many other queries to arrive at a result, please create a physical table which can be refreshed and queried as and when needed.