Monday 17 August 2015

Difference between native and Hive mode in Informatica Big Data Edition (BDE)


a) Native mode - In native mode, BDE works like regular PowerCenter. It can read from and write to traditional RDBMS databases, and it can also write to HDFS and Hive. It behaves like PowerCenter because the mapping logic executes on the Informatica server: source data is read onto the Informatica server, the transformations are applied there, and the data is then loaded into the target.

This mode is stateful, i.e. you can keep track of data from previous records and use sequence generators, sorters, etc., just as in regular PowerCenter.
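To make "stateful" concrete, here is a minimal Python sketch of what row-by-row processing in a single process allows. It is only an illustration, not Informatica code: the function name `run_native_style` and the `amount` field are made up for the example.

```python
from itertools import count


def run_native_style(rows):
    """Process rows one at a time in one process, carrying state forward."""
    next_id = count(start=1)   # plays the role of a sequence generator
    previous_amount = None     # plays the role of a stateful variable
    out = []
    for row in rows:
        # The previous record is still known, so a row-to-row delta is easy.
        delta = None if previous_amount is None else row["amount"] - previous_amount
        out.append({"id": next(next_id), "amount": row["amount"], "delta": delta})
        previous_amount = row["amount"]
    return out


rows = [{"amount": 10}, {"amount": 15}, {"amount": 12}]
result = run_native_style(rows)
```

Because every row passes through the same loop, the counter and the "previous value" are always well defined.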


b) Hive mode - In Hive mode you can have the same kinds of sources and targets as in native mode, but the whole mapping logic is pushed down to Hive, i.e. to the Hadoop cluster. In this mode Informatica BDE converts the mapping logic into Hive SQL queries and executes them directly on the Hadoop cluster, where Hive in turn converts them into MapReduce jobs.
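As a rough picture of what "converting mapping logic into Hive SQL" means, the sketch below builds a single INSERT ... SELECT statement from a tiny mapping description. This is not how BDE generates its queries internally. The helper `mapping_to_hive_sql` and the table and column names are hypothetical and exist only for this example.

```python
def mapping_to_hive_sql(source, target, columns, filter_condition=None):
    """Build one Hive-style INSERT ... SELECT from a simple mapping description.

    columns maps each target column name to the expression that produces it.
    """
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in columns.items())
    sql = f"INSERT INTO TABLE {target} SELECT {select_list} FROM {source}"
    if filter_condition:
        sql += f" WHERE {filter_condition}"
    return sql


# Hypothetical mapping: an expression transformation plus a filter.
sql = mapping_to_hive_sql(
    source="stg_orders",
    target="orders_clean",
    columns={"order_id": "order_id", "amount_usd": "amount * fx_rate"},
    filter_condition="amount > 0",
)
print(sql)
```

The point is that the transformation logic ends up inside a query that the cluster runs, rather than in a row-by-row loop on the Informatica server.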

This mode is not stateful, i.e. you cannot keep track of data from previous records using stateful variables. Transformations such as sorters and sequence generators will not work fully or properly.
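A small Python simulation shows why a sequence generator is problematic once the work is split across independent tasks. It assumes, for illustration only, that each task sees just its own slice of the data and keeps its own counter. The names `number_partition` and `partitions` are made up for the example.

```python
from itertools import count


def number_partition(rows):
    """Number the rows of one partition with a counter local to that task."""
    seq = count(start=1)
    return [(next(seq), row) for row in rows]


# Two partitions handled by two independent tasks that share no state.
partitions = [["a", "b", "c"], ["d", "e"]]
numbered = [pair for part in partitions for pair in number_partition(part)]
ids = [i for i, _ in numbered]
print(ids)  # the same ids appear in both partitions
```

No task knows where the others left off, so the generated values collide unless some extra coordination is added.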

The update strategy transformation will not work in Hive mode, simply because Hive does not allow updates. You can only insert records into a Hive database.

In this mode the data is read from the source into temporary Hive tables and transformed, and the output is also loaded into temporary Hive tables before being inserted into the final target, which can be an RDBMS database such as Oracle or a Hive database. Hence the limitations of Hive carry over to Hive mode in Informatica BDE.

However, if your data volume is huge and you want to push all the processing to Hive, then Hive mode is the better option. There are workarounds for doing Type 2 style updates in Hive mode.
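One possible workaround, and not necessarily the one meant above, is to avoid updates entirely: rebuild the whole dimension as a new set of rows and write it out in one go, which in Hive would correspond to something like an INSERT OVERWRITE of the target table. The Python sketch below simulates that idea on in-memory rows. The function `rebuild_type2` and the column names `key`, `value`, `start_date` and `end_date` are assumptions made for the example.

```python
def rebuild_type2(current_rows, incoming, load_date):
    """Return a full replacement of the dimension; nothing is updated in place."""
    rebuilt = []
    for row in current_rows:
        new_value = incoming.get(row["key"])
        changed = (
            row["end_date"] is None
            and new_value is not None
            and new_value != row["value"]
        )
        if changed:
            # Write a closed copy of the old version instead of updating it.
            rebuilt.append({**row, "end_date": load_date})
        else:
            rebuilt.append(dict(row))

    # Add an open version for every new key and every changed value.
    open_values = {r["key"]: r["value"] for r in current_rows if r["end_date"] is None}
    for key, value in incoming.items():
        if open_values.get(key) != value:
            rebuilt.append(
                {"key": key, "value": value, "start_date": load_date, "end_date": None}
            )
    return rebuilt


current = [
    {"key": 1, "value": "A", "start_date": "2015-01-01", "end_date": None},
    {"key": 2, "value": "B", "start_date": "2015-01-01", "end_date": None},
]
incoming = {1: "A2", 3: "C"}
rebuilt = rebuild_type2(current, incoming, "2015-08-17")
```

Every step here is an insert into a fresh result, so it stays within what an insert-only target allows, at the cost of rewriting the full table on each load.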
