• Hadoopinterviews.com

    This platform is for those who are willing to learn and are preparing themselves for the coming storm in the information technology field: Big Data. The kind of attention, or what some people refer to as hype, that Big Data has received over the last 2-3 years is absolutely phenomenal. A lot of research claims that Big Data is the next milestone of the IT services industry.

    Learn more
  • Adding ACID to Apache Hive

    We want to provide that generalized UPDATE capability to the rest of the Hive community without creating a new SQL engine on top of Hadoop.

    HIVE-5317 - Implement insert, update, and delete in Hive with full ACID support
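
    As context for what this enables, the sketch below shows the style of DML the work targets (the table name and columns are hypothetical; under the HIVE-5317 design, ACID tables must be bucketed, stored as ORC, and marked transactional):

        -- Hypothetical transactional table, for illustration only.
        CREATE TABLE accounts (id INT, balance DECIMAL(10,2))
        CLUSTERED BY (id) INTO 4 BUCKETS
        STORED AS ORC
        TBLPROPERTIES ('transactional'='true');

        -- With ACID support, these statements rewrite affected rows via
        -- delta files rather than requiring a full rewrite of the table.
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        DELETE FROM accounts WHERE id = 2;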

    Learn more

Latest from Hadoop Interviews

29/August 2014

Comments (0)

How to configure remote metastore mode in Hive

Hive provides three types of metastore: embedded, local, and remote. Details of these can be found in Hive Metastore!!. In the remote metastore mode, the metastore service runs in a different JVM than Hive, and that service connects to MySQL or another third-party database to store the schema definitions. If multiple metastore servers are used in this configuration, clients connect to the servers in a round-robin manner.
By default Hive comes with the embedded metastore; the other two modes can be configured:

  • hive.metastore.local=false
  • hive.metastore.uris: set this to the metastore server URIs

Note: However, from Hive 0.10 onwards we don't need to set hive.metastore.local; setting hive.metastore.uris is sufficient.
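
As a concrete illustration, a remote metastore setup can be sketched as below (the host name metastore-host is hypothetical; 9083 is the conventional metastore Thrift port, so adjust both for your cluster):

    # Server side: start the standalone metastore service.
    hive --service metastore

    # Client side: point Hive at the remote metastore. The same property
    # would normally live in hive-site.xml; --hiveconf is used here only
    # to keep the example self-contained.
    hive --hiveconf hive.metastore.uris=thrift://metastore-host:9083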

Learn more

28/August 2014

Comments (0)

Interaction with HDFS during Hive Table creation

Hive provides an RDBMS-like facility for creating a table structure and loading data into tables, though there is a difference between loading data in an RDBMS and in Hive: Hive follows schema on read, while an RDBMS follows schema on write. In short, Hive does not validate the schema at the time of data loading; it validates the schema at the time of data reading.
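
A minimal sketch of what schema on read means in practice (table and file names are hypothetical):

    -- Define the structure; nothing is validated against any data yet.
    CREATE TABLE items (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    -- The load succeeds even if the file does not match the schema;
    -- Hive simply moves the file into the table's HDFS directory.
    LOAD DATA LOCAL INPATH '/tmp/items.txt' INTO TABLE items;

    -- Mismatches surface only now: fields that cannot be parsed as the
    -- declared types come back as NULL.
    SELECT * FROM items;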

Learn more

28/August 2014

Comments (0)

Use of the OVERWRITE keyword in the Hive LOAD DATA statement

The OVERWRITE keyword in a LOAD DATA statement tells Hive to delete the existing data from the table before loading. Before we look at OVERWRITE, we need to understand the role and use of LOAD DATA in Hive. LOAD DATA is the command used to load data into an existing Hive table. As I mentioned, the table must already exist; to get a better understanding of table creation one can refer to hive-table-creation.
Below is a typical syntax for loading data.

LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Item.txt' OVERWRITE INTO TABLE things;

The OVERWRITE keyword in this command instructs Hive to load the data of Item.txt into the things table and, at the same time, to delete the existing data from the things table before writing the fresh data from Item.txt.
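
To make the contrast concrete, here is a short sketch of both variants, using the same hypothetical Item.txt file and things table as above:

    -- Without OVERWRITE the new file is added alongside the data already
    -- in the table's directory, so the load behaves like an append.
    LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Item.txt' INTO TABLE things;

    -- With OVERWRITE the existing contents of the table's directory are
    -- deleted first, so afterwards the table holds only Item.txt's data.
    LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Item.txt' OVERWRITE INTO TABLE things;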

Learn more

22/August 2014

Comments (0)

How to check HDFS health?

Hadoop provides a utility to check the health of the HDFS file system. The tool scans the datanodes for all blocks and prepares a report like the one detailed below.

  • hadoop fsck /user/

This command will inspect the blocks of the files under the /user directory. If no location is specified, it starts checking files from the root. Below is a sample report from this command; most components of the report are self-explanatory.
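
fsck also accepts options that expand the report or act on damaged files (the path is just an example):

    # Print a line for every file checked, plus block and location detail.
    hadoop fsck /user/ -files -blocks -locations

    # Act on corrupt data: -move sends affected files to /lost+found,
    # while -delete removes them outright.
    hadoop fsck /user/ -move
    hadoop fsck /user/ -delete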

Learn more

21/August 2014

Comments (0)

What is block scanner?

Block scanning is a process performed by datanodes to verify the integrity of the data stored in blocks. The DataNode runs this process, which scans all the block replicas available on it and verifies them against the stored checksums of the data blocks. Checksums are stored in separate metadata files during block creation. Whenever a client reads a complete block, the client performs the checksum verification process and informs the DataNode of the result.
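
On Hadoop versions of this era, each DataNode exposes its scanner results through its web interface, so the report can be pulled with a plain HTTP request (the host name is hypothetical; 50075 is the default DataNode HTTP port):

    # Summary of block verification state on this datanode.
    curl http://datanode-host:50075/blockScannerReport

    # Append ?listblocks for per-block verification detail.
    curl 'http://datanode-host:50075/blockScannerReport?listblocks'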

Learn more

21/August 2014

Comments (0)

What is inter cluster Data copying in hadoop?

DistCp (distributed copy) is a tool provided with the Hadoop Distributed File System that copies data from a source to a destination, either intra-cluster or inter-cluster. It is based on MapReduce, which gives it its distributed copying capability. At a high level, it takes all the files and directories under the source path and divides the copy work among map tasks.
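
A typical invocation looks like the following (namenode host names and paths are hypothetical):

    # Inter-cluster copy: /src on the cluster behind nn1 to /dest on nn2.
    hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dest

    # -update copies only files that are missing or differ at the target;
    # -overwrite forces every file to be re-copied.
    hadoop distcp -update hdfs://nn1:8020/src hdfs://nn2:8020/dest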

Learn more

12/August 2014

Comments (0)

How to deal with small file in Hadoop?

During one of my projects I encountered this issue. I had to take logs from a server, which were text files of a few KB each. On average, one region's logs came to thousands of files per hour. When I ran my project with sample data of 3-5 files in a directory, my program gave the expected result in normal time. So I decided to try running it with one region's one-hour data. This directory had some 9,000 files holding 1.7 GB of data. The program got stuck, and the machine hung. During the initial investigation I found that the number of input splits, and hence map tasks, had grown to match the number of files in the input directory. Then I realized there was an issue that needed to be identified.
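
One common mitigation from that era, sketched here with hypothetical paths, is to pack the small files into a Hadoop Archive (HAR) so that thousands of files no longer produce thousands of input splits:

    # Pack everything under /logs/region1/2014-08-12 into one archive.
    hadoop archive -archiveName logs.har -p /logs/region1 2014-08-12 /archives

    # The packed files remain readable through the har:// filesystem.
    hadoop fs -ls har:///archives/logs.har/2014-08-12

Another option is a combining input format such as CombineFileInputFormat, which groups many small files into each split instead of creating one split per file.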

Learn more
