What is the history of HBase?
2006: BigTable paper published by Google.
2006 (end of year): HBase development starts.
2008: HBase becomes a Hadoop sub-project.
2010: HBase becomes an Apache top-level project.
What is Apache HBase?
Apache HBase is a sub-project of Apache Hadoop, designed as a distributed, scalable NoSQL big data store (the Hadoop database). Use Apache HBase when you need random, realtime read/write access to your Big Data: tables of billions of rows by millions of columns, hosted atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable, and it provides Bigtable-like capabilities on top of Hadoop and HDFS.
What is NoSQL?
Apache HBase is a type of “NoSQL” database. “NoSQL”
is a general term meaning that the database isn’t an RDBMS which supports SQL
as its primary access language, but there are many types of NoSQL databases:
BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much
a distributed database. Technically speaking, HBase is really more a “Data
Store” than “Data Base” because it lacks many of the features you find in an
RDBMS, such as typed columns, secondary indexes, triggers, and advanced query
languages, etc.
What are the main features of Apache HBase?
Apache HBase supports both linear and modular scaling. HBase tables are distributed on the cluster via regions, and regions are automatically split and re-distributed as your data grows (automatic sharding). HBase also supports a Block Cache and Bloom Filters for high-volume query optimization; a sketch of enabling both follows below.
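As a minimal sketch (assuming the HBase 2.x Java client; the table name "webtable" and family "cf" are hypothetical), Bloom Filters and the Block Cache can be enabled per column family when the table is created:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.regionserver.BloomType;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreateTableExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Hypothetical column family "cf" with a row-level Bloom Filter
          // and the Block Cache enabled (both are per-family settings).
          ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
              .newBuilder(Bytes.toBytes("cf"))
              .setBloomFilterType(BloomType.ROW) // skip StoreFiles that cannot contain the row
              .setBlockCacheEnabled(true)        // cache data blocks read from this family
              .build();
          admin.createTable(TableDescriptorBuilder
              .newBuilder(TableName.valueOf("webtable"))
              .setColumnFamily(cf)
              .build());
        }
      }
    }

Note that in recent HBase versions a ROW Bloom Filter and the Block Cache are already on by default; the builder calls above simply make the choice explicit.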
When should we use HBase?
1) Use HBase only when you have millions or billions of rows and columns in a table; if you have only thousands of rows and columns, it is better to go with an RDBMS.
2) An RDBMS runs on a single database server, whereas HBase is distributed and scalable and runs on commodity hardware.
3) Typed columns, secondary indexes, transactions, advanced query languages, etc. are features provided by an RDBMS, not by HBase, so be sure your application can live without them.
What is the difference between HDFS/Hadoop and
HBase?
HDFS does not provide fast lookups of individual records in a file, whereas HBase provides fast record lookups (and updates) for large tables.
Is there any difference between HBase datamodel and
RDBMS datamodel?
In HBase, data is stored in tables, which have rows and columns, similar to an RDBMS, but this is not a helpful analogy. Instead, it can be helpful to think of an HBase table as a multi-dimensional map.
What key terms are used in designing the HBase datamodel?
1) Table (an HBase table consists of rows)
2) Row (a row in HBase consists of a row key and one or more columns with values associated with them)
3) Column (a column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character)
4) Column family (a set of columns and their values; the column families should be considered carefully during schema design)
5) Column qualifier (a column qualifier is added to a column family to provide the index for a given piece of data)
6) Cell (a cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value's version)
7) Timestamp (represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell)
A short example that exercises all of these terms follows below.
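A minimal sketch tying these terms together (assuming the HBase 2.x Java client; the table "users", family "info", and qualifier "email" are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DataModelExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) { // 1) table
          Put put = new Put(Bytes.toBytes("row1"));                     // 2) row key
          put.addColumn(Bytes.toBytes("info"),                          // 4) column family
                        Bytes.toBytes("email"),                         // 5) column qualifier
                        1234567890000L,                                 // 7) explicit timestamp (optional)
                        Bytes.toBytes("user@example.com"));             // value stored in the 6) cell
          table.put(put); // the full 3) column is "info:email"
        }
      }
    }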
What are the datamodel operations in HBase?
1) Get (returns attributes for a specified row; Gets are executed via HTable.get)
2) Put (Put either adds new rows to a table, if the key is new, or updates existing rows, if the key already exists. Puts are executed via HTable.put (writeBuffer) or HTable.batch (non-writeBuffer))
3) Scan (Scan allows iteration over multiple rows for specified attributes)
4) Delete (Delete removes a row from a table. Deletes are executed via HTable.delete)
HBase does not modify data in place, and so deletes are handled by creating new markers called tombstones. These tombstones, along with the dead values, are cleaned up on major compaction. A sketch of all four operations follows below.
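A minimal sketch of the four operations (assuming the HBase 2.x Java client, where the Table interface replaces the older HTable class; the table and column names are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CrudExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {

          // Put: adds a new row or updates an existing one.
          Put put = new Put(Bytes.toBytes("row1"));
          put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("a@b.com"));
          table.put(put);

          // Get: returns attributes for a specified row.
          Result result = table.get(new Get(Bytes.toBytes("row1")));
          byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
          System.out.println(Bytes.toString(email));

          // Scan: iterates over multiple rows.
          try (ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result row : scanner) {
              System.out.println(Bytes.toString(row.getRow()));
            }
          }

          // Delete: writes a tombstone; the data disappears physically at major compaction.
          table.delete(new Delete(Bytes.toBytes("row1")));
        }
      }
    }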
How are filters useful in Apache HBase?
The Filter Language was introduced in Apache HBase 0.92. It allows you to perform server-side filtering when accessing HBase over Thrift or in the HBase shell, so rows are filtered on the RegionServers rather than shipped to the client first.
How many filters are available in Apache HBase?
In total, HBase supports 18 filters (an example follows the list):
ColumnPrefixFilter
TimestampsFilter
PageFilter
MultipleColumnPrefixFilter
FamilyFilter
ColumnPaginationFilter
SingleColumnValueFilter
RowFilter
QualifierFilter
ColumnRangeFilter
ValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
ColumnCountGetFilter
InclusiveStopFilter
DependentColumnFilter
FirstKeyOnlyFilter
KeyOnlyFilter
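As one example (a sketch assuming the HBase 2.x Java client; the table and column names are hypothetical), a SingleColumnValueFilter can restrict a Scan to rows whose info:city column equals a given value:

    import org.apache.hadoop.hbase.CompareOperator;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
          // The comparison runs on the RegionServers, not in the client.
          SingleColumnValueFilter filter = new SingleColumnValueFilter(
              Bytes.toBytes("info"), Bytes.toBytes("city"),
              CompareOperator.EQUAL, Bytes.toBytes("London"));
          Scan scan = new Scan();
          scan.setFilter(filter);
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result row : scanner) {
              System.out.println(Bytes.toString(row.getRow()));
            }
          }
        }
      }
    }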
How can we use MapReduce with HBase?
Apache MapReduce is a software framework used to analyze large amounts of data, and is the framework used most often with Apache Hadoop. HBase can be used as a data source (TableInputFormat) and as a data sink (TableOutputFormat or MultiTableOutputFormat) for MapReduce jobs. When writing MapReduce jobs that read or write HBase, it is advisable to subclass TableMapper and/or TableReducer, as in the sketch below.
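A minimal sketch of a map-only job that uses one table as the source and another as the sink (assuming the HBase 2.x client; the table names "source_table" and "dest_table" are hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class CopyJob {
      // Subclass TableMapper: the framework feeds it one (row key, Result) pair per row.
      static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
            throws IOException, InterruptedException {
          Put put = new Put(rowKey.get());
          for (Cell cell : columns.listCells()) {
            put.add(cell); // re-emit every cell of the row unchanged
          }
          context.write(rowKey, put);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "copy-table");
        job.setJarByClass(CopyJob.class);
        // TableInputFormat: read "source_table" through CopyMapper.
        TableMapReduceUtil.initTableMapperJob("source_table", new Scan(),
            CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);
        // TableOutputFormat: write the emitted Puts to "dest_table" (no reducer needed).
        TableMapReduceUtil.initTableReducerJob("dest_table", null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }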
How do we back up an HBase cluster?
There are two broad strategies for performing HBase
backups: backing up with a full cluster shutdown, and backing up on a live
cluster. Each approach has pros and cons.
1)Full Shutdown Backup
Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if the cluster is being used as a back-end analytic capacity and is not serving front-end web pages. The benefit is that the NameNode, Master, and RegionServers are all down, so there is no chance of missing any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down.
2)Live Cluster Backup
live clusterbackup-copytable:copy table
utility could either be used to copy data from one table to another on the same
cluster, or to copy data to another table on another cluster.
live cluster backup-export:export approach dumps
the content of a table to HDFS on the same cluster.
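For illustration (the table names, peer ZooKeeper address, and output path are hypothetical), both utilities ship with HBase and run as MapReduce jobs from the command line:

    # CopyTable: copy 'mytable' to a table on another cluster.
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --peer.adr=backup-zk:2181:/hbase --new.name=mytable_copy mytable

    # Export: dump the content of 'mytable' to HDFS on the same cluster.
    hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable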
Does HBase support SQL?
Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. Apache Phoenix, on the other hand, lets you retrieve data from HBase using SQL queries, as sketched below.
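A minimal sketch of the Phoenix route (assuming the Phoenix JDBC client jar is on the classpath; the ZooKeeper quorum "localhost" and the table USERS, which must have been created through Phoenix, are hypothetical):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixExample {
      public static void main(String[] args) throws Exception {
        // Phoenix JDBC URL format: jdbc:phoenix:<zookeeper quorum>
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM USERS LIMIT 10")) {
          while (rs.next()) {
            System.out.println(rs.getLong("id") + " " + rs.getString("name"));
          }
        }
      }
    }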