Hyesung Oh

Hive, RDBMS, HBase, HDFS: Getting the Concepts Straight

Data Engineering


혜성 Hyesung 2020. 10. 19. 23:00

Purpose of This Post


While studying Hive, RDBMS, HBase, HDFS, and similar systems, I often found it hard to tell their characteristics and differences apart and kept mixing them up. I took this opportunity to sort them out.

 

Hive vs RDBMS


Source: stackoverrun.com/ko/q/1751170

In summary:

| | Hive | RDBMS |
|---|---|---|
| Is it a database? | No; it is a data warehouse | Yes |
| SQL | No, but it offers a SQL-like language for querying data stored in various databases; Sqoop can import data from an RDBMS into Hive | Yes |
| OLTP/OLAP | OLAP | OLTP |
| Record-level manipulation | No | Yes |
| Update, delete | Mainly focused on analyzing and processing big data in batch; data is created once and read many times | Create, read, update, and delete many times |
| Execution environment | Runs on MapReduce (distributed environment); currently also runs on Tez (DAG-based execution engine) | Not a distributed environment |
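
To make the OLTP vs OLAP rows of the table a bit more concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a generic RDBMS, and the `orders` table and its columns are made up for illustration. The first half shows record-level manipulation by primary key (the OLTP pattern); the last query is the full-scan aggregate style that Hive is typically used for, although here it still runs on SQLite rather than on Hive.

```python
import sqlite3

# In-memory SQLite database standing in for a generic RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "kim", 100), (2, "lee", 250), (3, "kim", 50)],
)

# OLTP style: touch individual records by primary key.
conn.execute("UPDATE orders SET amount = 120 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")
conn.commit()

# OLAP style: scan the whole table and aggregate.
# This is the shape of query a warehouse like Hive is aimed at.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)
print(totals)
```

In Hive the aggregate query would look almost the same in HiveQL, but it would be compiled into a distributed batch job instead of being answered by a single database process.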

 

HBase vs RDBMS


Source: I compiled what I studied from various places into a single table.

| | HBase | RDBMS |
|---|---|---|
| Scalability | Scale out (high efficiency) | Scale up (low efficiency) |
| Schema | Weak schema | Strong schema |
| Transactions | No full ACID support (not transactional across rows) | Transactional, ACID (Atomicity, Consistency, Isolation, and Durability) |
| SQL | No | Yes |
| Data | Structured, semi-structured, and unstructured data | Structured data only |
| Database type | Column-oriented NoSQL | Row-oriented relational database |
| Key (Both Good at Random Access, but not ) | Row key | Primary key |
| OLAP/OLTP | OLAP (column-oriented) | OLTP (row-oriented) |
| When to use | High-performance random, real-time reads/writes; big data; when scalability is needed (it runs on top of Hadoop HDFS, so it can use HDFS's high availability as is) | Simple record-level CRUD; relatively small data; when you need transaction support (for example, when data integrity and consistency are required) |
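
The "weak schema" and "row key" rows are easier to picture with a toy model. The sketch below is not the HBase API; `ToyColumnStore` and its methods are hypothetical names invented for this post. It only imitates the idea that each row is addressed by a row key, that different rows may carry different columns, and that row keys are kept sorted so a key range can be scanned. One simplification to keep in mind: in real HBase, column families are declared when the table is created and only the qualifiers vary freely per row, whereas this toy accepts any `family:qualifier` string.

```python
from bisect import bisect_left


class ToyColumnStore:
    """Toy model of a row-key-addressed column store (not the HBase API)."""

    def __init__(self):
        self._rows = {}         # row key -> {"family:qualifier": value}
        self._sorted_keys = []  # row keys kept sorted for range scans

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            self._rows[row_key] = {}
            pos = bisect_left(self._sorted_keys, row_key)
            self._sorted_keys.insert(pos, row_key)
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Random access to a single row by its row key."""
        return self._rows.get(row_key, {})

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in key order."""
        lo = bisect_left(self._sorted_keys, start)
        hi = bisect_left(self._sorted_keys, stop)
        return [(k, self._rows[k]) for k in self._sorted_keys[lo:hi]]


store = ToyColumnStore()
store.put("user#001", "info:name", "kim")
store.put("user#001", "info:email", "kim@example.com")
store.put("user#002", "info:name", "lee")   # no email column: weak schema
store.put("user#010", "stats:logins", 7)    # a different column family

print(store.get("user#002"))
print(store.scan("user#001", "user#010"))
```

In an RDBMS the second row would need a NULL in an `email` column that every row shares; here the column simply does not exist for that row.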

 

 

HBase vs HDFS


| HDFS | HBase |
|---|---|
| Distributed file system for storing large files on disk | Database built on top of HDFS |
| Does not support fast individual record lookups | Provides fast lookups for large tables |
| Provides high-latency batch processing | Provides low-latency random access to single rows among billions of records |
| Provides only sequential access to data | Internally uses hash tables to provide random access, and stores data in indexed HDFS files for faster lookups |
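
The last two rows (sequential access vs indexed random access) can be sketched in a few lines. This is a toy under stated assumptions: an in-memory `io.BytesIO` buffer stands in for a large file on HDFS, the record layout (`key,value` per line) is invented, and the helper function names are hypothetical. It follows the table's description of a hash table mapping keys to positions in the file; it does not reproduce HBase's actual file format.

```python
import io

# A flat "file" of key,value lines standing in for a large file on HDFS (assumption).
records = [(f"row{i:05d}", f"value-{i}") for i in range(10_000)]
data = io.BytesIO("".join(f"{k},{v}\n" for k, v in records).encode("utf-8"))


def sequential_lookup(f, key):
    """Read from the start until the key is found (no index available)."""
    f.seek(0)
    for line in f:
        k, v = line.decode("utf-8").rstrip("\n").split(",", 1)
        if k == key:
            return v
    return None


def build_index(f):
    """One pass over the file, recording each key's byte offset in a hash table."""
    index = {}
    f.seek(0)
    offset = 0
    for line in f:
        index[line.split(b",", 1)[0].decode("utf-8")] = offset
        offset += len(line)
    return index


def indexed_lookup(f, index, key):
    """Jump straight to the record with seek(), then read a single line."""
    if key not in index:
        return None
    f.seek(index[key])
    return f.readline().decode("utf-8").rstrip("\n").split(",", 1)[1]


index = build_index(data)
print(sequential_lookup(data, "row09999"))
print(indexed_lookup(data, index, "row09999"))
```

Both calls return the same value, but the sequential version has to read every line before the target, while the indexed version reads exactly one line after a single seek.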

 
