Data Engineering
Getting the Concepts Straight: Hive, RDBMS, HBase, HDFS
혜성 Hyesung
2020. 10. 19. 23:00
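Purpose of this post
While studying Hive, RDBMS, HBase, HDFS, and so on, I often found that the characteristics of each system and the differences between them were not clearly separated in my mind, and I kept mixing them up. So I took this opportunity to organize them.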
Hive vs RDBMS
Source: stackoverrun.com/ko/q/1751170
In summary:
Aspect | Hive | RDBMS |
Is it a database? | No. It is a data warehouse | Yes |
SQL | No, but it provides SQL-like queries (HiveQL) over data stored in various databases; Sqoop can be used to import data from an RDBMS into Hive | Yes |
OLTP/OLAP | OLAP | OLTP |
Record-level manipulation | No | Yes |
Update, Delete | Mainly focused on analyzing and processing big data in batch; data is created once and read many times | Data is created, read, updated, and deleted many times |
Execution environment | Originally ran on MapReduce (distributed environment); currently runs on Tez (a DAG-based engine that reduces intermediate disk writes) | Not a distributed environment |
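To make the OLTP/OLAP row concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for a generic RDBMS, and the `orders` table with its columns is a made-up example, not something from the source. The first half shows record-level UPDATE and DELETE, the OLTP side. The final query is a full-table aggregation, which is the kind of workload Hive is aimed at (Hive would run it as a distributed batch job rather than in a single process).

```python
import sqlite3

# In-memory SQLite database standing in for a generic RDBMS (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "kim", 100), (2, "lee", 250), (3, "kim", 50)],
)

# OLTP-style: record-level manipulation of individual rows by primary key.
conn.execute("UPDATE orders SET amount = 120 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")
conn.commit()

# OLAP-style: scan the whole table and aggregate.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall()
)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
conn.close()
```

After the update and delete, only the two rows for `kim` remain, so the aggregation sees the modified data immediately.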
HBase vs RDBMS
Source: I compiled this from material I studied in various places.
Aspect | HBase | RDBMS |
Scalability | Scale out (high efficiency) | Scale up (low efficiency) |
Schema | Weak schema | Strong schema |
Transactions | No multi-row transactions (atomicity is guaranteed only within a single row) | ACID (Atomicity, Consistency, Isolation, Durability); transactional |
SQL | Not supported | Supported |
Data | Structured, unstructured, and semi-structured data | Structured data only |
Database | Column-oriented NoSQL | Row-oriented relational database |
Key (both good at random access, but not ) | Row key | Primary key |
OLAP/OLTP | OLAP (column-oriented) | OLTP (row-oriented) |
When to use | - High-performance random, real-time reads/writes - Big data - When scalability is needed (it runs on top of Hadoop HDFS, so it can use HDFS's high availability as-is) | - Simple record-level CRUD - Relatively small data sizes - When you need transaction support (for example, when data integrity and consistency are required) |
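The row-key and weak-schema rows above can be sketched with a toy in-memory model. This is not the HBase client API: `ToyWideColumnStore`, its method names, and the sample keys are all hypothetical, chosen only to illustrate the idea. It keeps rows sorted by row key (as HBase does), lets each row hold a different set of `family:qualifier` columns, and supports a point lookup and a range scan by row key.

```python
import bisect


class ToyWideColumnStore:
    """Hypothetical toy model of a row-key store; not the HBase API."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # New row keys are inserted in sorted position.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Random access to a single row by its row key.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        # Range scan over the sorted keys in [start, stop).
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


store = ToyWideColumnStore()
store.put("user#002", "info:name", "lee")
store.put("user#001", "info:name", "kim")
store.put("user#001", "stats:visits", 3)  # rows need not share the same columns
store.put("video#001", "info:title", "intro")

user_rows = store.scan("user#", "user#~")
```

Because rows are ordered by key, choosing a key prefix such as `user#` turns "all users" into one contiguous range scan, which is why row-key design matters so much in this kind of store.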
HBase vs HDFS
HDFS | HBase |
Distributed file system for storing large files on disk | Database built on top of HDFS |
Does not support fast lookup of individual records | Provides fast lookups for large tables |
Provides high-latency batch processing | Provides low-latency random access to single rows among billions of records |
Provides only sequential access to data | Internally uses hash tables and provides random access; stores the data in indexed HDFS files for faster lookups |
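The difference between sequential access and indexed random access can be illustrated on a single local file. This is only a sketch under simplifying assumptions: a temporary file stands in for a large file on HDFS, and a Python dict of byte offsets stands in for an index. HBase's real on-disk format is different. The function names and record layout are made up for the example.

```python
import os
import tempfile

records = [f"row{i:05d},value{i}\n" for i in range(1000)]

# Write one flat file, recording the byte offset at which each record starts.
index = {}
with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    path = f.name
    for line in records:
        key = line.split(",")[0]
        index[key] = f.tell()
        f.write(line.encode("utf-8"))


def sequential_lookup(path, key):
    """Read from the start until the key is found; returns (value, lines_read)."""
    lines_read = 0
    with open(path, "rb") as f:
        for raw in f:
            lines_read += 1
            k, v = raw.decode("utf-8").rstrip("\n").split(",")
            if k == key:
                return v, lines_read
    return None, lines_read


def indexed_lookup(path, index, key):
    """Jump straight to the record using the offset index; reads one line."""
    offset = index.get(key)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.readline()
    return raw.decode("utf-8").rstrip("\n").split(",")[1]


seq_value, lines_read = sequential_lookup(path, "row00999")
idx_value = indexed_lookup(path, index, "row00999")
os.remove(path)
```

Both lookups return the same value, but the sequential version has to read every record before the target, while the indexed version reads exactly one.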