Hive, RDBMS, HBase, HDFS: Getting the Concepts Straight

Hyesung Oh

Data Engineering

혜성 Hyesung 2020. 10. 19. 23:00

While studying Hive, RDBMS, HBase, HDFS, and so on, I often found that the characteristics of each system and the differences between them blurred together. I took this opportunity to sort them out.

In summary:

	Hive	RDBMS
Is it a database?	No; it is called a data warehouse	Yes
SQL	No, but it offers a SQL-like language (HiveQL) for querying data stored in various databases; you can use Sqoop to import data from an RDBMS into Hive	Yes
OLTP/OLAP	OLAP	OLTP
Record-level manipulation	No	Yes
Update, delete	Mainly focused on analyzing and processing big data in batches; data is typically created once and read many times	Records are created, read, updated, and deleted many times
Execution environment	Runs on MapReduce (distributed environment); currently runs on Tez (a DAG-based engine that avoids writing intermediate results to HDFS between stages)	Not a distributed environment

Source: I compiled this from material I studied in various places.
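
The OLTP/OLAP row is easier to see with two query shapes side by side. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an RDBMS; the `orders` table and its columns are hypothetical, and the aggregate query only mimics the kind of full-scan workload Hive targets (it does not run on Hive).

```python
import sqlite3

# In-memory SQLite stands in for an RDBMS; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "kim", 10.0), (2, "lee", 25.0), (3, "kim", 5.0)],
)

# OLTP-style access: modify individual records by primary key inside a transaction.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (12.5, 1))
    conn.execute("DELETE FROM orders WHERE id = ?", (2,))

# OLAP-style access: scan every row and aggregate (the query shape Hive is built for).
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall()
)
print(totals)
```

The first two statements touch one record each, which is what the "record-level manipulation" row refers to; the last one reads the whole table once and summarizes it.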

	HBase	RDBMS
Scalability	Scale out (high efficiency)	Scale up (low efficiency)
Schema	Weak schema	Strong schema
Transaction	CID (not transactional)	ACID (Atomicity, Consistency, Isolation, and Durability); transactional
SQL	X	O
Data	Structured, unstructured, and semi-structured data	Structured data only
Database	Column-oriented NoSQL	Row-oriented relational database
Key (Both Good at Random Access, but not )	Row key	Primary key
OLAP/OLTP	OLAP (column-oriented)	OLTP (row-oriented)
When to use	- High-performance random, real-time reads/writes - Big data - When scalability is needed (HBase runs on top of Hadoop HDFS, so it can use HDFS's high availability as-is)	- Simple record-level CRUD - Relatively small data sets - When you need transaction support (e.g. when data integrity and consistency are required)
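
To make the "weak schema" and "row key" rows concrete, here is a toy model of a wide-column table in plain Python. It is only a sketch: `ToyWideColumnTable` and its methods are hypothetical names, not the HBase client API, and the `family:qualifier` column naming is used purely for illustration.

```python
from bisect import bisect_left, insort


class ToyWideColumnTable:
    """Toy model of a wide-column table; NOT the HBase client API."""

    def __init__(self):
        self._row_keys = []  # kept sorted, so rows can be scanned in row-key order
        self._rows = {}      # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # Any row may carry any set of columns: no table-wide schema is enforced.
        if row_key not in self._rows:
            insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Random access by row key.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Yield (row_key, columns) for start <= row_key < stop, in key order."""
        i = bisect_left(self._row_keys, start)
        while i < len(self._row_keys) and self._row_keys[i] < stop:
            key = self._row_keys[i]
            yield key, dict(self._rows[key])
            i += 1


table = ToyWideColumnTable()
table.put("user#001", "info:name", "kim")
table.put("user#001", "info:email", "kim@example.com")
table.put("user#002", "info:name", "lee")      # no email column for this row
table.put("user#002", "stats:logins", 3)       # a column the first row lacks
table.put("video#001", "info:title", "intro")

user_rows = [key for key, _ in table.scan("user#", "user#~")]
print(user_rows)
```

An RDBMS table would instead fix the column set up front and reject or NULL-fill rows that do not match it; here each row simply stores whatever columns it was given.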

HDFS	HBase
Distributed file system for storing large files on disk	Database built on top of HDFS
Does not support fast individual record lookups	Provides fast lookups for large tables
Provides high-latency batch processing	Provides low-latency random access to single rows among billions of records
Optimized for sequential access to data	Provides random access by row key; data is stored in indexed files on HDFS (HFiles) for faster lookups
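
The gap between sequential access and an indexed lookup can be sketched with a toy model. The code below is a simplification under stated assumptions: the record set, `BLOCK_SIZE`, and both lookup functions are hypothetical, and the "block index" is just a list of each block's first key, not the real on-disk format of HDFS or HBase.

```python
from bisect import bisect_right

# Hypothetical data set: 1,000 records sorted by key.
records = [(f"row{n:06d}", n * n) for n in range(1000)]


def sequential_lookup(records, key):
    """Read from the start until the key is found, counting records touched."""
    touched = 0
    for k, v in records:
        touched += 1
        if k == key:
            return v, touched
    return None, touched


BLOCK_SIZE = 100  # illustrative value only
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
block_index = [block[0][0] for block in blocks]  # first key of every block


def indexed_lookup(blocks, block_index, key):
    """Use the index to pick one block, then read only that block."""
    pos = bisect_right(block_index, key) - 1
    if pos < 0:
        return None, 0
    touched = 0
    for k, v in blocks[pos]:
        touched += 1
        if k == key:
            return v, touched
    return None, touched


seq_value, seq_touched = sequential_lookup(records, "row000950")
idx_value, idx_touched = indexed_lookup(blocks, block_index, "row000950")
print(seq_value, seq_touched)
print(idx_value, idx_touched)
```

Both lookups return the same value, but the sequential one reads 951 records while the indexed one reads 51, which is the intuition behind the "fast individual record lookup" row.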
