Hyesung Oh
Key points I took away from reading the Dockerfile reference
ADD, COPY
- The <src> path must be inside the context of the build; you cannot ADD ../something /something, because the first step of a docker build is to send the context directory (and subdirectories) to the docker daemon.
- If <src> is a directory, the entire contents of the directory are copied, including filesystem metadata. The directory itself is not copied, just its contents.
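A minimal sketch of the two points above. The directory layout (an app/ folder containing main.sh and conf/settings.ini) and the ../secrets path are hypothetical and only there for illustration.

```dockerfile
FROM busybox

# Hypothetical build context: ./app/main.sh and ./app/conf/settings.ini
# Only the *contents* of app/ are copied, not the app directory itself,
# so the image ends up with /opt/app/main.sh and /opt/app/conf/settings.ini.
COPY app/ /opt/app/

# Not possible: ../secrets lives outside the build context that is sent
# to the daemon, so an instruction like this cannot reach it.
# COPY ../secrets /opt/secrets
```

If a file outside the context is really needed, the usual options are to move it into the context directory or to run the build from a higher-level directory.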
ADD vs COPY
COPY simply copies local files to a path inside the container as-is. ADD does the same, but has two additional features:
- If <src> is a URL and <dest> does not end with a trailing slash, then a file is downloaded from the URL and copied to <dest>.
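- If <src> is a local tar archive in a recognized compression format (identity, gzip, bzip2 or xz) then it is unpacked as a directory. Resources from remote URLs are not decompressed. When a directory is copied or unpacked, it has the same behavior as tar -x: the result is the union of whatever existed at the destination path and the contents of the source tree, with conflicts resolved in favor of the latter on a file-by-file basis.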
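To make the difference concrete, here is a small sketch that puts ADD and COPY side by side. The archive name app.tar.gz and the example.com URL are placeholders, not real resources.

```dockerfile
FROM busybox

# Local tar archive in a recognized compression format:
# ADD unpacks it as a directory under /opt/app/.
ADD app.tar.gz /opt/app/

# Remote URL, <dest> without a trailing slash: the file is downloaded
# and saved as /opt/tool.tar.gz. Remote resources are not decompressed.
ADD https://example.com/tool.tar.gz /opt/tool.tar.gz

# COPY has neither extra behaviour: the archive stays an archive.
COPY app.tar.gz /opt/archive/app.tar.gz
```

Because ADD does extra work implicitly, COPY is the more predictable choice when a plain file copy is all you want.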
ARG, ENV
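ARG values are not persisted, whereas ENV values persist throughout the build within the Dockerfile (and remain set in the resulting image). In a multi-stage build, an ARG that is not redeclared in a stage resolves to an empty value in that stage.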
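A short sketch of the ARG versus ENV difference. BUILD_TAG and APP_VERSION are hypothetical names chosen for this example.

```dockerfile
FROM busybox

# Build-time only. Override with: docker build --build-arg BUILD_TAG=1.2.3 .
ARG BUILD_TAG=dev

# ENV persists in the image, so copy the ARG into an ENV
# if the running container needs the value.
ENV APP_VERSION=$BUILD_TAG

# At run time APP_VERSION is set, while BUILD_TAG expands to an empty string.
CMD ["sh", "-c", "echo APP_VERSION=$APP_VERSION BUILD_TAG=$BUILD_TAG"]
```

Copying an ARG into an ENV like this is a common way to keep a single build-time knob while still exposing the value to the container.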
SCOPE
An ARG variable definition comes into effect from the line on which it is defined in the Dockerfile not from the argument’s use on the command-line or elsewhere. For example, consider this Dockerfile:
FROM busybox
USER ${user:-some_user}
ARG user
USER $user
# ...
A user builds this file by calling:
$ docker build --build-arg user=what_user .
The USER at line 2 evaluates to some_user as the user variable is defined on the subsequent line 3. The USER at line 4 evaluates to what_user as user is defined and the what_user value was passed on the command line. Prior to its definition by an ARG instruction, any use of a variable results in an empty string.
An ARG instruction goes out of scope at the end of the build stage where it was defined. To use an arg in multiple stages, each stage must include the ARG instruction.
FROM busybox
ARG SETTINGS
RUN ./run/setup $SETTINGS
FROM busybox
ARG SETTINGS
RUN ./run/other $SETTINGS
Using ARG Variables
RUN instructions following an ARG instruction use the ARG variable implicitly (as an environment variable), and can therefore cause a cache miss.
In other words, RUN implicitly uses any ARG defined earlier in the same stage.
For example, consider these two Dockerfiles:
FROM ubuntu
ARG CONT_IMG_VER
RUN echo $CONT_IMG_VER

FROM ubuntu
ARG CONT_IMG_VER
RUN echo hello
This is why, when CONT_IMG_VER changes, line 2 does not cause a cache miss in either Dockerfile, but line 3 does.
Predefined ARGs
Docker has a set of predefined ARG variables that you can use without a corresponding ARG instruction in the Dockerfile.
- HTTP_PROXY
- http_proxy
- HTTPS_PROXY
- https_proxy
- FTP_PROXY
- ftp_proxy
- NO_PROXY
- no_proxy
- ALL_PROXY
- all_proxy
Predefined ARGs can be used in a Dockerfile without a corresponding ARG instruction. However, if you do declare one explicitly with ARG, changing its value can cause a cache miss.
To use these, pass them on the command line using the --build-arg flag, for example:
$ docker build --build-arg HTTPS_PROXY=https://my-proxy.example.com .
By default, these pre-defined variables are excluded from the output of docker history. Excluding them reduces the risk of accidentally leaking sensitive authentication information in an HTTP_PROXY variable.
For example, consider building the following Dockerfile using --build-arg HTTP_PROXY=http://user:pass@proxy.lon.example.com
FROM ubuntu
RUN echo "Hello World"
In this case, the value of the HTTP_PROXY variable is not available in the docker history and is not cached. If you were to change location, and your proxy server changed to http://user:pass@proxy.sfo.example.com, a subsequent build does not result in a cache miss.
If you need to override this behaviour then you may do so by adding an ARG statement in the Dockerfile as follows:
FROM ubuntu
ARG HTTP_PROXY
RUN echo "Hello World"
When building this Dockerfile, the HTTP_PROXY is preserved in the docker history, and changing its value invalidates the build cache.
VOLUME
Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
Any changes made to data inside a volume after it has been declared are discarded. Build steps that modify that data must therefore come before the VOLUME declaration.
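A sketch of the ordering issue, using a hypothetical /data directory. It follows the behaviour described in the reference quoted above; the exact outcome may differ between builder versions, so writing the data before the VOLUME line is the safe ordering either way.

```dockerfile
FROM busybox

# Written before the VOLUME declaration, so this file is part of the image.
RUN mkdir /data && echo "kept" > /data/before.txt

VOLUME /data

# Written after the declaration: per the reference, this change is discarded.
RUN echo "lost" > /data/after.txt
```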
WORKDIR
If you specify a relative path rather than an absolute one, it is resolved as a subdirectory of the previously set WORKDIR.
WORKDIR instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile. If the WORKDIR doesn’t exist, it will be created even if it’s not used in any subsequent Dockerfile instruction.
The WORKDIR instruction can be used multiple times in a Dockerfile. If a relative path is provided, it will be relative to the path of the previous WORKDIR instruction. For example:
WORKDIR /a
WORKDIR b
WORKDIR c
RUN pwd
The output of the final pwd command in this Dockerfile would be /a/b/c.
WORKDIR can use ENV values, but a variable that has not been explicitly set in the Dockerfile is not resolved.
The WORKDIR instruction can resolve environment variables previously set using ENV. You can only use environment variables explicitly set in the Dockerfile. For example:
ENV DIRPATH=/path
WORKDIR $DIRPATH/$DIRNAME
RUN pwd
The output of the final pwd command in this Dockerfile would be /path/$DIRNAME
If not specified, the default working directory is /. In practice, if you aren’t building a Dockerfile from scratch (FROM scratch), the WORKDIR may likely be set by the base image you’re using.
Therefore, to avoid unintended operations in unknown directories, it is best practice to set your WORKDIR explicitly.
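As a minimal sketch of that practice, the Dockerfile below sets WORKDIR before any relative path is used. The /app directory and the run.sh script are hypothetical.

```dockerfile
FROM busybox

# Set the working directory explicitly instead of relying on
# whatever the base image happens to define.
WORKDIR /app

# Relative destinations now resolve against /app.
# run.sh is a hypothetical script in the build context.
COPY run.sh .

CMD ["sh", "./run.sh"]
```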
Reference: https://docs.docker.com/engine/reference/builder/#workdir