Docker Compose Example for Importing CSV into Elasticsearch via Python Client
About
This is an example-based guide on how to set up a queryable Elasticsearch microservice with docker-compose. The end result is a Docker container pre-populated with an index that responds to queries at http://localhost:9200.
Runtime information
NOTE: Version numbers are for illustrative purposes only and are specific to this example. The steps shown in this guide apply to any combination of versions that work together.
Tooling prerequisites
- [Required] Docker has been installed in the environment
- [Optional but recommended] Poetry for Python dependency management and packaging
Version information
- Docker engine: 20.10.5
- Python: 3.7
- Elasticsearch: 7.10
- Elasticsearch’s Python client: 7.13.4
Python dependencies
The main dependencies for creating the Elasticsearch index from CSV, as declared in Poetry’s pyproject.toml (the requirements.txt consumed by Docker is exported from this file later):

...
[tool.poetry.dependencies]
python = "^3.7"
elasticsearch = "7.13.4"
pandas = "1.1.5"
retry = "^0.9.2"
...
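For reference, these entries can be added to pyproject.toml from the command line as well; a minimal sketch, assuming a Poetry 1.x CLI (exact constraint syntax may vary by Poetry version):

$ poetry add elasticsearch==7.13.4 pandas==1.1.5 retry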
Docker setup
Folder structure for the microservice
docker
├── docker-compose.yml
└── indexer
├── Dockerfile
├── data
│ └── index_input.csv
├── indexer.py
└── requirements.txt
1. Content of docker-compose.yml:
version: '3'

services:
  elasticsearch:
    container_name: example_es
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      - discovery.type=single-node
    networks:
      - elasticsearch_network

  indexer:
    container_name: example_indexer
    build: indexer/
    depends_on:
      - elasticsearch
    networks:
      - elasticsearch_network

networks:
  elasticsearch_network:
    driver: bridge
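Two details here matter for the indexer: both services share the elasticsearch_network bridge, so the indexer container can reach Elasticsearch by the hostname example_es; and depends_on only controls start order, not readiness, which is why indexer.py below retries on connection errors. As a minimal sketch (assuming the same 7.13.4 Python client), a client on that network would connect like this:

import elasticsearch

# "example_es" resolves via Docker's DNS on the shared bridge network;
# ping() returns False until the node is ready to serve requests
es_client = elasticsearch.Elasticsearch(hosts=[{"host": "example_es", "port": 9200}])
print(es_client.ping())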
2. Dockerfile of the indexer:
FROM python:3.7-slim
WORKDIR /app
RUN mkdir -p /app/data
# Copy the indexer script, requirements.txt, and the data folder into the image
ADD . /app
ADD ./data /app/data
RUN pip install -r requirements.txt
# Run indexer.py through python by default; CMD can be overridden at run time
ENTRYPOINT ["python"]
CMD ["indexer.py"]
- To generate the requirements.txt file from Poetry, run the following command at the project’s root folder (where pyproject.toml is located) within the currently active Poetry environment:
$ poetry export --format requirements.txt --without-hashes --output docker/indexer/requirements.txt
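The exported file pins transitive dependencies and may append environment markers as well; the entries for the direct dependencies above would look roughly like the following (a sketch, not the exact output):

elasticsearch==7.13.4
pandas==1.1.5
retry==0.9.2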
Example indexer.py:
from retry import retry
import pandas as pd
import elasticsearch
import elasticsearch.helpers


# Retry on connection errors for up to 300 seconds (every 5 seconds),
# since Elasticsearch may not be ready when this container starts
@retry(elasticsearch.ConnectionError, max_delay=300, delay=5)
def indexer():
    # "example_es" is the Elasticsearch container's name on the shared network
    es_client = elasticsearch.Elasticsearch(hosts=[{"host": "example_es"}])
    index_name = "example"
    number_of_shards = 1

    df_index = pd.read_csv(
        "/app/data/index_input.csv", na_filter=False
    )

    es_params = {
        "index": index_name,
        "body": {
            "settings": {"index": {"number_of_shards": number_of_shards}}
        },
    }
    # Recreate the index from scratch on every run
    if es_client.indices.exists(index=index_name):
        es_client.indices.delete(index=index_name)
    es_client.indices.create(**es_params)
    # Index each CSV row as one document
    elasticsearch.helpers.bulk(
        es_client,
        df_index.to_dict(orient="records"),
        doc_type="_doc",
        index=index_name,
    )


if __name__ == "__main__":
    indexer()
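For intuition on the bulk call: to_dict(orient="records") turns the dataframe into one dict per CSV row, and each dict becomes one document in the example index. A minimal sketch, assuming a hypothetical two-column index_input.csv:

import pandas as pd

# hypothetical CSV content:
#   title,rating
#   foo,4
#   bar,5
df = pd.read_csv("index_input.csv", na_filter=False)
print(df.to_dict(orient="records"))
# [{'title': 'foo', 'rating': 4}, {'title': 'bar', 'rating': 5}]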
3. Steps for spinning up the microservice:
# Prerequisite:
# Make sure your CSV file (index_input.csv) has been placed under
# <parent_path>/docker/indexer/data/

# The easiest location to run docker-compose is
# at the path where docker-compose.yml is located;
# otherwise, `-f <path to docker-compose.yml>` must be supplied
$ docker-compose up --detach --build
Once Docker runs through all the steps, we can check the containers’ status:
$ docker-compose ps
     Name                   Command                State                        Ports
---------------------------------------------------------------------------------------------------------
example_es        /tini -- /usr/local/bin/do ...   Up       0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
example_indexer   python indexer.py                Exit 0
4. Some checks in the terminal:
$ curl -X GET 'http://localhost:9200/_cat/indices?v'
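The _cat/indices output should list the example index along with its document count. To spot-check the indexed documents themselves, a standard URI search against the index works too:

$ curl -X GET 'http://localhost:9200/example/_search?size=1&pretty'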
At this point, the microservice should be up and running at the endpoint http://localhost:9200.
Other useful commands
# 1. To tear down and clean up the microservice
$ docker-compose down

# 2. To stop the microservice
$ docker-compose stop

# 3. To start the microservice
$ docker-compose start

# 4. To check logs of a specific service
$ docker-compose logs <service name>
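For example, the service names defined in docker-compose.yml are elasticsearch and indexer, so the indexer’s output can be inspected with:

$ docker-compose logs indexer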