Docker Compose Example for Importing CSV into Elasticsearch via Python Client

Shan Dou
Sep 10, 2021

About

This is an example-based guide on how to set up a queryable Elasticsearch microservice with docker-compose. The end result is a Docker container pre-populated with an index that responds to queries at http://localhost:9200.

Runtime information

NOTE: Version numbers are for illustrative purposes only and are specific to this example. The steps in this guide apply to any combination of versions that work together.

Tooling prerequisites

  • [Required] Docker is installed in the environment
  • [Optional but recommended] Poetry for Python dependency management and packaging

Version information

  • Docker engine: 20.10.5
  • Python: 3.7
  • Elasticsearch: 7.10
  • Elasticsearch’s python client: 7.13.4

Python dependencies

The Python dependencies for creating the Elasticsearch index from CSV are captured in a requirements.txt file, which is exported from Poetry (the export command is shown in the Docker setup section below).

Main dependencies as declared in Poetry’s pyproject.toml:

...
[tool.poetry.dependencies]
python = "^3.7"
elasticsearch = "7.13.4"
pandas = "1.1.5"
retry = "^0.9.2"
...
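
As an optional sanity check (not part of the original setup), the pinned versions can be confirmed from within the active environment. This is a small sketch that assumes the dependencies above are already installed:

# Optional sanity check: confirm the pinned dependency versions are importable.
import elasticsearch
import pandas

print("elasticsearch client:", elasticsearch.__version__)  # e.g. (7, 13, 4)
print("pandas:", pandas.__version__)                       # e.g. 1.1.5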

Docker setup

Folder structure for the microservice

docker
├── docker-compose.yml
└── indexer
    ├── Dockerfile
    ├── data
    │   └── index_input.csv
    ├── indexer.py
    └── requirements.txt
1. Content of docker-compose.yml:

version: '3'

services:
  elasticsearch:
    container_name: example_es
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      - discovery.type=single-node
    networks:
      - elasticsearch_network
  indexer:
    container_name: example_indexer
    build: indexer/
    depends_on:
      - elasticsearch
    networks:
      - elasticsearch_network

networks:
  elasticsearch_network:
    driver: bridge

2. Dockerfile of the indexer:

FROM python:3.7-slim
WORKDIR /app
RUN mkdir -p /app/data
ADD . /app
ADD ./data /app/data
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["indexer.py"]
  • To generate the requirements.txt file from Poetry, run the following command at the project’s root folder (where pyproject.toml is located) within the currently active Poetry environment:
$ poetry export --format requirements.txt --without-hashes --output docker/indexer/requirements.txt

Example indexer.py (note the retry decorator: depends_on only controls start order, so the indexer keeps retrying until Elasticsearch is actually ready to accept connections):

from retry import retry
import pandas as pd
import elasticsearch
import elasticsearch.helpers


@retry(elasticsearch.ConnectionError, max_delay=300, delay=5)
def indexer():
    # "example_es" is the Elasticsearch container name on the shared network
    es_client = elasticsearch.Elasticsearch(hosts=[{"host": "example_es"}])
    index_name = "example"
    number_of_shards = 1

    df_index = pd.read_csv("/app/data/index_input.csv", na_filter=False)

    es_params = {
        "index": index_name,
        "body": {
            "settings": {"index": {"number_of_shards": number_of_shards}}
        },
    }
    # Recreate the index from scratch on every run
    if es_client.indices.exists(index=index_name):
        es_client.indices.delete(index=index_name)
    es_client.indices.create(**es_params)

    # Bulk-index one document per CSV row
    elasticsearch.helpers.bulk(
        es_client,
        df_index.to_dict(orient="records"),
        doc_type="_doc",
        index=index_name,
    )


if __name__ == "__main__":
    indexer()
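
To make the bulk step concrete, here is a small standalone sketch (using a made-up CSV layout, since index_input.csv is not shown in this guide) of how pandas turns rows into the per-document dictionaries that elasticsearch.helpers.bulk expects:

# Standalone sketch with a hypothetical CSV layout (index_input.csv is not
# part of this guide); it only illustrates how rows become bulk documents.
import io

import pandas as pd

csv_text = io.StringIO(
    "title,category,price\n"
    "Espresso machine,kitchen,129.99\n"
    "Desk lamp,office,24.50\n"
)
df = pd.read_csv(csv_text, na_filter=False)

# Each row becomes one dict, i.e. one Elasticsearch document in the bulk call.
docs = df.to_dict(orient="records")
print(docs)
# [{'title': 'Espresso machine', 'category': 'kitchen', 'price': 129.99}, ...]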

3. Steps for spinning up the microservice:

# Prerequisite: 
# Make sure your CSV file (index_input.csv) has been placed under
# <parent_path>/docker/indexer/data/
# The easiest location to run docker-compose is
# at the path where docker-compose.yml is located;
# otherwise, `-f <path to docker-compose.yml>` must be supplied
$ docker-compose up --detach --build

Once Docker runs through all the steps, we can check the container status:

$ docker-compose ps
      Name                    Command               State                       Ports
---------------------------------------------------------------------------------------------------------
example_es        /tini -- /usr/local/bin/do ...   Up       0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
example_indexer   python indexer.py                Exit 0

4. Some checks in the terminal

$ curl -X GET 'http://localhost:9200/_cat/indices?v'

At this point, the microservice should be up and running and reachable at http://localhost:9200.
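
The same check can be done from Python. The following is a minimal query sketch; it assumes the "example" index created by indexer.py above and the default 9200 port mapping from docker-compose.yml:

# Minimal query sketch (assumes the "example" index created by indexer.py
# and the 9200 port mapping defined in docker-compose.yml).
import elasticsearch

es = elasticsearch.Elasticsearch(hosts=["http://localhost:9200"])
response = es.search(index="example", body={"query": {"match_all": {}}, "size": 3})

for hit in response["hits"]["hits"]:
    print(hit["_source"])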

Other useful commands

# 1. To tear down and clean up the microservice
$ docker-compose down
# 2. To stop microservice
$ docker-compose stop
# 3. To start microservice
$ docker-compose start
# 4. To check logs of a specific service
$ docker-compose logs <service name>
