Most of the projects we have the pleasure of creating collect a lot of data. It usually means a heavy load on the database, which we use for data persistence. Often our data is time-series data. But what can we do to provide better request handling and lower access times?
I will present how you can integrate Elixir and Ecto with one, very popular extension for PostgreSQL database which is TimescaleDB.
TimescaleDB and time-series data
TimescaleDB is a very popular extension for PostgreSQL. It allows better handling of time-series data.
But what does time-series data itself mean concerning other data?
Time-series data collectively represents changes in a process, system, or behavior over time. Data should always has a time aspect by including timestamp. The vast majority of data is append-only. We are also most often interested in the latest data in the processing process. The frequency aspect of adding new data is not significant.
Time-series data can be found everywhere. Monitoring computer systems, container metrics (CPU, free memory, net/disk IOPs), and application metrics (requests) are time-series data. Data from Internet of Things sensors, data from the application (user login, clicks), or even financial trading systems generate data from this category.
Business use case
Technology and proposed solutions should match the business expectations and challenges of real projects. In one of the projects that I was developing, data were collected regarding, among others, the availability of games (yep, also Cyberpunk 2077 sales summary).
The original use of PostgreSQL alone has proved insufficient. Saving data typically time-series and operations on them away from the database (we need to obtain a large volume of data and processing in Elixir) did not work well.
TimescaleDB came to the rescue. It was possible to process the data inside the database better. The use of time-divided tables allowed for better access to the latest data. Let’s see below with code examples of how you can easily manage it from within Ecto.
Ecto and TimescaleDB
The considerable advantage of TimescaleDB is the full SQL interface for all SQL natively supported by PostgreSQL. So we can use Postgrex and Ecto to communicate with the database.
Let’s begin by ensuring the extension is available in the database.
1 mix ecto.gen.migration create_timescaledb_extension
1 2 3 4 5 6 7 8 9 10 11 12 13 # priv/repo/migrations/20210131105904_create_timescaledb_extension.exs defmodule Project.Repo.Migrations.CreateTimescaledbExtension do use Ecto.Migration def up do execute("CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE") end def down do execute("DROP EXTENSION IF EXISTS timescaledb CASCADE") end end
When you want to create a new table using the options introduced in the extension, use the code below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 # priv/repo/migrations/20210131110504_create_conditions_table.exs defmodule Project.Repo.Migrations.CreateConditionsTable do use Ecto.Migration def up do create table(:conditions, primary_key: false) do add(:time, :naive_datetime, null: false) add(:location, :string, null: false) add(:temperature, :decimal, null: false) add(:humidity, :decimal, null: false) timestamps() end execute("SELECT create_hypertable('conditions', 'time')") end def down do drop(table(:conditions)) end end
conditions will store the conditions (temperature and humidity) at a given location.
Instead of the default
SERIAL primary key, we will use the
naive_datetime to split records based on that key.
create_hypertable command requires
execute, hence the separation into
down to undo the migration.
As we can see, using TimescaleDB is not complicated at all. Depending on the project, we can gain a lot from its introduction to our system. I recommend you check out the TimescaleDB vs. Relational Databases comparison.