Don’t use Elixir modules in migrations

Most of our projects use databases. Using Ecto allows us to manage data migrations easily. We often do not realize that incorrectly used functionality may block our applications.

In this post, I would like to present a problem that may occur. It is important to understand the problem and the consequences better. Simple changes can save us from unpleasant consequences.

Understand migrations

Migrations are performed one file after the other. Ecto uses the schema_migrations table, which stores all migrations that have already been executed. We can configure this table’s name with the :migration_source in Ecto configuration option1.

To ensure that the migration is performed once, Ecto will lock2 the schema_migrations table when running migrations. It guarantees that only one server can make changes at the same time.

The table stores version and inserted_at columns. There is no storing of checksums. It is crucial because it allows us to modify and improve already created files.

Problematic schemas

The use of the Elixir code in migration files can be divided into two categories. When we use modules such as Enum or Map, there should be no problems.

The greater risk occurs when we decide to use data structures: structs or Ecto schemas. We can make our application unable to start!

Let’s analyze a simple example. In our project, we have prepared the following migration:

1
2
3
4
5
6
7
8
9
10
11
  # priv/repo/migrations/timestamp_create_logs_table.exs
  defmodule Project.Repo.Migrations.CreateLogsTable do
    use Ecto.Migration

    def change do
      create table(:logs) do
        add(:action, :string)
        add(:details, :text)
      end
    end
  end

In our code, we can use structure:

1
2
3
4
5
6
7
8
9
10
  defmodule Project.Schema.Log do
    use Ecto.Schema

    schema "logs" do
      field :action, :string
      field :details, :string
    end

    ...
  end

Let’s suppose we need to change data in a database. We want to mark the used earlier update action as modified now. We can prepare a simple migration:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
  # priv/repo/migrations/timestamp_modify_log_action.exs
  defmodule Project.Repo.Migrations.CreateLogsTable do
    use Ecto.Migration

    def up do
      import Ecto.Query, only: [from: 2]

      from(
        q in Project.Schema.Log,
        where: q.action == "update"
      )
      |> Project.Repo.update_all(set: [action: "modified"])
    end

    def down do
      ...
    end
  end

The project is developing, but at some point, we recognize that our logs need to be transferred to a separate database. We delete the table logs and structure Project.Schema.Log to keep things in order after moving the data. Anyone who wants to run our migration code from zero state will get an error now.

1
2
  ** (UndefinedFunctionError) function Project.Schema.Log.__schema__/1 is undefined
     (module Project.Schema.Log is not available)

Migrations are blocked; CI cannot work. It may be a not very extensive example, but it conveys an idea. Using structures in migrations is dangerous.

Changing the name or deleting the fields will cause problems. Most likely just when we least expect it and have other important tasks.

How to improve invalid migration?

Note: This section has been updated since the article was posted.

There are two ways to improve. The first one (originally included here) is the use of the SQL code prepared by us.

With Ecto.Migration.execute/2 you can execute the SQL command. It can be helpful if you don’t want to create temporary structures.

1
  execute("UPDATE logs SET action = 'modified' WHERE action = 'update';")

The second approach is to use structures that will only be visible in the context of migration. This is some code duplication but also eliminates the problem of changing structures.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
  # priv/repo/migrations/timestamp_modify_log_action.exs
  defmodule Project.Repo.Migrations.CreateLogsTable do
    use Ecto.Migration

    defmodule Log do
      use Ecto.Schema

      schema "logs" do
        field(:action, :string)
        field(:details, :string)
      end
    end

    def up do
      import Ecto.Query, only: [from: 2]

      from(
        q in Log,
        where: q.action == "update"
      )
      |> Project.Repo.update_all(set: [action: "modified"])
    end

    def down do
      ...
    end
  end

Theoretically, both versions will lead to the same state. I leave the choice to you. For me personally, the use of SQL seems like a more straightforward solution. I am updating the post to avoid inaccuracies as there is also an approach beyond what was initially proposed.

Summary

Changes in migrations are not tricky. Instead of structures, we can use schemaless Ecto queries. You can still use Ecto API, but in my opinion, you should use it without schemas. This will eliminate problems when running our application with a clean database or changing fields in schemas. Simple changes today can save us from critical errors in the future.

In the previous version, I used the phrase “raw SQL”. It was not very precise as the Ecto API can still be used. Thank you, Felipe Pereira Stival, for pointing out this ambiguity. Thank you, Josef Strzibny, for pointing out the ability to use temporary module inside migration.

Get new posts and extra comments

You'll receive every new post with extra unpublished comments available only to the subscribers!

I won't send you spam. Unsubscribe at any time.