Over time our indexes became bloated, meaning entries pointing at dead (invisible) rows were still taking up space in the index. This is inefficient and can be addressed by rebuilding the index.
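A quick way to spot candidates is to list the largest indexes and how often they are scanned. Size alone doesn't prove bloat (an extension like pgstattuple gives a real estimate), but it narrows the list; the connection options below match the pg_repack command later in this post:

psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname -c "
  SELECT indexrelname AS index_name,
         relname AS table_name,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
         idx_scan
  FROM pg_stat_user_indexes
  ORDER BY pg_relation_size(indexrelid) DESC
  LIMIT 20;"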

PostgreSQL 12 introduced the ability to reindex concurrently, meaning an index could be rebuilt without interrupting concurrent transactions.
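For reference, on PostgreSQL 12 or later the native equivalent is a single statement (shown here against the index used as the example later in this post):

psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname \
  -c "REINDEX INDEX CONCURRENTLY idx_17210_index_posts_on_comment_id;"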

As we were running version 10, we had no such native facility, so we used pg_repack to achieve a similar effect: a rebuilt index without interrupting live traffic. pg_repack does this by building an efficient duplicate index in the background, swapping the new index in for the old one, and dropping the old one.
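pg_repack handles that swap internally through the system catalogs, but on version 10 you can picture it as roughly the manual CREATE INDEX CONCURRENTLY sequence below. This is only a sketch: the _new suffix is illustrative, and the posts table and comment_id column are inferred from the index name used later in this post.

psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname <<'SQL'
-- Build a fresh, compact copy of the index without blocking writes
CREATE INDEX CONCURRENTLY idx_17210_index_posts_on_comment_id_new ON posts (comment_id);
-- Drop the bloated original, then take over its name
DROP INDEX CONCURRENTLY idx_17210_index_posts_on_comment_id;
ALTER INDEX idx_17210_index_posts_on_comment_id_new RENAME TO idx_17210_index_posts_on_comment_id;
SQL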

The Steps

In our case, we ran pg_repack as a command line program from an EC2 instance, connected to an RDS database. The steps were as follows.

  1. Install pg_repack with yum. You may need some additional PostgreSQL and gcc tooling as well. I also went through several versions before landing on a pg_repack client compatible with our database. This blog post Install pg_repack RDS instructions sums up the install process nicely. (A condensed sketch of steps 1–3 appears after this list.)
  2. We used an existing EC2 instance that could reach the RDS database. I confirmed connectivity with psql, using the same connection options as the pg_repack command.
  3. I ran pg_repack from a screen session and detached once it started, since it takes a while to run. screen -ls lists sessions, and something like screen -R andy-pg_repack reattaches to an existing one. To detach: Ctrl-a d.
  4. I ran it under time to see the total run time. Besides the options needed to establish a connection, I used the -k option, required for RDS, to “skip superuser checks in client”.
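A condensed sketch of steps 1–3 on the EC2 host. The yum package name, the CREATE EXTENSION step, and the screen session name are assumptions that depend on your package repo and RDS setup:

# Install a pg_repack client matching the server's major version (package name may differ)
sudo yum install -y pg_repack10

# The pg_repack extension must also exist in the target database (one-time step)
psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname -c "CREATE EXTENSION IF NOT EXISTS pg_repack;"

# Confirm the client version matches the installed extension version
pg_repack --version
psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname \
  -c "SELECT extversion FROM pg_extension WHERE extname = 'pg_repack';"

# Start a named screen session; run the pg_repack command shown below, then Ctrl-a d to detach
screen -S andy-pg_repack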

Putting it all together for an index named idx_17210_index_posts_on_comment_id, the command would look like this:

time pg_repack -k \
  -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -d dbname \
  -U superuser \
  --index idx_17210_index_posts_on_comment_id
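Once it completes, a quick size check confirms the rebuild actually reclaimed space (compare against the size recorded beforehand):

psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname \
  -c "SELECT pg_size_pretty(pg_relation_size('idx_17210_index_posts_on_comment_id'));"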

Caveats

  • Since we use native replication, index rebuilds cause a lot of read and write disk I/O, which can lead to replication lag.

We found that large indexes (over 50 GB) caused severe replication lag and could effectively cause downtime. Smaller indexes could be repacked even during on-peak times, but for large indexes perform this work off-peak, or ideally with the application detached entirely if possible (see the lag check after these caveats).

  • Only one pg_repack can be running at a time.

If you find yourself regularly running pg_repack, you are likely treating the symptom rather than the problem. The Percona blog post Understanding pg_repack: What Can Go Wrong – and How to Avoid It covers some more caveats with pg_repack.
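As noted in the first caveat, it helps to watch replica lag while a repack runs. A simple loop against the read replica (the replica hostname and 30-second interval are placeholders):

# Run against the replica: how far replay lags behind, in wall-clock terms
while true; do
  psql -h my-awesome-db-replica.host.us-east-1.rds.amazonaws.com \
    -U superuser -d dbname -At \
    -c "SELECT COALESCE(now() - pg_last_xact_replay_timestamp(), '0'::interval);"
  sleep 30
done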

Summary

Despite some issues, pg_repack allowed us to remove severe index bloat online without any major downtime. Although the process is repeatable, we expect this to have been mostly a one-time operation, as we’ve also tuned our vacuum parameters to be more aggressive.
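That vacuum tuning amounts to lowering autovacuum's trigger thresholds so dead rows are cleaned up before bloat accumulates again. Our exact parameters aren't covered here; a per-table example of the kind of adjustment involved (table name and values are purely illustrative, and on RDS the database-wide defaults live in the parameter group) looks like:

psql -h my-awesome-db-postgresql-10-01.host.us-east-1.rds.amazonaws.com \
  -U superuser -d dbname -c "
  ALTER TABLE posts SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold = 1000
  );"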