Hello. Thank you for visiting.
This page is an additional resources page related to a presentation given Wed. May 18, 2022 at RailsConf 2022 in Portland, OR.
Below are some additional resources and experiments that I developed while working on the content for the talk.
This page isn’t organized well but roughly falls into the 5 major categories in the presentation.
RailsConf PostgreSQL Prep And Background
- 6 past RailsConf PostgreSQL talks
- PostgreSQL is a general purpose database for a variety of workloads. We care about 🕸️
web applicationsworkloads (online transaction processing, OLTP).
- 5 Use Cases were selected related to Scaling and Performance
Resources for “Migrations On Busy Databases”
Example: Slow DDL changes with a volatile default
create database if not exists pup_tracker_production; -- create pups create table pups (id bigserial, name varchar); -- create locations create table locations ( id bigserial primary key, -- or omit `primary key` to create without PK index latitude NUMERIC(14, 11), longitude NUMERIC(14, 11) ); -- generate 1 million pups -- around 3s INSERT INTO pups (name) select substr(md5(random()::text), 0, 6) from generate_series(1,1000000); -- generate 10 million locations -- around 38s, or 1m 10s with a primary key INSERT INTO locations (latitude, longitude) SELECT (random() * (52.477040512464626 - 52.077090052913654)) + 52.077090052913654 AS lat, (random() * (52.477040512464626 - 52.077090052913654)) + 52.077090052913654 AS lng FROM generate_series(1,10000000); -- add column with default value -- 5ms with default, 2ms without, non-volatile value -- repeated runs the difference is too small to notice alter table locations add column city_id integer default SELECT floor(random()*25); -- DDL to add column but with a "volatile" value, a random integer between 0 and 25 -- takes around 25s! This would be bad if the table is locked with `ACCESS EXCLUSIVE` in this time period alter table locations add column city_id integer default floor(random()*25);
Links for Locking, Blocking, Queueing
- Lock Queue
- PostgreSQL Explicit Locking
- Postgres Locking Revealed
- What Postgres SQL causes a Table Rewrite?
- PostgreSQL Alter Table and Long Transactions
Example demonstrating locks and blocking
- Transaction never conflicts with itself
- We can create a made-up example that opens a transaction and creates a lock in one session, then tries an
alter tablein a second session
Using the rideshare database and trips table
-- first psql session anatki@[local]:5432 rideshare_development# begin; BEGIN Time: 0.129 ms anatki@[local]:5432* rideshare_development# lock trips in access exclusive mode; LOCK TABLE Time: 0.103 ms # select mode, pg_class.relname, locktype, relation from pg_locks join pg_class ON pg_locks.relation = pg_class.oid AND pg_locks.mode = 'AccessExclusiveLock'; -[ RECORD 1 ]----------------- mode | AccessExclusiveLock relname | trips locktype | relation relation | 461492 --- in second psql session, attempt to migrate the database with an alter table alter table trips add column city_id integer default 1; -- now we can look at lock activity and see that it is blocked # select wait_event_type,wait_event,query from pg_stat_activity where wait_event = 'relation' AND query like '%alter table%'; -[ RECORD 1 ]---+-------------------------------------------------------- wait_event_type | Lock wait_event | relation query | alter table trips add column city_id integer default 1; -- lock timeout is 0 which means it is disabled show lock_timeout;
If lock timeout is disabled entirely, setting it will set an upper bound on how long the alter table transaction is in a blocked state.
We can see that the lock timeout is the reason that the transaction is canceled.
anatki@[local]:5432 rideshare_development# begin; BEGIN Time: 0.085 ms anatki@[local]:5432* rideshare_development# SET LOCAL lock_timeout = '5s'; SET Time: 0.078 ms anatki@[local]:5432* rideshare_development# alter table trips add column city_id integer; ERROR: canceling statement due to lock timeout Time: 5001.250 ms (00:05.001)
- Set a lock timeout
- Set it high enough to allow some waiting, but not so long that transactions are blocked for a long time
- If lock timeout is exceeded, likely need to try again at a less busy time
- A long running statement may be canceled by the statement timeout. Consider raising statement timeout just for the migration session.
Definition for “Table rewrites” which are time consuming, and triggered by some DDLs
Definition for table rewrites: Something like “A table rewrite is a behind-the-scenes copy of the table with a new structure, and all row data copied from the old structure to the new structure”
Via lukasfittl I think thats correct - I was trying to confirm whether alter table commands that require a rewrite actually make a full copy (as indicated by the documentation), and it does appear so, see here in the source: https://github.com/postgres/postgres/blob/master/src/backend/commands/tablecmds.c#L5506
Resources for Exhausting Database Connections
- How many have been idle for a long time?
- Assessing connection usage for application servers and background processes
- High Connections in PgHero.
Notes on Forking the main PostgreSQL process
- Postmaster is first process. Additional background processes are started: BG writer, Autovacuum launcher, Check pointer etc. See full list here
- Memory used by connections
- PostgreSQL uses shared memory and process memory
Database Connections Resources
- Estimate database connections pool size for Rails application
- Concurrency and Database Connections in Ruby with Active Record
- What are advantages of using transaction pooling with pgbouncer?
- Be Prepared!
Discussion of prepared statements
Simple example, select a row by primary key id.
prepare loc (int) as select * from locations where id = $1; execute loc(1);
While providing a minimal boost in performance, this functionality also makes an application less vulnerable to SQL injection attacks.
Active Record automatically turns your queries into prepared statements by default
bundle exec rails console
>> Location.find(1) Location Load (1.2ms) SELECT "locations".* FROM "locations" WHERE "locations"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Explore the prepared statement cache:
ActiveRecord::Base.connection.execute('select * from pg_prepared_statements').values
By default Rails will generate up to 1,000 prepared statements per connection
Connection Pooling Example: PgBouncer
Default port is
6432 or 1000 higher than default PostgreSQL port
On Mac OS install with
brew install pgbouncer
PgBouncer config: pgbouncer.ini
brew services restart pgbouncer
brew services info pgbouncer
Or run manually:
/usr/local/opt/pgbouncer/bin/pgbouncer -q /usr/local/etc/pgbouncer.ini
Existing DB connection:
-- Make a test app user CREATE USER app_user WITH PASSWORD 'jw8s0F4';
app_user as an admin for simplicity
[pgbouncer] listen_port = 6432 listen_addr = localhost auth_type = md5 auth_file = /usr/local/etc/userlist.txt logfile = pgbouncer.log pidfile = pgbouncer.pid admin_users = app_user
Now we can connect using psql via pgbouncer:
View some information:
psql -p 6432 -U app_user -W pgbouncer (Use password from above)
Online restart (without disconnecting clients)
pgbouncer -R or send a
show databases, review the
- Pool modes (most aggressive and least compatible, to least aggressive, most compatible)
Specifies when a server connection can be reused by other clients.
- session: Server is released back to pool after client disconnects. Default.
- transaction: Server is released back to pool after transaction finishes.
- statement: Server is released back to pool after query finishes. Transactions spanning multiple statements are disallowed in this mode.
Cannot use transaction pooling mode while also using prepared statements, which are enabled by default in Rails.
Limited to session mode. Alternately, disable prepared statements and then transaction mode may be used.
Discussion points on High Performance SQL Queries
Using PgHero: open source PostgreSQL Performance Dashboard
- Statistics about database size, table size, index size
- Work on high impact queries via statistics with
- High connections
- Foreign key constraints marked
NOT VALID(still enforced for new inserts or updates)
- Parameter values
- Index Bloat Estimated Percentage (hidden index bloat page)
- Scheduled Jobs via
Discussing High Impact Database Maintenance
It might be worthwhile to reindex periodically just to improve access speed.
- REINDEX requires ACCESS EXCLUSIVE
- Use with CONCURRENTLY option, which requires only a SHARE UPDATE EXCLUSIVE
- Use pg_cron
- pg_cron : Probably the best way to schedule jobs within PostgreSQL database.
andy@[local]:5432 rideshare_development# reindex index trips_intermediate_rating_idx; REINDEX Time: 13.556 ms andy@[local]:5432 rideshare_development# reindex index concurrently trips_intermediate_rating_idx; REINDEX Time: 50.108 ms
Using PostgreSQL native Replication and Partitioning
Using Multiple Databases with Replication for Rails Apps
- Concrete example with andyatkinson/rideshare TBD, in development
- Query trips table from replica
Partitioning Example With Range Partitioning
- Example using pgslice to range partition trips table on
- Partition Pruning
- When the planner can prove a partition can be excluded, it excludes it.
- Partitioning and Constraint Exclusion
SET enable_partition_pruning = on; -- the default SHOW constraint_exclusion; SET constraint_exclusion = partition; -- the default, or "on"
- Constraint Exclusion