- Knowledge and Observability
- Real-time observability
- Introducing pg_wait_sampling
- Configuring pg_wait_sampling on macOS
- Basic Usage of pg_wait_sampling
- Customization
- Cloud Support
- Resources
- Wrap Up
PostgreSQL uses a complex system of locks to balance concurrent operations and data consistency, across many transactions. Those intricacies are beyond the scope of this post. Here we want to specifically look at queries that are waiting, whether on locks or for other resources, and learn how to get more insights about why.
Balancing concurrency with consistency is an inherent part of the MVCC system that PostgreSQL uses. One of the operational problems that can occur with this system, is that queries get blocked waiting to acquire a lock, and that wait time can be excessive, causing errors.
In order to understand what’s happening with near real-time visibility, PostgreSQL provides system views like pg_locks and pg_stat_activity that can be queried to see what is currently executing. Is that level of visibility enough? If not, what other opportunities are there?
Knowledge and Observability
When a query is blocked and waiting to acquire a lock, we usually want to get more information when debugging.
The query holding the lock is the “blocking” query. A waiting query and a blocking query don’t always form a one-to-one relationship though. There may be multiple levels of blocking and waiting.
Real-time observability
In Postgres, we have “real-time” visibility using pg_stat_activity.
We can find queries in a “waiting” state:
SELECT
pid,
wait_event_type,
wait_event,
LEFT (query,
60) AS query,
backend_start,
query_start,
(CURRENT_TIMESTAMP - query_start) AS ago
FROM
pg_stat_activity
WHERE
datname = 'rideshare_development';
We can combine that information with lock information from the pg_locks catalog.
Combining lock information from pg_locks and active query information from pg_stat_activity becomes powerful. The query below joins these sources together.
https://github.com/andyatkinson/pg_scripts/blob/main/lock_blocking_waiting_pg_locks.sql
The result row fields include:
blocked_pidblocked_userblocking_pidblocking_userblocked_queryblocking_queryblocked_query_startblocking_query_start
That’s great information, however there can still be a problem.
When there’s an incident and after it’s resolved, queries get cleared out and we no longer have historical information, since what we looked at in pg_stat_activity and pg_locks was live information.
How can we explore historical context? Or, how can we broaden our searches to include many samples and not just a single sample?
Introducing pg_wait_sampling
To solve the need for historical analysis, and for the collection of many samples, the extension pg_wait_sampling was created by Alexander Korotkov to solve these problems.
Configuring pg_wait_sampling on macOS
- Compile extension following instructions on GitHub postgrespro/pg_wait_sampling
- Edit
postgresql.confto add the extension toshared_preload_libraries - Restart Postgres (due to shared preload libraries)
- Enable extension (via
CREATE EXTENSIONcommand) from psql, as a superuser (postgres) - After connecting via psql, change
search_pathto the schema for the extension (rideshare)
Basic Usage of pg_wait_sampling
With the extension enabled, we get access to two views:
From the view pg_wait_sampling_profile we get the following fields. The queryid field is the same queryid that’s a unique identifier per instance in Postgres that we have available from pg_stat_statements.
pidevent_typeeventqueryidcount
Here are fields we get in the pg_wait_sampling_history:
pidts(timestamp)event_typeeventqueryid
Customization
https://postgrespro.com/docs/enterprise/9.6/pg-wait-sampling
pg_wait_sampling.profile_period= '10ms'pg_wait_sampling.history_size = 1000
Cloud Support
- GCP Cloud SQL supports it, and without a server restart
- Tembo supports pg_wait_sampling via trunk
- AWS RDS does not list pg_wait_sampling in supported extensions
- Microsoft Azure Database for PostgreSQL - Flexible Server, does not list pg_wait_sampling in extensions
AWS seems to have its own wait event analysis.
Resources
- Learn more about Alexander on Hacking Postgres: https://www.youtube.com/watch?v=FrOvwkmAPvg
- Extension: https://github.com/postgrespro/pg_wait_sampling
- Announcement blog post: https://akorotkov.github.io/blog/2016/03/25/wait_monitoring_9_6/
- Exploring Query Locks in Postgres
pg_blocking_pids()https://pgpedia.info/p/pg_blocking_pids.html- Postgres.fm Wait events episode
Wrap Up
This post was meant to describe the problem pg_wait_sampling solves, how to install it for macOS and begin exploring the information. In a future post, we may use pg_wait_sampling as part of a concurrency/blocking query analysis and investigation. Stay tuned.
Thanks for reading!
