Here are tuning params, tips and misc. information collected from work experience with PostgreSQL that didn’t quite fit into a single blog post. More of an evolving source of personal documentation, references, and examples.
Query: Approximate count on any table
count(*) query on a large table may be too slow. If an approximate count is acceptable use this:
SELECT relname, relpages, reltuples::numeric, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='table';
PostgreSQL runs an autovacuum process in the background to remove dead tuples. Dead tuples are the result of a multiversion model (MVCC).
There are two parameters to tune, “scale factor” and “threshold”.
In routine vacumming, the two options are listed:
- scale factor (a percentage of dead tuples)
- threshold (a specific number of dead tuples)
The scale factor defaults to 20% (
0.20). To optimize for our largest tables it has been recommended to set the scale factor very low 1% (
0.01) or opt out of it altogether.
To opt out of scale factor entirely, set the value to 0 and set the threshold to the number to trigger the condition, e.g. 1000.
ALTER TABLE bigtable SET (autovacuum_vacuum_scale_factor = 0); ALTER TABLE bigtable SET (autovacuum_vacuum_threshold = 1000);
If after experimentation you’d like to reset, use the
ALTER TABLE bigtable RESET (autovacuum_vacuum_threshold); ALTER TABLE bigtable RESET (autovacuum_vacuum_scale_factor);
Additional AV parameters
Remove unused indexes
Indexes may have been created that are not used as part of a query plan. These should be removed to reduce unnecessary IO associated with maintaining the index.
Ensure these are set to
SHOW track_activities; SHOW track_counts;
Now we can take advantage of tracking on whether indexes have been used or not. We can look for zero scans, and also very infrequent scans.
Cybertec blog post with SQL query to discover unused indexes: Get Rid of Your Unused Indexes!
On our very large production database where this process had never been done, we had dozens of indexes that could be eliminated, taking of 100s of gigabytes of space.
SELECT s.schemaname, s.relname AS tablename, s.indexrelname AS indexname, pg_relation_size(s.indexrelid) AS index_size FROM pg_catalog.pg_stat_user_indexes s JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid WHERE s.idx_scan = 0 -- has never been scanned AND 0 <>ALL (i.indkey) -- no index column is an expression AND NOT i.indisunique -- is not a UNIQUE index AND NOT EXISTS -- does not enforce a constraint (SELECT 1 FROM pg_catalog.pg_constraint c WHERE c.conindid = s.indexrelid) ORDER BY pg_relation_size(s.indexrelid) DESC;
- Statement timeout: TBD
- Reaping frequency: TBD
Foreign Data Wrappers
Native Foreign data wrapper functionality in PostgreSQL allows connecting to a remote table and treating it like a local table.
The table structure may be specified when establishing the foreign table or it may be imported as well.
A bit benefit of this for us at work is that for a recent backfill, we were able to avoid the need for any intermediary data dump files. This is a win in terms of reducing process steps, increasing the overall speed, and decreasing the security risk by eliminating intermediary customer data files.
We used a
temp schema to isolate any temporary tables away from the main schema (
Essentially the process is:
- Create a server
- Create a user mapping
- Create a foreign table (optionally importing the schema)
Let’s say we had 2 services, one for managing inventory items for sale, and one for managing authentication.
We wanted to connect to the authentication database from the inventory database.
In the case below, the inventory database is connected to with the
root user so there is privileges to create temporary tables, foreign tables etc.
create EXTENSION postgres_fdw; CREATE SCHEMA temp; CREATE SERVER temp_authentication; FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'authentication-db-host', dbname 'authentication-db-name', port '5432'); -- set the host, name and port CREATE USER MAPPING FOR root SERVER temp_authentication OPTIONS (user 'authentication-db-user', password 'authentication-db-password'); -- map the local root user to a user on the remote DB IMPORT FOREIGN SCHEMA public LIMIT TO (customers) FROM SERVER temp_authentication INTO temp; -- this will make a table called temp.customers
Once this is established, we can issue queries as if the foreign table was a local table:
select * from temp.customers limit 1;
HOT (“heap only tuple”) updates, are updates to tuples not referenced from outside the table block.
- there must be enough space in the block containing the updated row
- there is no index defined on any column whose value is modified (big one)
- A percentage between 10 and 100
- you can adjust it to leave room for HOT updates when they’re possible
- For tables with heavy updates a smaller fillfactor is appropriate
- Per table or per index
“Then slow lock acquisition will appear in the database logs for later analysis.”
Query planning tools
Nice tool and I learned a couple of tips. Format
EXPLAIN output with JSON, and specify some additional options. Handy SQL comment to have hanging around on top of the query to study:
explain (analyze, buffers, verbose, format text) or specify
Query: 10 largest tables
select schemaname as table_schema, relname as table_name, pg_size_pretty(pg_total_relation_size(relid)) as total_size, pg_size_pretty(pg_relation_size(relid)) as data_size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as external_size from pg_catalog.pg_statio_user_tables order by pg_total_relation_size(relid) desc, pg_relation_size(relid) desc limit 10;