Andrew Atkinson - Software Engineer, Author, High Performance PostgreSQL for Rails

Wait a minute! — PostgreSQL extension pg_wait_sampling

2024-07-23T00:00:00+00:00

PostgreSQL uses a complex system of locks to balance concurrent operations and data consistency, across many transactions. Those intricacies are beyond the scope of this post. Here we want to specifically look at queries that are waiting, whether on locks or for other resources, and learn how to get more insights about why.

Balancing concurrency with consistency is an inherent part of the MVCC system that PostgreSQL uses. One of the operational problems that can occur with this system is that queries get blocked waiting to acquire a lock, and that wait time can be excessive, causing errors.

In order to understand what’s happening with near real-time visibility, PostgreSQL provides system views like pg_locks and pg_stat_activity that can be queried to see what is currently executing. Is that level of visibility enough? If not, what other opportunities are there?

Knowledge and Observability

When a query is blocked and waiting to acquire a lock, we usually want to get more information when debugging.

The query holding the lock is the “blocking” query. A waiting query and a blocking query don’t always form a one-to-one relationship though. There may be multiple levels of blocking and waiting.

Real-time observability

In Postgres, we have “real-time” visibility using pg_stat_activity.

We can find queries in a “waiting” state:

SELECT
    pid,
    wait_event_type,
    wait_event,
    LEFT(query, 60) AS query,
    backend_start,
    query_start,
    (CURRENT_TIMESTAMP - query_start) AS ago
FROM
    pg_stat_activity
WHERE
    datname = 'rideshare_development'
    AND wait_event IS NOT NULL; -- only sessions currently reporting a wait event

We can combine that information with lock information from the pg_locks view.

Joining pg_locks with the active query information in pg_stat_activity is powerful. The query below joins these sources together.

https://github.com/andyatkinson/pg_scripts/blob/main/lock_blocking_waiting_pg_locks.sql

The result row fields include:

blocked_pid
blocked_user
blocking_pid
blocking_user
blocked_query
blocking_query
blocked_query_start
blocking_query_start
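
As a rough sketch of the same idea, the query below pairs blocked and blocking sessions using the built-in pg_blocking_pids() function instead of joining pg_locks directly. It is not the linked script, only a simplified illustration that produces the same field names.

```sql
-- Simplified sketch (not the linked script): pair each blocked session
-- with the sessions that block it, using pg_blocking_pids().
SELECT
    blocked.pid AS blocked_pid,
    blocked.usename AS blocked_user,
    blocking.pid AS blocking_pid,
    blocking.usename AS blocking_user,
    LEFT(blocked.query, 60) AS blocked_query,
    LEFT(blocking.query, 60) AS blocking_query,
    blocked.query_start AS blocked_query_start,
    blocking.query_start AS blocking_query_start
FROM
    pg_stat_activity AS blocked
    JOIN pg_stat_activity AS blocking
        ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));
```

Keep in mind that blocking_query is the most recent statement of the blocking session, which is not necessarily the statement that acquired the lock.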

That’s great information. However, there can still be a problem.

The data in pg_stat_activity and pg_locks is live. Once an incident is resolved, the queries involved are cleared out and we no longer have any historical information.

How can we explore historical context? Or, how can we broaden our searches to include many samples and not just a single sample?

Introducing pg_wait_sampling

To meet the need for historical analysis and the collection of many samples, Alexander Korotkov created the pg_wait_sampling extension.

Configuring pg_wait_sampling on macOS

Compile extension following instructions on GitHub postgrespro/pg_wait_sampling
Edit postgresql.conf to add the extension to shared_preload_libraries
Restart Postgres (due to shared preload libraries)
Enable extension (via CREATE EXTENSION command) from psql, as a superuser (postgres)
After connecting via psql, change search_path to the schema for the extension (rideshare)
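
The steps after compiling might look roughly like the following from psql. This is a sketch under a few assumptions: ALTER SYSTEM is used here in place of editing postgresql.conf by hand, and the extension is placed in the rideshare schema, which is inferred from the search_path step above.

```sql
-- Add the library to shared_preload_libraries.
-- Caution: this replaces any existing value, so include other
-- preloaded libraries in the list if you already have some.
ALTER SYSTEM SET shared_preload_libraries = 'pg_wait_sampling';

-- Restart Postgres here, then reconnect as a superuser.

-- Create the extension (schema placement is an assumption).
CREATE EXTENSION IF NOT EXISTS pg_wait_sampling WITH SCHEMA rideshare;

-- Make the extension's views visible without schema-qualifying them.
SET search_path TO rideshare, public;

-- Confirm the extension is installed.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_wait_sampling';
```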

Basic Usage of pg_wait_sampling

With the extension enabled, we get access to several views. We’ll look at two of them here.

From the view pg_wait_sampling_profile we get the following fields. The queryid field is the same query identifier, unique per instance, that’s available from pg_stat_statements.

pid
event_type
event
queryid
count
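
As a starting point, the profile can be summarized to see which wait events were sampled most often. This is a minimal sketch that uses only the fields listed above.

```sql
-- Top wait events across all sampled backends.
SELECT
    event_type,
    event,
    SUM(count) AS samples
FROM
    pg_wait_sampling_profile
GROUP BY
    event_type,
    event
ORDER BY
    samples DESC
LIMIT 10;
```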

Here are fields we get in the pg_wait_sampling_history:

pid
ts (timestamp)
event_type
event
queryid
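
The history view can be filtered the same way. For example, this sketch lists the most recent samples where a backend was waiting on a lock.

```sql
-- Most recent sampled lock waits, newest first.
SELECT
    ts,
    pid,
    event_type,
    event,
    queryid
FROM
    pg_wait_sampling_history
WHERE
    event_type = 'Lock'
ORDER BY
    ts DESC
LIMIT 20;
```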

Customization

https://postgrespro.com/docs/enterprise/9.6/pg-wait-sampling

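The extension is configured with settings in postgresql.conf, for example:

pg_wait_sampling.profile_period = 10    # profile sampling period, in milliseconds
pg_wait_sampling.history_size = 1000    # number of samples kept in the history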

Cloud Support

GCP Cloud SQL supports it, and without a server restart
Tembo supports pg_wait_sampling via trunk
AWS RDS does not list pg_wait_sampling in supported extensions
Microsoft Azure Database for PostgreSQL - Flexible Server does not list pg_wait_sampling in extensions

AWS seems to have its own wait event analysis.

Resources

Learn more about Alexander on Hacking Postgres: https://www.youtube.com/watch?v=FrOvwkmAPvg
Extension: https://github.com/postgrespro/pg_wait_sampling
Announcement blog post: https://akorotkov.github.io/blog/2016/03/25/wait_monitoring_9_6/
Exploring Query Locks in Postgres
pg_blocking_pids() https://pgpedia.info/p/pg_blocking_pids.html
Postgres.fm Wait events episode

Wrap Up

This post was meant to describe the problem pg_wait_sampling solves, how to install it for macOS and begin exploring the information. In a future post, we may use pg_wait_sampling as part of a concurrency/blocking query analysis and investigation. Stay tuned.

Thanks for reading!

You make a good point! — PostgreSQL Savepoints

2024-07-22T00:00:00+00:00

This post will look at the basics of PostgreSQL Savepoints within a Transaction.

A transaction forms an indivisible unit of work that is either committed or not, as a whole. Transactions are opened using the BEGIN keyword, then either committed with COMMIT or rolled back. Use ROLLBACK without any arguments to roll back the whole transaction.

Dividing Up a Transaction

Within the concept of a transaction, there is a smaller unit that allows for incremental persistence, scoped to the transaction, called “savepoints.” Savepoints create sub-transactions with some similar properties to a transaction.

Savepoints

Savepoints mark a particular state of the transaction as a recoverable position. The SAVEPOINT command captures a position within the transaction. In a similar way to how ROLLBACK rolls back an entire transaction, ROLLBACK TO restores the state of the data to that position.

After restoring to a savepoint, querying the data will show its state at the time the savepoint was created.

Commands

Savepoints have verbs to know about:

“Savepoint” may be used as a noun or verb depending on the context. Running the command SAVEPOINT a where a is the name of the savepoint, uses “savepoint” as a command verb that creates savepoint “a”. The savepoint “a” (a noun) was created.
The savepoint name can be reused, creating a new savepoint with the same name, reflecting a new state of the data.
Savepoints can be rolled back to, using the ROLLBACK TO command, specifying a named savepoint https://www.postgresql.org/docs/current/sql-rollback-to.html
Savepoints can be “released” by using the RELEASE command. Releasing a savepoint does not change the state of the data though, which is what ROLLBACK TO may do. Releasing a savepoint frees up the savepoint name and releases the resources used to create the savepoint. Read more: https://www.postgresql.org/docs/current/sql-release-savepoint.html

Let’s look at SQL commands for creating and rolling back to a savepoint:

BEGIN;

INSERT INTO vehicles (name) VALUES ('Toyota bZ4X');
SAVEPOINT a;

INSERT INTO vehicles (name) VALUES ('Honda Prologue');
SELECT COUNT(*) FROM vehicles; -- 2 (assuming the table started empty)

ROLLBACK TO a;
SELECT COUNT(*) FROM vehicles; -- 1

COMMIT; -- Only one vehicle was saved

A savepoint can be removed by using the RELEASE command.

Here’s an example of creating and releasing a savepoint:

BEGIN;

INSERT INTO vehicles (name) VALUES ('Toyota bZ4X');

SAVEPOINT a;

RELEASE a;

COMMIT; -- Only one vehicle was saved

In the example above, a savepoint was created and then released, not impacting the state of the data.

Reusing Savepoints

Savepoint names can be reused. The docs describe how the SQL standard requires a savepoint to be destroyed automatically when another savepoint with the same name is established.

This is a place where Postgres doesn’t fully conform to the SQL standard: in PostgreSQL the old savepoint is kept around, though only the most recent one is used when rolling back or releasing.

When a savepoint is created, the state of the data within the transaction at that moment is saved. That state can be recovered by restoring to the savepoint using the ROLLBACK TO command.

When rolling back to a savepoint, savepoints created later in the transaction are no longer “known.”
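
Here’s a sketch of reusing a savepoint name, adapted from the example in the PostgreSQL SAVEPOINT documentation. It uses the same vehicles table as above and assumes it starts empty. The third vehicle name is only illustrative.

```sql
BEGIN;

INSERT INTO vehicles (name) VALUES ('Toyota bZ4X');
SAVEPOINT a; -- first "a"

INSERT INTO vehicles (name) VALUES ('Honda Prologue');
SAVEPOINT a; -- second "a"; the first is kept but hidden behind it

INSERT INTO vehicles (name) VALUES ('Kia EV9');

ROLLBACK TO a; -- back to the second "a"
SELECT COUNT(*) FROM vehicles; -- 2

RELEASE a; -- releases the second "a"; the first is reachable again

ROLLBACK TO a; -- back to the first "a"
SELECT COUNT(*) FROM vehicles; -- 1

COMMIT; -- Only one vehicle was saved
```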

Errors

When multiple savepoints are created using the same name, for example three times, each one can be released in turn. RELEASE can be called as many times as there are savepoints with that name.

Imagine three were created with the name “a”. In that case, release can be called three times for “a”, but on the fourth time it will produce an error.

When this kind of error happens, the outer transaction is now also in an error state. From there the outer transaction may be rolled back. In that error state, calling COMMIT also rolls back the transaction.

BEGIN;

SAVEPOINT a;
SAVEPOINT a;
SAVEPOINT a;

RELEASE a;
RELEASE a;
RELEASE a;
RELEASE a; -- ERROR:  savepoint "a" does not exist
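
To see the error state described above, here’s a shorter sketch that continues past the failed RELEASE. Further statements are rejected, and COMMIT ends the transaction by rolling it back.

```sql
BEGIN;

SAVEPOINT a;
RELEASE a;
RELEASE a; -- ERROR:  savepoint "a" does not exist

SELECT 1;  -- ERROR:  current transaction is aborted, commands ignored until end of transaction block

COMMIT;    -- psql reports ROLLBACK instead of COMMIT
```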

Wrap Up

That was a brief intro to savepoints inside transactions. Remember that savepoints are a mechanism to create “recoverable positions” for a state of transaction-level data, within a transaction.

SaaS on Rails on PostgreSQL — POSETTE 2024

2024-07-13T00:00:00+00:00

In this talk attendees will learn how Ruby on Rails and PostgreSQL can be used to create scalable SaaS applications, focusing on schema and query design, and leveraging database capabilities.

We’ll define SaaS concepts, B2B, B2C, and multi-tenancy. Although Rails doesn’t natively support SaaS or multi-tenancy, solutions like Bullet Train and Jumpstart Rails can be used for common SaaS needs.

Next we’ll cover database designs from the Apartment and acts_as_tenant gems which support multi-tenancy concepts, then connect their design concepts to Citus’s row and schema sharding capabilities from version 12.0.

We’ll also cover PostgreSQL’s LIST partitioning and how to use it for efficient detachment of unneeded customer data.
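
As a minimal sketch of that idea, a table can be partitioned by customer so that one customer’s rows can be detached as a unit. The table and column names here are hypothetical and not taken from the talk.

```sql
-- Hypothetical multi-tenant table, one partition per customer.
CREATE TABLE customer_events (
    customer_id bigint NOT NULL,
    payload text,
    created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY LIST (customer_id);

CREATE TABLE customer_events_1 PARTITION OF customer_events
    FOR VALUES IN (1);
CREATE TABLE customer_events_2 PARTITION OF customer_events
    FOR VALUES IN (2);

-- When customer 1's data is no longer needed, detach the partition.
-- It becomes a standalone table that can be archived or dropped,
-- without deleting rows one by one from the parent.
ALTER TABLE customer_events DETACH PARTITION customer_events_1;
DROP TABLE customer_events_1;
```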

We’ll cover the basics of leveraging Rails 6.1’s Horizontal Sharding for database-per-tenant designs.

Besides the benefits for each tool, limitations will be described so that attendees can make informed choices.

Attendees will leave with a broad survey of building multi-tenant SaaS applications, having reviewed application level designs and database designs, to help them put these into action in their own applications.

💻 Slide Deck

🎥 YouTube Recording

Mastering PostgreSQL for Rails: An Interview with Andy Atkinson

2024-07-01T00:00:00+00:00

Hey there! My book High Performance PostgreSQL for Rails just went into print last week!

As part of celebrating this milestone, we’ve got a series of promotional appearances lined up to explore Ruby on Rails and PostgreSQL in general and help explain why prospective readers may benefit from the book.

To kick things off, I recently met with Phil Smy who runs a popular YouTube channel with videos on Ruby on Rails and more.

Phil and I met briefly in person at RailsConf 2024 this past May and discussed this interview as Phil was interested in the book and we thought it might benefit Phil’s audience.

I think the interview turned out nicely. It’s about 30 minutes long and we covered how Ruby on Rails developers use their databases via Active Record and by writing SQL, whether their indexes are helping their queries, and most importantly, how to pronounce PostgreSQL. Postgres? Postgres-cue-ell? :)

Check out “Mastering PostgreSQL for Rails: An Interview with Andy Atkinson” below and please let us know what you thought by leaving comments here or on YouTube.

We’d love for you to share this post or a link to the video in your networks or for anyone that might be interested in the topics.

Video Interview

P.S. To celebrate the print availability of the book, we’ve got a discount code for 40% off the ebook version. Use code AAPSQLCOMPLETE at checkout to get the discount.

To read ratings and reviews for the book, please visit my book page.

Check out Phil’s page: https://philsmy.com/mastering-postgresql-for-rails-an-interview-with-andy-atkinson/

Thank you!

🎙️ IndieRails Podcast — Andrew Atkinson - The Postgres Specialist

2024-06-10T00:00:00+00:00

I loved joining Jeremy Smith and Jess Brown as a guest on the IndieRails podcast!

I hope you enjoy this episode write-up, and I’d love to hear your thoughts or feedback.

Early Career

I got my start in programming in college, and my first job was writing Java. This was the mid 2000s so this was Java Enterprise Edition, and I worked in a drab, boring gray cubicle, like Peter has in the movie Office Space.

I was so excited though to have a full-time job writing code as an Associate Software Engineer. For me this validated years of work learning how to write code in college, represented a huge pay increase, and felt like something I could make a career out of.

Build a Blog in 15 Minutes

Somewhere along the way I saw the Ruby on Rails 15-minute build a blog video. This was a turning point for me, seeing how an individual developer could build a full-stack web app.

Train Brain

Later in the 2000s, I took a side mission from Ruby on Rails, and taught myself Objective-C and iOS, and launched an app for train riders in Minneapolis. I called the app Train Brain and partnered with Nate Kadlac who designed the app icon, and all of the visuals for the app and website. Nate turned out to be a great connection, as we’ve remained friends over the years. When I asked Nate to do the cover illustration for my book, I was thrilled to hear it would work out because I know Nate is a talented designer, but also because I think we both love to support each other how we can.

Spicy Chicken at Wendy’s

I told a story of learning Ruby on Rails using the Michael Hartl book: RailsSpace. This was a favorite book of mine, because it had readers build a social networking app, which was a hot space to be in at the time. Readers built the app up as they went along, which is a great way to learn any new technology! I was working in a job I wasn’t loving out in the suburbs at this point, and didn’t really have work friends. On my lunch breaks, I’d take lunch myself at Wendy’s, and my usual order was a spicy chicken sandwich. I’d read RailsSpace in the restaurant or even in my car, and would look forward to being back at my computer to type and run the code samples.

LivingSocial

My partner and I moved out to Baltimore-Washington, D.C. in 2010, and I had a good job based in Minneapolis, and they allowed me to switch to working remotely. This was perhaps my first remote job, which is interesting because I’ve done remote for more than a decade now! While it was a good job, I really wanted to get something in-person and local, making Baltimore my new local community.

RailsConf came to Baltimore in 2010 just after we arrived, and I was able to attend in 2010 and 2011. I also attended a lot of tech events in this era, including a Washington DC Bootstrap Maryland event, where I met LivingSocial co-founder Aaron Batalion, and other early or founding engineers like Patrick Joyce and Doug Ramsay. From that original introduction and more conversations, I was able to secure an interview and eventually a job at LivingSocial, which turned out to be a hockey stick, rocketship, whatever-big-growth-metaphor-you-prefer, a true high-growth company, built on a very straightforward idea and business model.

This was a formative experience, writing code while seeing huge engineer headcount growth, subdivision of work, teams, hack-a-thons, “vertical” areas of the business, acquisition of US and non-US based companies, the launch of an in-house incubator (Hungry Academy), to name a few things!

Although I hadn’t worked much with PostgreSQL at this stage, and we used MySQL at LivingSocial, this was my first exposure to a popular, global consumer brand, with big scale. While the business was centered mostly around deals delivered by email, the purchase experience was all on the web, and there were some crushing blows of traffic at times, including a particularly large one for the LivingSocial Super Bowl ad in 2011!

OrderUp

OrderUp was a food delivery company started in Baltimore, back when food delivery was a hot trend. I met an engineer at LivingSocial, also based in Baltimore at the time, Paul Barry, who became the CTO at OrderUp. I was able to join that team for several years through the acquisition by Groupon! For OrderUp, food orders from customers were dispatched to active drivers.

A formative experience there was Paul rewriting the dispatcher code as a “wall of SQL.” Paul was very skilled with SQL queries, common table expressions, query optimization, and administrative tasks like identifying problematic queries and canceling or terminating them. As an aside: those administration skills stuck with me, and made their way into the book!

I wasn’t very knowledgeable about this stuff at the time, but it was some of my first exposure to the importance of good control over your database operations when scaling up a business, and getting into the SQL, outside of common boundaries of the Active Record ORM.

Later when I arrived at Groupon, although I wrote Ruby on Rails as well, I primarily worked on Java service codebases (See: Microservice Frameworks for Java) to start.

The common part of the stack between all the client applications whether they were written in Ruby or Java, was PostgreSQL!

Flipgrid

The most formative experience for me with PostgreSQL, was a few years ago after joining Flipgrid. Flipgrid, later called Flip, was a video-based social learning platform used primarily in a K-12 educational setting. News broke very recently Flip is being shut down, sadly. Microsoft Flipgrid, or Flip, followed the acquisition of a startup company based in Minneapolis.

When the COVID pandemic happened, schools closed for in-person learning. The teachers needed a way to connect with their students online. Flipgrid experienced an explosion of growth as a result, as teachers sent and received Flipgrid videos with their students.

As measured by the New Relic APM, our main monolith Rails backend app received 450K requests per minute (RPM) at peak, powered by a single “beefy” PostgreSQL 10 instance running on AWS RDS (See: SaaS for Developers with Gwen Shapira — Postgres, Performance and Rails with Andrew Atkinson 🎙️).

We scaled the instance vertically, had read replicas, added connection pooling, but on a small team of back-end engineers with no DBA and no DB-focused engineer, we needed to dive into the database operations themselves, the schema design, index design, and maintenance operations, to unlock more scale, reliability, and predictability.

I took this challenge on, as it was a budding interest of mine, and there was an opportunity. I picked up High Performance PostgreSQL by Gregory Smith, and learned and applied as much as I could, as quickly as possible.

Later I learned it’s somewhat common to be an Accidental DBA, and that there’s such a thing as an Application DBA. I found a niche community!

Consulting and Coaching

Now that I’ve done PostgreSQL performance optimization, scalability, and reliability work a few times, I felt interested and qualified to try and earn a living out of doing this kind of work for more teams.

My hypothesis was that successful web product application teams can’t, or don’t want to hire a full-time DBA or DB-focused backend engineer, but for ones that are successful, that means there’s typically an unmet need for performance and optimization work and design guidance. Teams might “get by” until things blow up. That could take the form of timeouts, errors, disruptions to code releases, inability to upgrade instances, overloaded instances, replication problems, or myriad other issues.

My goal is for them to find me, and to make it easy for them to hire me, finding a good fit for price point, availability, engagement model, so we can partner up and solve their challenges.

In terms of income goals, my thought was if I had a few clients hiring me on a part-time basis, I could make full-time equivalent levels of income, while doing work I love, with more schedule flexibility, while avoiding some of the unpredictability of the tech industry.

This means I’m targeting teams in a middle area for size, they’re likely smaller and don’t have a databases team, but are big enough that their database size and transaction volume is fairly large, and they’re likely running into issues.

Family, mortgage, bills to pay

With that all said, unfortunately I’m not independently wealthy, and probably like you, I have bills to pay. Lifestyle creep is a thing. Since I’m launching a new consulting business, I would expect to not immediately make similar levels of income to a full-time equivalent salary, as I need to find clients, get engagements signed, perform the work, and collect payment.

Plus there are lots of new expenses to figure out, filing taxes, balancing income generating activities and future investments like podcast appearances or conference presentations.

That all means, this is all somewhat of an experiment, on a limited timeline. I can afford some risks for a while, but not forever!

Death by obscurity

My mentality is that one of the biggest risks to me consulting successfully, over the long haul, is that I’m an unknown person relative to the total addressable market size for my services.

For that reason, it’s critical to promote my book, ask people to buy it to support me, ask for references, and promote myself on social media. It’s all part of long-term sustainability in continuing to make this career path work as an indie consultant.

The balancing act though with promoting myself through podcast appearances, conference presentations, and newsletters, is to continue to perform income-generating client work!

Industry churn

I feel fortunate to be side-stepping the stressful processes of interviewing for jobs, and avoiding some of the layoffs still occurring in the tech industry now.

On the indie path, I’m my own boss. It’s on me to manage my time, current client commitments, and investments in my future success. Nothing is really given, and I need to find new clients, earn their trust, and deliver on the value prop they signed up for.

Fortunately, I really like writing, and have been successful in finding new clients, so I have a chance for indie consulting to be a long-term sustainable path!

Presenting at PostgreSQL Events

My big break as a first-time PostgreSQL presenter was PGConf NYC 2021. I presented on the work I did, along with the team, at Flipgrid, as we worked on optimizing all aspects of our PostgreSQL instances and database usage from the application. My perspective was one of a practitioner, “accidental DBA,” and Application DBA.

Book Proposal

After PGConf NYC, an acquisitions editor reached out, said they’d seen that I’d presented, that I blog on PostgreSQL topics, and asked whether I’d considered writing a book about PostgreSQL. The publisher was looking to publish PostgreSQL books, as it was growing in popularity. Over the next few months, I submitted proposals to that publisher and ultimately successfully matched with The Pragmatic Programmers!

A hands-on book

Going all the way back to RailsSpace, I wanted to write a hands-on book with examples and exercises. Although the book has a narrative style, the emphasis is on code examples and exercises when possible, and getting the reader working on their own computers, so they can develop the skills and confidence to solve future challenges they face.

Database book for application developers

I felt there was a need for a database book for Rails developers, and more generally, web application developers. Database books might typically be written more for a reader with a background in systems administration or infrastructure, who is not writing backend application code.

Being an expert

Do book authors need to be experts in the topic? For me, expert is a complicated word.

In the podcast, I talked about how when writing a book and looking at a topic, there’s a continuum of skill levels for readers. When targeting a skill level on that continuum, we can expand the coverage of the topic left or right.

For example, we cover Active Record topics, and PostgreSQL features like exclusion constraints or domains, but connect them under the umbrella of data quality, consistency, and integrity checks. There isn’t a single correct answer, but different trade-offs.

Authors Noel Rappin, Vladimir Dementyev, and I recently were on stage at RailsConf (See: RailsConf 2024 Conference — The Long Goodbye), discussing our thoughts on writing tech books with publishers, for an audience of 20 or so, as part of a “Meet the Authors” event.

Noel described how being too much of an expert might mean the author can no longer relate to the learner, their struggles, and their perspectives. The suggestion was for tech book authors to have a kernel of expertise, but that it’s not necessary to be the world’s leading expert to write a book.

Of course, we still want to write a book that’s free of technical errors, is useful, and enjoyable to read.

On promoting the book

In my experience, most of the promotion of my book has been left up to me. The publisher provides a lot of services as part of the package, and those costs are shared by the publisher and author.

Those services include having a developmental editor reading the book and providing feedback, helping get technical reviewer feedback organized and incorporated, and in the end stages, copy editing and layout. The publisher also has a distribution network so that the book appears for sale on Amazon, Barnes & Noble, as well as retailers like Target and Wal-Mart, and a zillion smaller book shops.

Promotion by way of conference presentations

Spring 2024 for me has been the greatest effort (and results for that effort) I’ve ever put in to submitting proposals to present at conferences.

It helps that I enjoy attending conferences, the travel part, meeting people, learning, and expanding my network. However, as an author and consultant, I’ve realized the conference presentations now are one of the best ways to help make my name, products, and services, less obscure.

Regularly presenting also helps with confidence. After the RailsConf workshop I gave, I was reflecting on how few nerves I had, which is a long way from where I came from.

In the past I’d be excited to present, but more anxious. I’d prepare probably to an excessive degree (beyond where it’s necessary to improve the quality), then before the “performance” itself I’d be using deep breathing, and sweating it out to cope.

I’ve become much more comfortable speaking publicly on Postgres topics now, which for me has come from a whole mix of things: experience, writing, and simply giving more talks.

Consulting offerings

I have two main offerings now (always changing). The first is Consulting, with longer engagements and more time committed upfront. This represents a bigger (but predictable) investment for a company, but covers my time to partner up with them.

My other offering is Coaching and Advisory sessions, which are low cost, and could be a good fit for smaller companies, solo founders, or companies wanting to try out work with me a bit.

Episode

Episode Link

🎧 Andrew Atkinson - The Postgres Specialist

Let’s Connect

Thanks for reading!

🎙️ Ship It! Podcast — PostgreSQL with Andrew Atkinson

2024-05-21T00:00:00+00:00

Recently I joined Justin Garrison and Autumn Nash for episode “FROM guests SELECT Andrew” of Ship It!, a Changelog podcast.

We had a great conversation! I made bullet point notes from the episode, and added extra details.

Let’s get into it.

PostgreSQL Community

Autumn shared that she met Henrietta Dombrovskaya, who is an author, DBA, and who organizes Chicago PostgreSQL meetup and the PgDay Chicago conference. This was fun to hear since Henrietta has become a friend. Check out my 2023 coverage of PgDay Chicago.
Justin wasn’t familiar with the Postgres community. I was glad to share that the community events I’ve attended and people I’ve met at them, have been great!
Justin talked about career goals of trading off less money, for greater happiness. Autumn talked about how remote work provides the opportunity to get to know neighbors and your local community.
We talked about the longevity of PostgreSQL as an open source project, and the benefits of not being led by a single big entity that might be quarterly-profit oriented. Hopefully there’s no license “rug pull” in the future. Core team member Jonathan Katz wrote about this topic in Will PostgreSQL ever change its license?
I shared that PostgreSQL leaders, contributors, and committers attend community events, and it’s been fun to meet some of them.
Autumn mentioned Henrietta helped give away tickets to Milspouse Coders (Military Spouse Coders) for PgDay Chicago and that was greatly appreciated.
Autumn appreciated the explicit goal to bring more women to the Postgres community and PgDay Chicago event.
Autumn shared how seeing women at Postgres events (Check out a list of PostgreSQL community events) is important. Representation matters.
I shared some prominent women in the Postgres community I’ve met: Melanie Plageman, who recently became a committer to PostgreSQL, Lætitia Avrot, Karen Jex, Elizabeth Garrett Christensen, Stacey Haysler, Chelsea Dole, Selena Flannery, Ifat Ribon, and Gabrielle Roth are a few that come to mind!

Picking PostgreSQL and optimal designs

PostgreSQL has support for storing and indexing JSON data, which offers an alternative to NoSQL databases like MongoDB or DocumentDB
I mentioned PgAnalyze and how founder Lukas Fittl is prominent in the community. Had the chance to catch up with Lukas at PgDay Chicago 2024.
When would we not use Postgres? If we wanted to scale beyond a single instance for writes or reads, Citus is a distributed Postgres option, which offers both row-based and schema-based sharding across multiple nodes. I’ll be presenting on Citus and related topics for SaaS on Rails on PostgreSQL at the virtual conference POSETTE: An Event for Postgres 2024.
Brief discussion of specialized vector databases versus the extensibility of PostgreSQL, and using pgvector.
For single instance reliability and availability, we can leverage physical and logical replication, to keep multiple replicas around that can be promoted to take over the role of the primary writer instance
Some modern commercial, hosted Postgres offerings, are building “compute and storage separation,” which can greatly reduce or effectively eliminate the concern of “replica lag.” Replica lag is a factor for “read after write” consistency, when we’re separating writes and reads across instances.
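
To illustrate the JSON point from the list above, here’s a minimal sketch of storing JSON in a jsonb column and indexing it with GIN. The table and its contents are hypothetical and were not discussed in the episode.

```sql
-- Hypothetical table storing documents as jsonb.
CREATE TABLE documents (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body jsonb NOT NULL
);

-- A GIN index supports containment queries using the @> operator.
CREATE INDEX documents_body_idx ON documents USING GIN (body);

INSERT INTO documents (body)
VALUES ('{"type": "trip", "rider": "alice"}');

-- Find documents whose body contains the given key and value.
SELECT
    id,
    body ->> 'rider' AS rider
FROM
    documents
WHERE
    body @> '{"type": "trip"}';
```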

Towards the second half, we dove into a variety of different topics.

Andrew mentioned the Rideshare app from the book is available publicly on GitHub, no book purchase required.
There are dozens of companies building on PostgreSQL like Yugabyte, Timescale, and Hadoop, to name a few. Read about more things built on PostgreSQL.
SQL is a fundamental skill that’s worth learning and improving
Justin pointed out that whether you’re using a hosting provider or on premises, when it’s data you’re responsible for, it’s critical to protect it
Autumn pointed out we should “be nice” to DBAs, in the context of companies choosing to exit the cloud, going back to “on prem,” and needing more engineering skills from systems administrators and DBAs, which is what we had before the prominence of the cloud.
Justin pointed out that if you’re going on prem, you gotta pay people. For example, AWS runs “on prem,” there are people behind the scenes helping make it all work.

Performance and Cost Savings

Autumn pointed out for OLTP SQL work, and schema design, whether on prem or hosted, if we can really understand query planning, how we put data into and get data out of our databases, we can save millions of dollars.
Justin asked about the PostgreSQL query planner, whether we’ve got something like “flamegraphs” or distributed tracing. Andrew said that we’ve got something like that (although not as visual), but we’ve got the query plan break down using EXPLAIN, where we can see how much time is spent in storage access and filtering and similar operations, and what their costs are.
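
For readers who haven’t used it, a query plan breakdown can be requested like this. The vehicles table is borrowed from the savepoints post as a stand-in, and any query works in its place.

```sql
-- ANALYZE actually executes the query and reports real timings.
-- BUFFERS adds how many pages were read from cache or from storage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM vehicles
WHERE name = 'Toyota bZ4X';
```

Each plan node shows its estimated cost alongside the actual time and row counts, which is the closest built-in equivalent to the tracing Justin asked about.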

Outro

Justin mentioned he likes Terminal User Interface (TUI) programs, and maintains the awesome-tuis repository.
Justin mentioned he maintains Awesome Tmux. I’m a daily Tmux user, and learned it from Brian Hogan’s book tmux2: Productive Mouse-Free Development, which I recently learned has a new version coming soon.

Corrections

Bluesky was initially built on PostgreSQL, which was mentioned towards the end of the episode. Apparently since then, the Bluesky team has moved to ScyllaDB and SQLite.

Wrap Up

Justin and Autumn were great hosts, and I felt very comfortable as a guest and had a fun conversation connecting on tech and also as parents in tech!

Thanks for reading and listening, and get in touch with any questions or comments!

Listen to the Episode

Podcast

👉 FROM guests SELECT Andrew

✂️ Use Cases for Merging and Splitting Partitions With Minimal Locking in PostgreSQL 17

2024-04-16T00:00:00+00:00

This post looks at some interesting new capabilities managing Partitioned Tables coming in PostgreSQL 17, expected for release Fall 2024. The current major version is 16.

Current Table Partition Commands

Prior to Version 17, workflow options for partition management are limited to creating, attaching, and detaching partitions.

Once we’ve designed our partition structure, we couldn’t redesign it in place.

This applies to all partition types, whether we’re using RANGE, LIST, or HASH.

To combine multiple partitions into a single one, or to “subdivide” a single partition into multiples, we’d need to design a new structure then migrate all data rows to it. That’s a lot of steps!

What’s New?

From version 17, we have more options. We can now perform a SPLIT PARTITION operation on an existing singular partition, into two or more new ones.

If we wish to do the reverse, we’ve got that option as well. Starting from two or more partitions, we can perform a MERGE PARTITIONS (plural) operation to combine them into one.

An aside: don’t confuse this with the upsert-like SQL MERGE command which uses the same “merge” verb (oof)!

The new DDL commands are:

ALTER TABLE ... MERGE PARTITIONS
ALTER TABLE ... SPLIT PARTITION

Here are tweets from Nori Shinoda that link to PostgreSQL git commits:

Ability to merge two existing partitions: https://twitter.com/nori_shinoda/status/1776841440167121057
Ability to split a partition into two or more: https://twitter.com/nori_shinoda/status/1776865005704499331

Let’s test the new commands out and think about use cases for them.

We’ll need a way to run pre-release PostgreSQL 17. Fortunately, I’ve recently compiled PostgreSQL from source code on my macOS laptop, and will use that instance and some test tables within the default postgres DB.

I use Postgres.app to run PostgreSQL 16 for most of my local development, with other instances on different ports. I stop the instance on port 5432 so I can start up the one based on the compiled source code like this:

/usr/local/pgsql/bin/pg_ctl \
    -D /usr/local/pgsql/data \
    -l logfile \
    start

waiting for server to start.... done
server started

Terminology Notes

Previously I wrote about table partitioning in a two-part post. Here’s the first post: PostgreSQL Table Partitioning — Growing the Practice — Part 1 of 2 if you’d like a refresher on general information.

I’ll use “children” below referring to “partitions” of a partitioned table. The top-most table has the partition constraints, and tables that correspond to those constraints are added as “children”.

The term “children” can also show up when discussing foreign keys, when foreign key columns refer to the primary key of another table. The table with a foreign key can be said to be a child table.

For this post, “children” refers solely to partitions “of” (the PARTITION OF syntax below) a parent table.

What do these look like?

Imagine we had a multi-tenant application where tenants are identified as “accounts.” Tables with mixed account data have an account_id column with a value that identifies their account, meaning we can partition on it using LIST partitioning.

We’d normally have one account per customer. However, in my real-world working experience at startup companies, this isn’t always the case.

At a past employer with a B2B SaaS, an account was created to demo to a customer prospect.

When the customer joined the platform many months later, a new account was created for them, creating a situation where they had primary data under two accounts. There were also primary key conflicts, so we couldn’t easily combine the data without some manual efforts, but that’s a different story.

If this table had been created as a partitioned table with LIST partitioning, we could identify all rows by their account_id, and each would have its own partition.

On PostgreSQL 17 in that scenario, we could leverage MERGE PARTITIONS to combine the partitions that we would like to group under one account, choosing one or the other.

What would that look like?

Merging Partitions

The table below has an account_id and no real data columns, since we’re just looking to demo the partition management aspect.

The id uses a generated sequence value, which means each row will have a unique value across partitions.

CREATE TABLE t (
  id INT GENERATED ALWAYS AS IDENTITY,
  account_id INT NOT NULL
) PARTITION BY LIST (account_id);

Imagine we have the following two partitions for account_id 1 and account_id 2.

CREATE TABLE t_account_1 PARTITION OF t FOR VALUES IN (1);
CREATE TABLE t_account_2 PARTITION OF t FOR VALUES IN (2);

Let’s insert 10 records for account_id 1, and 100 records for account_id 2. We have 110 records total, but they’re split across two partitions. We want to merge these together.

INSERT INTO t (account_id) SELECT 1 FROM GENERATE_SERIES(1,10);
INSERT INTO t (account_id) SELECT 2 FROM GENERATE_SERIES(1,100);
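Before merging, it can help to confirm where the rows landed. This is a small sketch using the tableoid system column, which identifies the partition that physically stores each row.

```sql
-- Count rows per partition of t.
-- With the inserts above: 10 rows in t_account_1 and 100 rows in t_account_2.
SELECT tableoid::regclass AS partition_name,
       COUNT(*) AS row_count
FROM t
GROUP BY tableoid
ORDER BY tableoid;
```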

Now we want to merge them together using MERGE PARTITIONS:

ALTER TABLE t
MERGE PARTITIONS (t_account_1, t_account_2)
INTO t_account_1_2;

Cool. That combined t_account_1 and t_account_2 into a single partition called t_account_1_2 with 110 records.
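To double check the result, we could query the merged partition directly. This sketch assumes the merge above succeeded on a build that includes the MERGE PARTITIONS command.

```sql
-- Both account_id values should now live in the one merged partition:
-- 10 rows for account_id 1 and 100 rows for account_id 2.
SELECT account_id,
       COUNT(*) AS row_count
FROM t_account_1_2
GROUP BY account_id
ORDER BY account_id;
```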

What about splitting partitions? How does that work?

Splitting Partitions

We’ve seen how to merge partitions. We can also split partitions using the new SPLIT PARTITION command.

For this example let’s use the RANGE partitioning type.

Imagine that we had decided to create partitions for one week’s worth of data for an “events” style table that receives a lot of records. We’ll call the table t_events below.

We’ve decided that with a one week boundary, the partitions are large and unwieldy. We’d like to move to daily partitions so that the table for a day’s worth of data is smaller and more manageable.

Let’s look at the SQL commands for how we might achieve that.

Split Partitions Events Table

Create the t_events table using the RANGE partitioning type, initially with weekly partitions to demonstrate the current configuration.

CREATE TABLE t_events (
  id INT GENERATED ALWAYS AS IDENTITY,
  event_at TIMESTAMP WITHOUT TIME ZONE NOT NULL
) PARTITION BY RANGE (event_at);

Here are partitions for “last week,” “this week,” and “next week.”

CREATE TABLE t_events_last_week PARTITION OF t_events
FOR VALUES FROM ('2024-04-08 00:00:00') TO ('2024-04-15 00:00:00');

CREATE TABLE t_events_this_week PARTITION OF t_events
FOR VALUES FROM ('2024-04-15 00:00:00') TO ('2024-04-22 00:00:00');

CREATE TABLE t_events_next_week PARTITION OF t_events
FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-29 00:00:00');

Now we’d like to take “next week’s partition” called t_events_next_week, and divide it into 7 daily partitions, one for each day.

Since it’s an upcoming week, we’ll assume it has no data in it, but is a pre-created partition.

When designing your own change like this, keep in mind the resulting boundaries you come up with must have equivalent start and end boundaries to the current configuration.

If the boundaries are off, you’ll get an error like this:

ERROR:  partition bound for relation "t_events_next_week" is null
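One way to avoid hand-typing mismatched boundaries is to generate them. This is only a sketch of that idea: it lists seven daily ranges that start at the week’s lower bound and end at its upper bound, which can then be copied into the DDL.

```sql
-- Generate contiguous daily ranges covering 2024-04-22 through 2024-04-29.
-- generate_series is inclusive, so stopping at 04-28 yields 7 rows,
-- and the last range_to lands exactly on the weekly upper bound.
SELECT d AS range_from,
       d + interval '1 day' AS range_to
FROM generate_series(
  timestamp '2024-04-22 00:00:00',
  timestamp '2024-04-28 00:00:00',
  interval '1 day'
) AS d;
```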

Here’s the SPLIT PARTITION DDL command to split the single weekly partition into 7 daily partitions:

ALTER TABLE t_events SPLIT PARTITION t_events_next_week INTO (
  PARTITION t_events_day_1 FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-23 00:00:00'),
  PARTITION t_events_day_2 FOR VALUES FROM ('2024-04-23 00:00:00') TO ('2024-04-24 00:00:00'),
  PARTITION t_events_day_3 FOR VALUES FROM ('2024-04-24 00:00:00') TO ('2024-04-25 00:00:00'),
  PARTITION t_events_day_4 FOR VALUES FROM ('2024-04-25 00:00:00') TO ('2024-04-26 00:00:00'),
  PARTITION t_events_day_5 FOR VALUES FROM ('2024-04-26 00:00:00') TO ('2024-04-27 00:00:00'),
  PARTITION t_events_day_6 FOR VALUES FROM ('2024-04-27 00:00:00') TO ('2024-04-28 00:00:00'),
  PARTITION t_events_day_7 FOR VALUES FROM ('2024-04-28 00:00:00') TO ('2024-04-29 00:00:00')
);

Nice. If we run \d+ t_events to describe t_events, we’ll see the two remaining weekly partitions, and the new 7 daily partitions.
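If you’d rather get that list from a query than from psql’s \d+ output, the catalogs have the same information. This is a sketch using pg_inherits and pg_get_expr, both standard catalog features.

```sql
-- List each partition of t_events with its bound definition.
SELECT c.relname AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS partition_bound
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 't_events'::regclass
ORDER BY c.relname;
```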

There’s a catch. Performing this operation requires a lock on the parent table, which could be a long lock.
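On a busy system, one general safeguard for DDL like this is to set a lock_timeout, so the statement gives up rather than queueing behind other sessions. Here’s a sketch that wraps the same split in a transaction and inspects the locks it holds. The 5 second value is an arbitrary assumption, and lock_timeout only limits the wait to acquire the lock, not how long it is held afterwards.

```sql
BEGIN;

-- Illustrative value: fail fast if the lock can't be acquired within 5 seconds
SET LOCAL lock_timeout = '5s';

ALTER TABLE t_events SPLIT PARTITION t_events_next_week INTO (
  PARTITION t_events_day_1 FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-23 00:00:00'),
  PARTITION t_events_day_2 FOR VALUES FROM ('2024-04-23 00:00:00') TO ('2024-04-24 00:00:00'),
  PARTITION t_events_day_3 FOR VALUES FROM ('2024-04-24 00:00:00') TO ('2024-04-25 00:00:00'),
  PARTITION t_events_day_4 FOR VALUES FROM ('2024-04-25 00:00:00') TO ('2024-04-26 00:00:00'),
  PARTITION t_events_day_5 FOR VALUES FROM ('2024-04-26 00:00:00') TO ('2024-04-27 00:00:00'),
  PARTITION t_events_day_6 FOR VALUES FROM ('2024-04-27 00:00:00') TO ('2024-04-28 00:00:00'),
  PARTITION t_events_day_7 FOR VALUES FROM ('2024-04-28 00:00:00') TO ('2024-04-29 00:00:00')
);

-- Relation-level locks held by this session before committing
SELECT relation::regclass AS locked_relation, mode, granted
FROM pg_locks
WHERE pid = pg_backend_pid()
  AND locktype = 'relation';

COMMIT;
```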

Is there a workaround?

Detach, Split, Reattach

As long as the structure of the table stays the same, partitions can be detached and reattached.

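Detaching can be performed in a less blocking way by using CONCURRENTLY. Attaching has no CONCURRENTLY option, but it only takes a SHARE UPDATE EXCLUSIVE lock on the parent, which doesn’t block reads or writes there.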

Unfortunately we can’t perform a SPLIT PARTITION CONCURRENTLY, which would make this even more convenient because we wouldn’t be worried about blocking writes while the exclusive lock was in effect. Perhaps we’ll get that in a future version of PostgreSQL.

Let’s consider a workaround. We know that we can detach partitions, split them while detached, then re-attach them. Would that work?

This is a lot of operations, and requires a “new fake parent” (my own name below) to work, so these steps should be considered more a proof of concept, not a recommendation. The goal is to avoid a potentially long lock blocking writes, by allowing the lock to occur on a detached table hierarchy. Essentially “offline.”

This was my idea when I first saw these new capabilities and the required access exclusive lock they acquire:

MERGE PARTITIONS is cool! But exclusive lock on parent is limiting. Workaround idea: concurrent detachment of two, then merge, then “reattach concurrently” on consolidated partition? cc @andrewkane @keithf4 @brandur @nori_shinoda https://t.co/fF9nJEL9ip
— Andrew Atkinson (@andatki) April 7, 2024

Trying to run SPLIT PARTITION on a detached partition with no parent doesn’t work. However, we can add a “new fake parent” table to stand-in temporarily.

Here’s the detach operation:

ALTER TABLE t_events
DETACH PARTITION t_events_next_week CONCURRENTLY;

Here’s the “fake” stand-in parent table definition. Once we’ve created this, we need to attach our detached partitions to it in order to perform the split.

We’ll only use the “fake parent” table for the split operation. When that’s done, we’ll detach the partitions again, and then re-attach them to the original parent. At that point we can drop the “fake” parent.

CREATE TABLE t_events_fake_new (
  id INT GENERATED ALWAYS AS IDENTITY,
  event_at TIMESTAMP WITHOUT TIME ZONE NOT NULL
) PARTITION BY RANGE (event_at);
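Before the split can run, the detached weekly table needs to be attached to the stand-in parent. A minimal sketch of that step follows, assuming the same bounds the partition had under t_events. I haven’t verified every edge case here, such as how identity columns behave across detach and attach.

```sql
-- Attach the detached weekly table to the stand-in parent,
-- reusing its original weekly bounds.
ALTER TABLE t_events_fake_new
ATTACH PARTITION t_events_next_week
FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-29 00:00:00');
```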

Running the SPLIT PARTITION on a separate parent avoids a long lock on the original parent, since it’s a completely separate table.

ALTER TABLE t_events_fake_new SPLIT PARTITION t_events_next_week INTO (
  PARTITION t_events_day_1 FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-23 00:00:00'),
  PARTITION t_events_day_2 FOR VALUES FROM ('2024-04-23 00:00:00') TO ('2024-04-24 00:00:00'),
  PARTITION t_events_day_3 FOR VALUES FROM ('2024-04-24 00:00:00') TO ('2024-04-25 00:00:00'),
  PARTITION t_events_day_4 FOR VALUES FROM ('2024-04-25 00:00:00') TO ('2024-04-26 00:00:00'),
  PARTITION t_events_day_5 FOR VALUES FROM ('2024-04-26 00:00:00') TO ('2024-04-27 00:00:00'),
  PARTITION t_events_day_6 FOR VALUES FROM ('2024-04-27 00:00:00') TO ('2024-04-28 00:00:00'),
  PARTITION t_events_day_7 FOR VALUES FROM ('2024-04-28 00:00:00') TO ('2024-04-29 00:00:00')
);

Since the table structures have not changed, and since we’re not introducing any overlapping partition constraints, we can reattach to the original parent.
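Here’s a sketch of what that final reattach could look like, assuming the split succeeded under t_events_fake_new. ATTACH PARTITION has no CONCURRENTLY form, so the plain form is used. Treat this as a proof of concept under those assumptions rather than a tested procedure.

```sql
-- Detach each daily partition from the stand-in parent
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_1;
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_2;
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_3;
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_4;
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_5;
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_6;
ALTER TABLE t_events_fake_new DETACH PARTITION t_events_day_7;

-- Reattach them to the original parent with the same daily bounds
ALTER TABLE t_events ATTACH PARTITION t_events_day_1
  FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-23 00:00:00');
ALTER TABLE t_events ATTACH PARTITION t_events_day_2
  FOR VALUES FROM ('2024-04-23 00:00:00') TO ('2024-04-24 00:00:00');
ALTER TABLE t_events ATTACH PARTITION t_events_day_3
  FOR VALUES FROM ('2024-04-24 00:00:00') TO ('2024-04-25 00:00:00');
ALTER TABLE t_events ATTACH PARTITION t_events_day_4
  FOR VALUES FROM ('2024-04-25 00:00:00') TO ('2024-04-26 00:00:00');
ALTER TABLE t_events ATTACH PARTITION t_events_day_5
  FOR VALUES FROM ('2024-04-26 00:00:00') TO ('2024-04-27 00:00:00');
ALTER TABLE t_events ATTACH PARTITION t_events_day_6
  FOR VALUES FROM ('2024-04-27 00:00:00') TO ('2024-04-28 00:00:00');
ALTER TABLE t_events ATTACH PARTITION t_events_day_7
  FOR VALUES FROM ('2024-04-28 00:00:00') TO ('2024-04-29 00:00:00');

-- Only drop the stand-in parent once it has no partitions left,
-- since dropping a partitioned table also drops its partitions.
DROP TABLE t_events_fake_new;
```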

Alternatives

What about simply creating new partitions to move data rows into?

While it might be less work to create new partitions and move data rows, we couldn’t introduce new partitions that overlap with the boundaries/constraints of any existing one. PostgreSQL enforces this and would prevent the partition creation.
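To see that enforcement, here’s a sketch assuming the original weekly configuration, with t_events_next_week still attached to t_events. The statement is expected to be rejected with an overlap error.

```sql
-- Expected to fail: this daily range falls inside the range
-- already covered by t_events_next_week.
CREATE TABLE t_events_day_1 PARTITION OF t_events
FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-23 00:00:00');
```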

To avoid the overlap limitation, SPLIT PARTITION seems necessary when our goal is to modify a structure in-place like this.

However, in a similar way to the workaround above, we could follow the same tactic and detach the overlapping partition to work around the conflict.

With that approach, we might achieve the same end result and not need the SPLIT PARTITION command.
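Here’s a rough sketch of that alternative, again assuming the original weekly configuration. It detaches the weekly partition, creates daily partitions in its place, and copies any rows back through the parent. This is my own illustration of the idea, not a tested migration.

```sql
-- DETACH ... CONCURRENTLY can't run inside a transaction block
ALTER TABLE t_events
DETACH PARTITION t_events_next_week CONCURRENTLY;

-- With the weekly range vacated, daily partitions no longer overlap anything
CREATE TABLE t_events_day_1 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-22 00:00:00') TO ('2024-04-23 00:00:00');
CREATE TABLE t_events_day_2 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-23 00:00:00') TO ('2024-04-24 00:00:00');
CREATE TABLE t_events_day_3 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-24 00:00:00') TO ('2024-04-25 00:00:00');
CREATE TABLE t_events_day_4 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-25 00:00:00') TO ('2024-04-26 00:00:00');
CREATE TABLE t_events_day_5 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-26 00:00:00') TO ('2024-04-27 00:00:00');
CREATE TABLE t_events_day_6 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-27 00:00:00') TO ('2024-04-28 00:00:00');
CREATE TABLE t_events_day_7 PARTITION OF t_events
  FOR VALUES FROM ('2024-04-28 00:00:00') TO ('2024-04-29 00:00:00');

-- Copy rows back through the parent so they route to the daily partitions.
-- OVERRIDING SYSTEM VALUE keeps the existing id values, since id is GENERATED ALWAYS.
INSERT INTO t_events (id, event_at)
OVERRIDING SYSTEM VALUE
SELECT id, event_at
FROM t_events_next_week;

-- Once the copy is verified, the detached weekly table can go
DROP TABLE t_events_next_week;
```

One tradeoff to keep in mind: between the detach and the creation of the daily partitions, inserts for that week have no partition to land in and would fail, so this fits best for a pre-created, empty, or quiet partition.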

What are your thoughts?

Resources and Thank You

Here are some people to thank, and more resources on these new commands to check out.

Nori Shinoda for posting the latest and greatest from PostgreSQL source code
PostgreSQL 17: Split and Merge partitions by Daniel Westermann
Creston Jamison for covering the split and merge partition commands in Scaling Postgres #311

I’m curious to see how merging and splitting partitions commands get used in the wild.

In the future, if we can perform these operations CONCURRENTLY, they will be even more useful. The introduction of these features may be a step towards modifying an unpartitioned table in place.

Having something like SPLIT TABLE CONCURRENTLY would be very nice for tables that weren’t originally partitioned and have become huge, since it would lessen the work needed to migrate their data into more manageable partitions.

🎙️ Hacking Postgres 🐘 Podcast — Season 2, Ep. 1 Andrew Atkinson

2024-04-15T00:00:00+00:00

Recently I joined Ry Walker, CEO of Tembo, as a guest on the Hacking Postgres podcast.

Hacking Postgres has had a lot of great Postgres contributors as guests on the show, so I was honored to be a part of it, given that my contributions are more in the form of developer education and advocacy.

Ry asked me about when I got started with PostgreSQL and what my role looks like today.

Hacking Postgres Season 2, Ep. 1 - Andrew Atkinson

PostgreSQL Origin

Ry has also been a Ruby on Rails programmer, so that was a fun background we shared. We both started on early versions of Ruby on Rails in the 2000s, and were also early users of Heroku in the late 2000s.

Since PostgreSQL was the default DB for Rails apps deployed on Heroku, for many Rails programmers it was the first time they used PostgreSQL. Heroku valued the fit and finish of their hosted platform offering, and provided best in class documentation and developer experience as a cutting edge platform as a service (PaaS). The popularity of that platform helped grow the use of PostgreSQL amongst Rails programmers even beyond Heroku.

For me, Heroku was where I really started using PostgreSQL and learning some of the performance optimization “basics” as a web app developer.

Meeting The Tembo Team

Besides Ry, I’ve also had the chance to meet more folks from Tembo. Adam Hendel is a founding engineer and also based here in Minnesota. I also met Samay Sharma, PostgreSQL contributor and now CTO of Tembo, at PGConf NYC 2023 last Fall. While I’m not an employee of or affiliated with the company at all, it’s been interesting to track what they’re up to, and get little glimpses into starting up a whole company that’s focused on leveraging the power and extensibility of PostgreSQL.

If you’d like to learn more about Adam’s background, Adam was the guest for Season 1, Episode 2 of Hacking Postgres, which you can find here: https://tembo.io/blog/hacking-postgres-ep2

Using PostgreSQL with Ruby on Rails Apps

Ruby on Rails as a web development framework has great support via the ORM - Active Record - for basic and advanced Postgres features.

There’s support for composite primary keys (CPK), common table expressions (CTE), and if you don’t like the SQL that Active Record generates, you can always write your own as query text within strings, binding parameters as needed. If your work is scaling up, Active Record helps by offering writer and reader role separation, and the ability to run copies of your DB via Horizontal Sharding.

There’s even a page dedicated to PostgreSQL support by Active Record on the official Ruby on Rails documentation here: https://guides.rubyonrails.org/active_record_postgresql.html

Looking at things the other way around, from the perspective of PostgreSQL, Ruby on Rails is “just another client application.” We might see some non-ideal patterns as client requests like N+1 queries, overly broad queries without restrictions on columns, rows, the number of tables joined etc., but I’d argue most of those things aren’t specific to Active Record as they are more of a shortcoming of application developers having limited understanding of how their queries are planned and executed. That’s something I’m hoping to help improve!

The Ruby on Rails app the book uses is called Rideshare and is here: https://github.com/andyatkinson/rideshare. Within the source code, besides the Ruby code, you’ll see a lot of sample PostgreSQL files like .pgpass, pg_hba.conf, pgbouncer configuration, and these are all used in examples and exercises in the book. You’ll also see a couple of Docker instances that get provisioned and connected to each other, as readers work through examples and exercises setting up physical and logical replication, then configuring it with Active Record.

I’m pretty sure this is the only book of its kind that goes into as much depth both with PostgreSQL and Active Record!

Hacking Postgres Podcast

There have been a lot of great episodes on the podcast.

Marco Slot was the first guest overall, in Season 1, Episode 1. I remember the episode coming out around the time of PGConf NYC 2023.

Marco is the creator of the pg_cron https://github.com/citusdata/pg_cron extension which I’ve used professionally, and included in examples in Rideshare for the book.

Philippe Noël, CEO of ParadeDB, was the guest for Season 1, Episode 8: https://tembo.io/blog/hacking-postgres-ep8, covering pg_bm25 for Elasticsearch-like search in Postgres. https://blog.paradedb.com/pages/introducing_search

Recently I listened to this episode with Burak Yucesoy of Ubicloud. Burak has worked on various extensions too like postgres-hll, “high cardinality estimates” using the HyperLogLog data structure. This extension is also mentioned in the book.

I also enjoyed the episode with Bertrand Drouvot as the guest: https://tembo.io/blog/hacking-postgres-ep9. Bertrand covered some of these items:

pgsentinel https://github.com/pgsentinel/pgsentinel
explain.dalibo.com for plan visualization. https://explain.dalibo.com/
pg_directpaths https://github.com/bdrouvot/pg_directpaths with some super speed inserts, even much faster than inserting into unlogged tables!

I like the ideas Bertrand shared for more observability that’s useful for Postgres DBAs:

For long running queries, seeing which parts are being processed. For example, which part of the processing is happening: buffer access? Filtering? Sorting? For OLTP we typically have short queries, but even then they can go long and appear to be stuck.
For a normalized query from pg_stat_statements, the ability to see the query plans that were used for that query. It would be interesting to look back and see whether a bad plan popped in at some point.
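For context on that second idea, here’s a sketch of what pg_stat_statements gives us today: normalized query text with call counts and timings, but no plans. It assumes the extension is installed and loaded via shared_preload_libraries, and uses the column names from PostgreSQL 13 and newer.

```sql
-- Top 5 normalized queries by total execution time.
-- The view has timing and call counts per queryid, but no plan history.
SELECT queryid,
       calls,
       mean_exec_time,
       LEFT(query, 60) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```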

More Podcast Recommendations

Ry likes to ask about podcasts the guest recommends. Here’s a collection of recent podcast episodes or podcasts I’d recommend:

postgres.fm is a favorite! I appeared recently as a guest on Rails + Postgres https://postgres.fm/episodes/rails-postgres
Scaling Postgres with Creston Jamison https://www.scalingpostgres.com/
NetApp OnTech https://twitter.com/andatki/status/1776459512687231352
Ruby For All https://twitter.com/andatki/status/1776392158674821288
YAGNI https://twitter.com/andatki/status/1776391049205927953

“Just Use Postgres”

Ry and I briefly touched on “database sprawl,” which is something I’ve seen in the wild. The last chapter of my book addresses this topic, bringing a lot of things together the reader has learned from earlier chapters, with the goal of using PostgreSQL for more types of work.

For example, Redis is a very popular second database in the Ruby on Rails community. Commonly, Redis is used to write and read cache data, background job or message queue data that’s small and transient, or for storing other small bits of data like user session data.

While Redis works well for those things, operating a Redis instance or cluster does carry more operational cost for the team, as it’s another piece of infrastructure to provision, patch, upgrade, and observe. What if we used Postgres for those things instead?

We explore specific tactics for doing that with use cases like:

Background jobs without Redis
Full text search within PostgreSQL, tsquery, tsvector
Caching without Redis
Vector similarity search
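To give a flavor of two of those, here’s a small sketch. The jobs table and its columns are hypothetical names made up for illustration, and this isn’t necessarily how the book or any particular background job library implements it.

```sql
-- Full text search: does the document match both search terms?
SELECT to_tsvector('english', 'Riders request trips from drivers')
       @@ to_tsquery('english', 'trip & driver') AS matches;

-- Background jobs: a hypothetical jobs table used as a simple queue
CREATE TABLE jobs (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  payload JSONB NOT NULL,
  locked_at TIMESTAMPTZ
);

-- Claim one unclaimed job. SKIP LOCKED lets concurrent workers
-- pass over rows another worker has already locked, instead of waiting.
UPDATE jobs
SET locked_at = now()
WHERE id = (
  SELECT id
  FROM jobs
  WHERE locked_at IS NULL
  ORDER BY id
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```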

Resources

Here’s the Hacking Postgres episode video: https://www.youtube.com/watch?v=CAbGPydw_NY

The tweet is embedded below.

Hacking Postgres Season 2 is here!

We're releasing new episodes every Thursday through the end of May, so stay tuned for more great Postgres content!

First up, we've got Andrew Atkinson (@andatki) a Software Engineer who specializes in building high-performance web applications… pic.twitter.com/q22nc8WQg1
— Tembo - Multi-Workload Postgres (@tembo_io) April 5, 2024

Wrapping Up and Thank You

Hacking Postgres with Ry was a good time! I’m glad the Tembo team is offering Postgres as a service in new ways, by customizing it with curated sets of extensions as various stacks, providing an extension registry, and contributing to the greater PostgreSQL ecosystem. Having more choices benefits developers, providing new solutions for long-standing challenges.

I recommend the “Hacking Postgres” podcast as a great way to get to know some of the PostgreSQL contributor community, and tech innovations in the greater ecosystem.

Thank you to Ry for hosting and interviewing me, Adam for recommending me, and Jonathan and the production team behind the scenes for your support in the process.

Compiling PostgreSQL on macOS To Test Documentation and Patches

2024-04-09T00:00:00+00:00

This post covers my experience compiling and installing PostgreSQL from source code. Primarily I followed official instructions and this blog post Setup PostgreSQL development environment on MacOS. Once installed, we’ll look at how to test doc changes and patches from the mailing list.

Introduction

What are we trying to do? We’re trying to compile and install the PostgreSQL database on macOS, then run it with a database in order to test functionality changes or test documentation patches. With that in place, we can even make contributions to PostgreSQL using the mailing list process.

Background

The PostgreSQL documentation has a chapter called Chapter 17. Installation from Source Code that’s worth reading through. This section covers various platforms.

There’s a section called “Building and Installation with Autoconf and Make” that I’ll follow here. Using Autoconf and Make is considered the “old way” to build from source, and the new way is to use Meson. A future post may cover Meson, as well as additional ways to compile PostgreSQL, but in this post we’ll use the Autoconf and Make method.

Mac macOS Machine

Here are the details of the machine I’m working on in April 2024:

macOS Sonoma 14.4
Homebrew
Fish shell

Homebrew is a popular method of getting software for macOS via the “formulae” that it publishes. Fish shell is my preferred shell environment. It’s not as popular as Zsh or Bash, but Homebrew still provides post-install instructions for configuring it.

Short version

In the PostgreSQL documentation, they have a short and long version which is a pattern I’ll follow here. Here’s the short version:

I cloned from postgres/postgres on GitHub which is a mirror of the source code. I can also push my own branches to my own fork on GitHub if I want.
When I want to recompile PostgreSQL, I run git pull to get the latest source code versions from upstream. I’m often seeing cool new things coming from Noriyoshi Shinoda who posts new commits on Twitter. I may also check the pgsql-hackers email list, although the volume is so high that it’s difficult to check in on and get a lot of value from.

With that said, now that we have the source code, let’s make sure the machine is prepared. First we’ll need to install build dependencies. I’ll do that with Homebrew.

brew install icu4c
brew install pkgconfig

Next I’ll follow post-install instructions to set up my shell, including creating environment variables.

fish_add_path /opt/homebrew/opt/icu4c/bin
fish_add_path /opt/homebrew/opt/icu4c/sbin
set -gx LDFLAGS "-L/opt/homebrew/opt/icu4c/lib"
set -gx CPPFLAGS "-I/opt/homebrew/opt/icu4c/include"

With that in place, we’re ready to compile PostgreSQL:

cd postgres
./configure
make && make install

This is the simple version, and your experience may not be so simple. You may run into compilation issues. Unfortunately this post isn’t meant to try and solve compilation issues, however I will note some issues I ran into below. Please leave a comment if you’d like, although to debug your compilation issues, I’d recommend searching on Stack Overflow or getting more information about the error using something like ChatGPT.

Issues

Having recently overhauled my macOS setup, making sure to install the ARM architecture versions and to not use Rosetta at all, and after having recently updated to 14.4, I had to reinstall a number of things almost as if the machine was brand new. I completely reinstalled the command line tools for macOS, Homebrew, and all the formulae I had before that.

To start over with command line tools, I did this:

sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install

Once that was done, I ran make clean in PostgreSQL due to issues I had with a “CPU mismatch.” After reinstalling the command line tools and the Homebrew formulae I was back in business.

Long Version

Before I knew about the icu4c library, I saw issues like this when running ./configure:

configure: error: ICU library not found

Once I’d installed icu4c, and pkgconfig, I needed to refer back to their post-install instructions. I used brew info with the formula name to get those post-install instructions and confirm the formula was installed. For example running brew info icu4c prints out the instructions below.

This will provide shell-specific instructions to add icu4c to your PATH. Here’s what it prints for me with Fish shell:

If you need to have icu4c first in your PATH, run:
  fish_add_path /opt/homebrew/opt/icu4c/bin
  fish_add_path /opt/homebrew/opt/icu4c/sbin

Set environment variables LDFLAGS, CPPFLAGS, PKG_CONFIG_PATH for icu4c and pkgconfig in your shell. These are the instructions that are printed for Fish shell:

For compilers to find icu4c you may need to set:
  set -gx LDFLAGS "-L/opt/homebrew/opt/icu4c/lib"
  set -gx CPPFLAGS "-I/opt/homebrew/opt/icu4c/include"

For pkg-config to find icu4c you may need to set:
  set -gx PKG_CONFIG_PATH "/opt/homebrew/opt/icu4c/lib/pkgconfig"

After installing those my machine was prepared, and ./configure ran successfully.

Post-compilation of PostgreSQL, Starting It Up

With PostgreSQL compiled, I was ready to create and initialize the data directory.

First we create a directory with this command:

mkdir -p /usr/local/pgsql/data

Next, the instructions from PostgreSQL refer to the adduser program to create a postgres user as the owner. adduser doesn’t exist on macOS. While there are equivalents, I wanted to use my OS user andy to keep things simple for now.

Since I did that, I needed to change the ownership of the data directory to make andy the owner. I did that by running:

chown andy /usr/local/pgsql/data

Now that PostgreSQL was compiled, it was ready to initialize and start. I ran the initdb program included in PostgreSQL to initialize the data directory which is where all the database content is stored. The -D option below is the flag for the data directory, and the value is the absolute path to it.

/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

With that initialized, we can start PostgreSQL. Use the pg_ctl program to start it up, again supplying the path to the data directory as an argument.

/usr/local/pgsql/bin/pg_ctl \
  -D /usr/local/pgsql/data \
  -l logfile start

Review the rest of the Short Version instructions here.

Now that we’ve got the latest source code version of PostgreSQL compiled, initialized, and running on the default port 5432, we’re ready to use it for testing, connecting to the built-in postgres database, or creating a new one on the instance.
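A quick sanity check, once connected with psql, is to confirm the session is talking to the freshly compiled server and not another local instance. This is a small sketch, and the exact version string will depend on when the source was pulled.

```sql
-- Should report a development version for a build from the master branch
SELECT version();

-- Confirm which port and data directory this instance is using
SHOW port;
SHOW data_directory;
```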

Testing Documentation Changes

I read the PostgreSQL documentation a lot, and I’ve learned how to make contributions to it if I see something to propose. Some of those suggestions have been reviewed and committed by others into the project. Cool! This makes me feel like I’m part of the community of PostgreSQL. My experience in interacting on the list has been positive. There is a lot of meticulous attention paid to the details of the documentation, which makes sense, given the widespread usage, longevity, technical nature, criticality of the usage, and international audience.

While building docs isn’t strictly necessary to submit a patch, reviewing the built HTML shows what users will see. This way I’m more confident when sending a documentation patch to pgsql-hackers that it will look like I expect.

Blogger, presenter, and DBA extraordinaire Lætitia Avrot has a post Patching Postgres’ Documentation that taught me and inspired me to contribute my first documentation patch. Thank you Lætitia! I’ve now contributed a few more and some have been reviewed and committed directly or have inspired a related commit. I’ve linked those here: #1, #2, #3

Testing Patches

Adam Hendel emailed the pgsql-hackers list about the addition of a --log-header flag for pgbench to add more information to the log file output.

Since I’d just compiled PostgreSQL before reading that email, and since I’ve gotten to know Adam Hendel a bit in the community, I was eager to help. I sprang into action, knowing I could add the patch to my local installation, re-compile, and test it out. I’m also a bit familiar with pgbench and have used it, and thought the gist of the change to the log file made sense. I think testing a patch and relaying one’s experience can help to move a patch forward.

To start, I made a ~/patches directory to store this particular patch file and future ones.

I moved the patch file into that directory. We’ll call it the-patch-file.patch below since this is an example:

mv the-patch-file.patch ~/patches

Then I applied it:

git apply ~/patches/the-patch-file.patch

With the patch in place, I was ready to compile PostgreSQL so that the newly built one had the patch included.

Once PostgreSQL was compiled again, I was ready to run pgbench and test the new behavior.

First I needed to find where the pgbench executable was running from since I needed to run my local installation version I’d just compiled. I found that and ran cd to go to that directory. Here’s where the program was for me since I keep my postgres source code in /Users/andy/Projects/postgres:

/Users/andy/Projects/postgres/src/bin/pgbench

I ran pgbench from that directory. The arguments are split onto their own new lines.

pgbench -i \
-d postgres src/bin/pgbench/pgbench postgres://andy:@localhost:5432/postgres \
--log --log-header

I removed the existing log file, since I wanted to test changes going into it:

rm pgbench_log.*

Now I could run the compiled version of pgbench like this, using the new --log-header flag:

src/bin/pgbench/pgbench postgres://andy:@localhost:5432/postgres --log --log-header
pgbench (17devel)

Finally, I could check the output of the log file to verify it had what I expected:

cat pgbench_log.*
client_id transaction_no time script_no time_epoch time_us
0 1 8435 0 1699902315 902700
0 2 1130 0 1699902315 903973

I was able to use the new flag and verify that the results matched what Adam described. Nice!
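
One thing to keep in mind with a header line is that any script reading the log needs to skip it. As a minimal sketch, the example below writes the sample rows shown above to a temporary file (a real run would read pgbench_log.* instead) and uses awk to average the time column, which pgbench documents as the elapsed transaction time in microseconds. The variable names are made up for this example.

```shell
#!/bin/sh
# Sample log contents copied from the output above, stored in a temp file.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
client_id transaction_no time script_no time_epoch time_us
0 1 8435 0 1699902315 902700
0 2 1130 0 1699902315 903973
EOF

# Skip the header line (NR > 1), then average the third column (time).
avg_latency_us=$(awk 'NR > 1 { sum += $3; n++ } END { if (n) printf "%.1f", sum / n }' "$log_file")
echo "average latency: ${avg_latency_us} us"
```

For the two sample rows, this averages 8435 and 1130.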

What’s Next?

We’re just scratching the surface of setting up a local testing environment here. It would be nice to also run the full set of tests that are included in PostgreSQL. Tests are an important part of verifying functionality and getting patches accepted.
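
As a rough sketch of what that could look like with the make-based build, the example below wraps two existing targets: make check at the top of the source tree, which runs the core regression tests against a temporary installation, and make check inside src/bin/pgbench, which runs pgbench's own TAP tests and needs a build configured with --enable-tap-tests. The PG_SRC variable and the run_pg_tests function are hypothetical names for this example, and I haven't covered Meson builds here.

```shell
#!/bin/sh
# Hypothetical location of the source tree; adjust to your own checkout.
PG_SRC="${PG_SRC:-$HOME/Projects/postgres}"

run_pg_tests() {
    # GNUmakefile only exists after ./configure has been run in the tree.
    if [ ! -f "$1/GNUmakefile" ]; then
        echo "no configured PostgreSQL source tree at $1"
        return 1
    fi
    # Core regression tests, run against a temporary installation.
    make -C "$1" check &&
    # pgbench's own TAP tests (requires --enable-tap-tests at configure time).
    make -C "$1/src/bin/pgbench" check
}

if run_pg_tests "$PG_SRC"; then
    echo "tests passed"
else
    echo "tests did not run or failed"
fi
```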

PostgreSQL also ships with many programs, such as pgbench, which may have separate testing requirements. When building PostgreSQL, there are loads of different flags, developer options, and environment variables that can be provided at configuration time and are worth reviewing.

Wrap Up

If you use macOS, you may find a little less support in general for compiling PostgreSQL, but fortunately the steps aren’t too complicated, and PostgreSQL supports compilation for this platform.

In future posts, we’ll look at more ways to run experimental versions of PostgreSQL on macOS.

Once you’ve compiled PostgreSQL or have an experimental version to test with, and know how to start it up, use the built-in databases, create your own, and run the other included programs, you’ve got a great place to test unreleased documentation changes and functionality changes, and to connect more significantly with the community and the project.

Thanks for reading!
