<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://andyatkinson.com/feed/by_tag/PostgreSQL.xml" rel="self" type="application/atom+xml" /><link href="https://andyatkinson.com/" rel="alternate" type="text/html" /><updated>2026-05-01T16:20:42+00:00</updated><id>https://andyatkinson.com/feed/by_tag/PostgreSQL.xml</id><title type="html">Software Engineer, Author, High Performance PostgreSQL for Rails</title><subtitle>Software Engineer, Author, Consultant</subtitle><author><name>Andrew Atkinson</name></author><entry><title type="html">What are SLRUs and MultiXacts in Postgres? What can go wrong?</title><link href="https://andyatkinson.com/postgresql-slru-multixact-what-can-go-wrong" rel="alternate" type="text/html" title="What are SLRUs and MultiXacts in Postgres? What can go wrong?" /><published>2025-09-25T11:15:00+00:00</published><updated>2025-09-25T11:15:00+00:00</updated><id>https://andyatkinson.com/postgresql-slru-multixact-what-can-go-wrong</id><content type="html" xml:base="https://andyatkinson.com/postgresql-slru-multixact-what-can-go-wrong"><![CDATA[<p>In this post we’ll cover two types of Postgres internals.</p>

<p>The first internal item is an “SLRU.” The acronym stands for “simple least recently used.” The LRU portion refers to caches and how they work, and SLRUs in Postgres are a collection of these caches.</p>

<p>SLRUs are small in-memory item stores. Since they need to persist across restarts, they’re also saved into files on disk. Alvaro<sup id="fnref:alvaro" role="doc-noteref"><a href="#fn:alvaro" class="footnote" rel="footnote">1</a></sup> calls SLRUs “poorly named” for a user-facing feature. If they’re internal, why are they worth knowing about as Postgres users?</p>

<p>They’re worth knowing about because their fixed size creates a couple of possible failure points. We’ll look at those later in this post.</p>

<p>Before getting into that, let’s cover some basics about what they are and look at a specific type.</p>

<h2 id="main-purpose-of-slrus">Main purpose of SLRUs</h2>
<p>The main purpose of SLRUs is to track metadata about Postgres transactions.</p>

<p>SLRUs are a general mechanism used by multiple types. Like a lot of things in Postgres, the SLRU system is extensible which means extensions can create new types.</p>

<p>The “least recently used” aspect might be recognizable from cache systems. LRU refers to how the least recently used items are evicted from the cache when it’s full, and newer items take their place.
This is because the cache has a fixed amount of space (measured in 8KB pages) and thus can only store a fixed number of items.</p>

<p>Old SLRU cache items are periodically cleaned up by the Vacuum process.</p>

<h2 id="what-about-the-buffer-cache">What about the buffer cache?</h2>
<p>The buffer cache (sized by configuring <a href="https://www.postgresql.org/docs/current/runtime-config-resource.html">shared_buffers</a>) is another form of cache in Postgres. Thomas Munro proposed unifying the SLRUs and buffer cache mechanisms.</p>

<p>However, as of Postgres 17 and Postgres 18 (released in September 2025), SLRUs are still their own distinct type of cache.</p>

<p>What types of data are stored in SLRUs?</p>

<h2 id="what-type-of-data-is-tracked-in-slrus">What type of data is tracked in SLRUs?</h2>
<p>Transactions are a core concept for relational databases like Postgres. Transactions are abbreviated “Xact,” and Xacts are one of the types of data stored in SLRUs.</p>

<p>Besides regular transactions, there are variations of transactions. Transactions can be created inside other transactions, which are called “nested transactions.”</p>

<p>Whether parent or nested transactions, they each get their own 32-bit integer identifier once they begin modifying something, and these are all tracked while they’re in use. The <a href="https://www.postgresql.org/docs/current/sql-savepoint.html">SAVEPOINT</a> keyword (blog post: <a href="https://andyatkinson.com/blog/2024/07/22/postgresql-savepoints">You make a good point! — PostgreSQL Savepoints</a>) creates a nested transaction by saving the incremental state of a transaction.</p>
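
<p>As a minimal sketch of a nested transaction, the statements below use a savepoint inside a parent transaction. The <code class="language-plaintext highlighter-rouge">orders</code> table is a hypothetical example and isn’t part of this post.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical table, for illustration only
create table orders (id bigint primary key, note text);

begin;
insert into orders values (1, 'parent transaction');

-- The savepoint starts a nested transaction (subtransaction)
savepoint s1;
insert into orders values (2, 'nested transaction');

-- Undo only the work done since the savepoint
rollback to savepoint s1;

-- Only row 1 is committed
commit;
</code></pre></div></div>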

<p>Another variation of a transaction is a “multi-transaction,” (multiple transactions in a group) or “MultiXact” in Postgres speak.</p>

<h2 id="what-are-multixacts">What are MultiXacts?</h2>
<p>A MultiXact gets a separate number from the transaction identifier. I think of it like a “group” number. The group might be related to a table row, but each transaction in the group is doing something different. Think of multiple transactions all doing a foreign key referential integrity check on the same referenced primary key.</p>

<p>Here’s a definition of MultiXact IDs:</p>
<blockquote>
  <p>A MultiXact ID is a secondary data structure that tracks multiple transactions holding locks on the same row.</p>
</blockquote>

<p>When MultiXacts are created, their identifier is stored in tuple header info, replacing the transaction id that would normally be stored in the tuple header.</p>

<p>As this buttondown blog post (“Notes on some PostgreSQL implementation details”)<sup id="fnref:buttondown" role="doc-noteref"><a href="#fn:buttondown" class="footnote" rel="footnote">2</a></sup> describes, the tuple (row version) header has a small fixed size. The MultiXact id replaces the transaction id using the same size identifier (but a different one), to keep the tuple header size small (as opposed to adding another identifier).</p>

<p>Transaction IDs and MultiXact IDs are both represented as unsigned 32-bit integers, meaning it’s possible to store a max of around 4 billion values (see: <a href="https://www.postgresql.org/docs/current/transaction-id.html">Transactions and Identifiers</a>). We can get the current transaction id value by running <code class="language-plaintext highlighter-rouge">select pg_current_xact_id();</code>.</p>

<p>What do we mean by transaction metadata? One example: for a nested transaction, the metadata records its parent transaction, the “creator.”</p>

<p>If you’d like to read how AWS introduces MultiXacts, check out this post, which describes what MultiXacts are:
<a href="https://aws.amazon.com/blogs/database/multixacts-in-postgresql-usage-side-effects-and-monitoring/">https://aws.amazon.com/blogs/database/multixacts-in-postgresql-usage-side-effects-and-monitoring/</a></p>

<p>When do MultiXacts get created?</p>

<h2 id="when-do-multixacts-get-created">When do MultiXacts get created?</h2>
<p>MultiXacts get created only for certain types of DML operations and for certain schema definitions. In other words, it’s possible that your particular Postgres database workload does not create MultiXacts at all, or it’s possible they’re heavily used.
Let’s look at what creates MultiXacts:</p>
<ul>
  <li>Foreign key constraint enforcement</li>
  <li><code class="language-plaintext highlighter-rouge">SELECT FOR SHARE</code></li>
</ul>

<p>If you use no foreign key constraints or your application (or ORM) never creates <code class="language-plaintext highlighter-rouge">SELECT FOR SHARE</code>, then your Postgres database may have no MultiXacts.</p>
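
<p>Here’s a rough sketch of how a MultiXact could be created on purpose, assuming two concurrent sessions and a hypothetical <code class="language-plaintext highlighter-rouge">accounts</code> table. It uses the <code class="language-plaintext highlighter-rouge">pgrowlocks</code> contrib extension to check whether the row’s locker is a MultiXact. A single locker on its own wouldn’t need one.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical table, for illustration only
create extension if not exists pgrowlocks;
create table accounts (id bigint primary key, name text);
insert into accounts values (1, 'example');

-- Session 1: take a shared row lock and keep the transaction open
begin;
select * from accounts where id = 1 for share;

-- Session 2: take a second shared lock on the same row
begin;
select * from accounts where id = 1 for share;

-- Any session: with two lockers, the "multi" column should be true
-- and "xids" should list both transaction ids
select locked_row, locker, multi, xids, modes from pgrowlocks('accounts');

-- Afterwards, run "commit;" in sessions 1 and 2 to release the locks
</code></pre></div></div>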

<p>Let’s go back to SLRUs.</p>

<h2 id="more-about-slrus">More about SLRUs</h2>
<p>SLRUs have a fixed size (prior to Postgres 17) measured in pages. When items are evicted from the SLRU cache, a <a href="https://www.interdb.jp/pg/pgsql08/01.html">page replacement</a> occurs.</p>

<p>The page being replaced is called the “victim” page and Postgres must do a little work to find a victim page.
Since SLRUs survive Postgres restarts, they’re <a href="https://www.postgresql.org/docs/17/storage-file-layout.html#PGDATA-CONTENTS-TABLE">saved in files in the PGDATA directory</a>.</p>

<p>The directory name will depend on the SLRU type. For example for MultiXacts, the directory name is <code class="language-plaintext highlighter-rouge">pg_multixact</code>. SLRU buffer pages are written to the WAL and to disk, meaning that if the primary instance fails, the state can be recovered.</p>
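
<p>One way to peek at these files from SQL is the built-in <code class="language-plaintext highlighter-rouge">pg_ls_dir</code> function, which reads paths relative to the data directory. This is a small sketch. By default the function is restricted to superusers, so it may not be available on managed services.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Subdirectories for the two MultiXact SLRUs (members and offsets)
select pg_ls_dir('pg_multixact');

-- Count the segment files currently on disk for MultiXact members
select count(*) as member_segment_files
from pg_ls_dir('pg_multixact/members');
</code></pre></div></div>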

<p>See the <code class="language-plaintext highlighter-rouge">slru.c</code> <code class="language-plaintext highlighter-rouge">SlruPhysicalWritePage</code> function comments, which describe writing WAL and writing out data:</p>
<pre>
Honor the write-WAL-before-data rule, if appropriate, so that we do not
write out data before associated WAL records.
</pre>

<p>Each SLRU instance implements a circular buffer of pages in shared memory, evicting the least recently used pages. A circular buffer is another interesting Postgres internal concept but is beyond the scope of this post.
How can we observe what’s happening with SLRUs?</p>

<h2 id="using-pg_stat_slru">Using pg_stat_slru</h2>
<p>Since Postgres 13, we can query the <code class="language-plaintext highlighter-rouge">pg_stat_slru</code> system view to inspect cumulative statistics about the SLRUs.
<a href="https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-SLRU-VIEW">https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-SLRU-VIEW</a>
To list only the names of the built-in SLRU types:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">pg_stat_slru</span><span class="p">;</span>
      <span class="n">name</span>
<span class="c1">-----------------</span>
 <span class="n">CommitTs</span>
 <span class="n">MultiXactMember</span>
 <span class="n">MultiXactOffset</span>
 <span class="k">Notify</span>
 <span class="nb">Serial</span>
 <span class="n">Subtrans</span>
 <span class="n">Xact</span>
 <span class="n">Other</span>
</code></pre></div></div>

<p>To determine if our system is creating MultiXact SLRU data, we can query the pg_stat_slru view. Here are the view’s columns, shown with the psql <code class="language-plaintext highlighter-rouge">\d</code> meta-command. We’d see non-zero numbers in these columns when the system is creating SLRU data.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\d pg_stat_slru
                     <span class="k">View</span> <span class="nv">"pg_catalog.pg_stat_slru"</span>
    <span class="k">Column</span>    <span class="o">|</span>           <span class="k">Type</span>           <span class="o">|</span> <span class="k">Collation</span> <span class="o">|</span> <span class="k">Nullable</span> <span class="o">|</span> <span class="k">Default</span> 
<span class="c1">--------------+--------------------------+-----------+----------+---------</span>
 <span class="n">name</span>         <span class="o">|</span> <span class="nb">text</span>                     <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">blks_zeroed</span>  <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">blks_hit</span>     <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">blks_read</span>    <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">blks_written</span> <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">blks_exists</span>  <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">flushes</span>      <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">truncates</span>    <span class="o">|</span> <span class="nb">bigint</span>                   <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span> 
 <span class="n">stats_reset</span>  <span class="o">|</span> <span class="nb">timestamp</span> <span class="k">with</span> <span class="nb">time</span> <span class="k">zone</span> <span class="o">|</span>           <span class="o">|</span>          <span class="o">|</span>
</code></pre></div></div>

<p>To look at the <code class="language-plaintext highlighter-rouge">pg_xact</code> SLRU:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">pg_stat_slru</span> <span class="k">where</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'Xact'</span><span class="p">;</span>
 <span class="n">name</span> <span class="o">|</span> <span class="n">blks_zeroed</span> <span class="o">|</span> <span class="n">blks_hit</span> <span class="o">|</span> <span class="n">blks_read</span> <span class="o">|</span> <span class="n">blks_written</span> <span class="o">|</span> <span class="n">blks_exists</span> <span class="o">|</span> <span class="n">flushes</span> <span class="o">|</span> <span class="n">truncates</span> <span class="o">|</span>          <span class="n">stats_reset</span>
<span class="c1">------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------</span>
 <span class="n">Xact</span> <span class="o">|</span>         <span class="mi">460</span> <span class="o">|</span> <span class="mi">30686596</span> <span class="o">|</span>        <span class="mi">44</span> <span class="o">|</span>         <span class="mi">2030</span> <span class="o">|</span>           <span class="mi">0</span> <span class="o">|</span>    <span class="mi">1684</span> <span class="o">|</span>         <span class="mi">0</span> <span class="o">|</span> <span class="mi">2024</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">19</span> <span class="mi">09</span><span class="p">:</span><span class="mi">52</span><span class="p">:</span><span class="mi">33</span><span class="p">.</span><span class="mi">506794</span><span class="o">-</span><span class="mi">06</span>
</code></pre></div></div>

<p>“Hit” and “read” refer to reads from the SLRU: “blks_hit” counts reads where the desired page was already in the SLRU, and “blks_read” counts reads where it was not and had to be read from disk.</p>

<p>When new pages are allocated, we see this reflected in “blks_zeroed” as they’re written out with zeroes.</p>

<p>When pages in the SLRU are modified they become “dirtied” pages. Dirty pages are eventually written out to disk (blks_written), and flushes of dirty data are counted as well (flushes).</p>

<p>SLRUs can also be truncated (the “truncates” count).</p>
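
<p>Putting those columns together, a query like the following sketch shows a rough cache hit percentage for the MultiXact SLRUs. The <code class="language-plaintext highlighter-rouge">hit_pct</code> column is just a name chosen for this example. The <code class="language-plaintext highlighter-rouge">ilike</code> match is used because the SLRU names are spelled differently in Postgres 17 and newer (for example <code class="language-plaintext highlighter-rouge">multixact_member</code>).</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select
  name,
  blks_hit,
  blks_read,
  -- nullif avoids division by zero when there has been no activity
  round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as hit_pct
from pg_stat_slru
where name ilike 'multixact%';
</code></pre></div></div>

<p>All-zero rows suggest the workload isn’t creating MultiXacts, at least since the last stats reset.</p>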

<p>Some of the source code for SLRUs in Postgres is in the file <code class="language-plaintext highlighter-rouge">backend/access/transam/slru.c</code>.
<a href="https://github.com/postgres/postgres/blob/master/src/backend/access/transam/slru.c">https://github.com/postgres/postgres/blob/master/src/backend/access/transam/slru.c</a></p>

<p>Now that we know some basics about SLRUs and a specific type, the MultiXact SLRU, what are some operational concerns or things that can go wrong?</p>

<h2 id="what-can-go-wrong-with-slrus-and-xacts">What can go wrong with SLRUs and Xacts?</h2>
<p>Operational problems can stem from the fact that the identifiers tracked in SLRUs are 32-bit numbers, and for high scale Postgres, it’s possible to consume them fast enough that the number can “wrap around.”</p>

<p>Two examples with public write-ups related to SLRU operational problems are:</p>

<ul>
  <li>Subtransactions overflow: Each use of a subtransaction creates an id to track. At a high enough creation rate it’s possible to run out of values.
This was written up in the GitLab post: <a href="https://about.gitlab.com/blog/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/">Why we spent the last month eliminating PostgreSQL subtransactions</a>.</li>
  <li>MultiXact member space exhaustion: MultiXacts (multiple transactions) can occur in a few scenarios:
    <ul>
      <li>An explicit row lock: <code class="language-plaintext highlighter-rouge">SELECT … FOR SHARE</code></li>
      <li><code class="language-plaintext highlighter-rouge">SELECT … FOR UPDATE</code></li>
    </ul>
    This was written up in the Metronome blog post: <a href="https://metronome.com/blog/root-cause-analysis-postgresql-multixact-member-exhaustion-incidents-may-2025">Root Cause Analysis: PostgreSQL MultiXact member exhaustion incidents (May 2025)</a>.</li>
</ul>

<p>A scenario for that could be a foreign key constraint lookup on a high insert table referencing a low cardinality table.</p>

<p>Another type of problem in the buttondown post<sup id="fnref:buttondown:1" role="doc-noteref"><a href="#fn:buttondown" class="footnote" rel="footnote">2</a></sup> is the quadratic growth of MultiXacts.</p>

<p>Dilip Kumar talked about: “Long running transaction, system can go fully to cache replacement, TPS drops, with subtransactions ids (need to get parent ids).” See Dilip’s presentation for more info.<sup id="fnref:dilip" role="doc-noteref"><a href="#fn:dilip" class="footnote" rel="footnote">3</a></sup></p>

<h2 id="what-do-we-do-with-info-as-postgres-operators">What do we do with info as Postgres operators?</h2>
<p>This is a huge topic and this post just scratches the surface.</p>

<p>However, let’s wrap up with some takeaways.</p>

<p>When operating a high scale Postgres instance, what’s worth knowing about SLRUs?</p>

<ul>
  <li>Know about the SLRU system in general, how to monitor it, and don’t forget about extensions</li>
  <li>Learn about SLRU limitations and possible failure points, for the various types</li>
  <li>Determine whether your workload is using SLRUs, monitor their growth, and learn about the possible failure points based on your use</li>
</ul>
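
<p>As one starting point for monitoring, this sketch reports how far each database is from its oldest unfrozen transaction ID and MultiXact ID, using the built-in <code class="language-plaintext highlighter-rouge">age</code> and <code class="language-plaintext highlighter-rouge">mxid_age</code> functions. It tracks ID consumption only. It doesn’t measure MultiXact member space, which would need separate monitoring.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select
  datname,
  age(datfrozenxid) as xid_age,
  mxid_age(datminmxid) as multixact_age,
  -- Threshold at which autovacuum forces a MultiXact freeze
  current_setting('autovacuum_multixact_freeze_max_age') as multixact_freeze_max_age
from pg_database
order by multixact_age desc;
</code></pre></div></div>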

<h2 id="whats-changing-with-slrus-in-new-postgres-versions">What’s changing with SLRUs in new Postgres versions?</h2>
<p>In Postgres 17, the sizes of the MultiXact member and offset SLRU caches are configurable beyond their initial default sizes. The unit is the number of 8KB pages. The parameters and their defaults are:</p>

<ul>
  <li>multixact_member_buffers, default is 32 8KB pages</li>
  <li>multixact_offset_buffers, default is 16 8KB pages</li>
</ul>
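
<p>A minimal sketch of checking and changing these settings on Postgres 17 or newer follows. The values <code class="language-plaintext highlighter-rouge">64</code> and <code class="language-plaintext highlighter-rouge">32</code> are arbitrary examples, not recommendations. These parameters can only be set at server start, so a restart is needed for new values to take effect.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Current values, reported in 8KB pages
show multixact_member_buffers;
show multixact_offset_buffers;

-- Example values only; written to postgresql.auto.conf
alter system set multixact_member_buffers = 64;
alter system set multixact_offset_buffers = 32;

-- Restart Postgres, then confirm with the show commands above
</code></pre></div></div>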

<blockquote>
  <p>In the recent episode of postgres.fm <em>MultiXact member space exhaustion</em>,<sup id="fnref:pgfm" role="doc-noteref"><a href="#fn:pgfm" class="footnote" rel="footnote">4</a></sup> the Metronome engineers discussed working on a patch related to MultiXact member exhaustion.</p>
</blockquote>

<p>Lukas covers changes in Postgres 17 to adjust SLRU cache sizes. Each of the SLRU types can now be configured to be larger in size.
<a href="https://pganalyze.com/blog/5mins-postgres-17-configurable-slru-cache">https://pganalyze.com/blog/5mins-postgres-17-configurable-slru-cache</a></p>

<h2 id="conclusion">Conclusion</h2>
<p>I’m still learning about MultiXacts, SLRUs, and failure modes as a result of these. If you have feedback on this post or additional useful resources, I’d love to hear about them. Please contact me here or on social media.</p>

<p>Thanks for reading!</p>

<h2 id="resources">Resources</h2>
<p>Dilip Kumar presentation 2024 - PostgreSQL Development Conference <a href="https://www.youtube.com/watch?v=74xAqgS2thY">https://www.youtube.com/watch?v=74xAqgS2thY</a></p>

<p>MultiXacts Dan Slimmon
<a href="https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres/">https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres/</a></p>

<p>5 minutes of Postgres LWLock Lock Manager
<a href="https://pganalyze.com/blog/5mins-postgres-LWLock-lock-manager-contention">https://pganalyze.com/blog/5mins-postgres-LWLock-lock-manager-contention</a></p>

<p>SLRU Improvements Proposals Wiki
<a href="https://wiki.postgresql.org/wiki/SLRU_improvements">https://wiki.postgresql.org/wiki/SLRU_improvements</a></p>

<h2 id="corrections">Corrections</h2>

<p>September 27, 2025: An earlier version of this post inaccurately described SLRU buffers as not being WAL logged. Thank you to Laurenz Albe for writing in to correct this and providing a pointer into the source code to learn more.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:alvaro" role="doc-endnote">
      <p><a href="https://p2d2.cz/files/p2d2-2025-herrera-slru.pdf">https://p2d2.cz/files/p2d2-2025-herrera-slru.pdf</a> <a href="#fnref:alvaro" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:buttondown" role="doc-endnote">
      <p><a href="https://buttondown.com/nelhage/archive/notes-on-some-postgresql-implementation-details/">https://buttondown.com/nelhage/archive/notes-on-some-postgresql-implementation-details/</a> <a href="#fnref:buttondown" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:buttondown:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:dilip" role="doc-endnote">
      <p><a href="https://www.youtube.com/watch?v=74xAqgS2thY">https://www.youtube.com/watch?v=74xAqgS2thY</a> <a href="#fnref:dilip" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pgfm" role="doc-endnote">
      <p><a href="https://postgres.fm/episodes/multixact-member-space-exhaustion">https://postgres.fm/episodes/multixact-member-space-exhaustion</a> <a href="#fnref:pgfm" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Andrew Atkinson</name></author><category term="PostgreSQL" /><category term="Databases" /><summary type="html"><![CDATA[In this post we’ll cover two types of Postgres internals.]]></summary></entry><entry><title type="html">Avoid UUID Version 4 Primary Keys (for Postgres)</title><link href="https://andyatkinson.com/avoid-uuid-version-4-primary-keys" rel="alternate" type="text/html" title="Avoid UUID Version 4 Primary Keys (for Postgres)" /><published>2025-07-02T00:00:00+00:00</published><updated>2025-07-02T00:00:00+00:00</updated><id>https://andyatkinson.com/avoid-uuid-v4-primary-keys</id><content type="html" xml:base="https://andyatkinson.com/avoid-uuid-version-4-primary-keys"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Over the last decade, when working on databases with UUID Version 4<sup id="fnref:rfc" role="doc-noteref"><a href="#fn:rfc" class="footnote" rel="footnote">1</a></sup> as the primary key data type, these databases have usually had bad performance and excessive IO.</p>

<p>UUID is a native data type in Postgres stored as binary data. Various UUID versions are in the RFC. Version 4 has mostly random bits, obfuscating information like when the value was created or where it was generated.</p>

<p>Version 4 UUIDs are easy to generate in Postgres using the <code class="language-plaintext highlighter-rouge">gen_random_uuid()</code><sup id="fnref:gen" role="doc-noteref"><a href="#fn:gen" class="footnote" rel="footnote">2</a></sup> function since version 13 (released in 2020).</p>

<p>I’ve learned there are misconceptions about UUID Version 4, and sometimes these are the reasons users pick this data type.</p>

<p>Because of the poor performance, misconceptions, and available alternatives, I’ve come around to a simple position: <em>Avoid UUID Version 4 for primary keys</em>.</p>

<p>My more controversial take is to avoid UUIDs in general, but I understand there are some legitimate reasons for them without practical alternatives.</p>

<p>As a database enthusiast, I wanted to have an articulated position on this classic “Integer v. UUID” debate.</p>

<p>Among database folks, debating this may be tired and clichéd. However, from my consulting work, I can say I work with databases using UUID v4 in 2024 and 2025, and still see the issues discussed in this post.</p>

<p>Let’s dig in.</p>

<h2 id="uuid-context-for-this-post">UUID context for this post</h2>
<ul>
  <li>UUIDs (or GUIDs in Microsoft speak<sup id="fnref:ms" role="doc-noteref"><a href="#fn:ms" class="footnote" rel="footnote">3</a></sup>) are written as 36-character strings (32 hexadecimal digits and 4 hyphens) and represent 128-bit (16-byte) values, stored using the binary <code class="language-plaintext highlighter-rouge">uuid</code> data type in Postgres</li>
  <li>The RFC documents how the 128 bits are set</li>
  <li>The bits for UUID Version 4 are mostly random values</li>
  <li>UUID Version 7 includes a timestamp in the first 48 bits, which works much better with database indexes compared with random values</li>
</ul>

<p>Although unreleased as of this writing, and pulled from Postgres 17 previously, UUID V7 is part of Postgres 18<sup id="fnref:land" role="doc-noteref"><a href="#fn:land" class="footnote" rel="footnote">4</a></sup> scheduled for release in the Fall of 2025.</p>
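
<p>As a short sketch, assuming Postgres 18 or newer, the built-in <code class="language-plaintext highlighter-rouge">uuidv7()</code> function generates a Version 7 value, and the timestamp in its leading bits can be read back out:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Requires Postgres 18 or newer
select uuidv7() as id;

-- The version number encoded in the value
select uuid_extract_version(uuidv7()) as version;

-- The creation time embedded in the first 48 bits
select uuid_extract_timestamp(uuidv7()) as created_at;
</code></pre></div></div>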

<p>What kind of app databases are in scope for this post?</p>

<h2 id="scope-of-web-app-usage-and-their-scale">Scope of web app usage and their scale</h2>
<p>The kinds of web applications I’m thinking of with this post are monolithic web apps, with Postgres as their primary OLTP database. The apps could be in categories like social media, e-commerce, click tracking, or business process automation apps.</p>

<p>The types of performance issues discussed here are related to inefficient storage and retrieval, meaning they happen for all of these types of apps.</p>

<p>What’s the core issue with UUID v4?</p>

<h2 id="randomness-is-the-issue">Randomness is the issue</h2>
<p>The core issue with UUID Version 4, given that the 122 bits they’re made up of are “random or pseudo-randomly generated values”<sup id="fnref:rfc:1" role="doc-noteref"><a href="#fn:rfc" class="footnote" rel="footnote">1</a></sup>, is how the values are maintained in indexes. Since primary keys are backed by indexes by default, each insert is less efficient compared with inserts for sequentially ordered values.</p>

<p>Lookups, updates, and deletes of individual items or ranges of items are also less efficient, due to increased traversal of non-sequential index pages in Postgres.</p>

<p>Since the randomly generated values aren’t inserted sequentially (or in sequential/adjacent pages), it’s less efficient to find them later for updates or deletes. Each of these workload types uses the primary key index.</p>

<p>UUID v4s don’t have a useful natural ordering that aligns with how they’re stored, and thus both storage and retrieval are less efficient.</p>

<p>Later in the post we’ll look at just how many more Postgres pages need to be accessed for equivalent data, and what that means in terms of performance.</p>

<p>Despite the inefficiencies, UUID v4s and UUIDs in general remain (or at least were) popular in the last decade based on my experience consulting in Postgres.</p>

<p>Given the popularity, what use cases for UUID are there?</p>

<h2 id="why-choose-uuids-at-all-generating-values-from-one-or-more-client-applications">Why choose UUIDs at all? Generating values from one or more client applications</h2>
<p>One use case for UUIDs is when there’s a need to generate an identifier on a client or from multiple services, then passed to Postgres for persistence.</p>

<p>For web apps, generally they instantiate objects in memory and don’t expect an identifier to be used for lookups until after an instance is persisted as a row (where the database generates the identifier).</p>

<p>In a microservices architecture where the apps have their own databases, the ability to generate identifiers from each database without collisions is a use case for UUIDs. The UUID could also identify the database a value came from later, vs. an integer.</p>

<p>For collision avoidance (see HN discussion<sup id="fnref:hn" role="doc-noteref"><a href="#fn:hn" class="footnote" rel="footnote">5</a></sup>), we can’t practically make the same guarantee with sequence-backed integers. There are hacks, like generating even and odd integers between two instances, or using different ranges in the int8 range.</p>

<p>There are also alternative identifiers like composite primary keys (CPKs). However, the same set of 2 values wouldn’t uniquely identify a particular table.</p>

<p>The avoidance of collisions is described this way on Wikipedia:<sup id="fnref:wiki" role="doc-noteref"><a href="#fn:wiki" class="footnote" rel="footnote">6</a></sup></p>

<blockquote>
  <p>The number of random version-4 UUIDs which need to be generated in order to have a 50% probability of one collision: 2.71 quintillion</p>
</blockquote>

<p>This number would be equivalent to:</p>

<blockquote>
  <p>Generating 1 billion UUIDs per second for about 86 years.</p>
</blockquote>
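
<p>Those figures can be roughly reproduced with the standard birthday-problem approximation, where the number of values needed for a 50% collision chance is about the square root of 2 × ln(2) × 2<sup>122</sup> (122 being the number of random bits). This is a back-of-the-envelope sketch, and the column names are only for this example:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Approximate count of random v4 UUIDs for a 50% chance of one collision
-- (roughly 2.71e18, or 2.71 quintillion)
select sqrt(2 * ln(2::float8) * power(2::float8, 122)) as uuids_for_50pct_collision;

-- The same count expressed as years of generating 1 billion UUIDs per second
-- (roughly 86 years)
select sqrt(2 * ln(2::float8) * power(2::float8, 122))
       / 1e9
       / (60 * 60 * 24 * 365.25) as years_at_1b_per_second;
</code></pre></div></div>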

<p>Are UUIDs secure?</p>

<h2 id="misconceptions-uuids-are-secure">Misconceptions: UUIDs are secure</h2>
<p>One misconception about UUIDs is that they’re secure. However, the RFC describes that they shouldn’t be considered secure “capabilities.”</p>

<p>From RFC 4122<sup id="fnref:rfc:2" role="doc-noteref"><a href="#fn:rfc" class="footnote" rel="footnote">1</a></sup> Section 6 Security Considerations:</p>
<blockquote>
  <p>Do not assume that UUIDs are hard to guess; they should not be used
  as security capabilities</p>
</blockquote>

<p>How can we create obfuscated codes from integers?</p>

<h2 id="creating-obfuscated-values-using-integers">Creating obfuscated values using integers</h2>
<p>UUID V4s obfuscate their creation time, and the values can’t be ordered to see when they were created relative to each other. We can achieve those properties with integers, with a little more work.</p>

<p>One option is to generate a pseudo-random code from an integer, then use that value externally, while still using integers internally.</p>

<p>To see the full details of this solution, please check out: <em>Short alphanumeric pseudo random identifiers in Postgres</em><sup id="fnref:alpha" role="doc-noteref"><a href="#fn:alpha" class="footnote" rel="footnote">7</a></sup></p>

<p>We’ll summarize it here.</p>

<ul>
  <li>Convert a decimal integer like “2” into binary bits. E.g. a 4 byte, 32 bit integer: 00000000 00000000 00000000 00000010</li>
  <li>Perform an exclusive OR (XOR) operation on all the bits using a key</li>
  <li>Encode the result using a base62 alphabet</li>
</ul>
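
<p>The steps above could be sketched roughly as follows. This is not the implementation from the linked post. The function name <code class="language-plaintext highlighter-rouge">obfuscate_id_sketch</code>, the default key, and the alphabet ordering are all assumptions made for illustration, and it assumes non-negative ids and keys. XOR with a fixed key only obscures values and shouldn’t be treated as a security mechanism.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical helper: XOR an integer with a key, then base62-encode it
create or replace function obfuscate_id_sketch(id integer, secret_key integer default 1234567)
returns text
language plpgsql
immutable
as $$
declare
  alphabet constant text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
  n bigint := (id # secret_key)::bigint;  -- "#" is bitwise XOR for integers
  code text := '';
begin
  -- Repeatedly take the remainder mod 62 to pick the next character,
  -- building the code from the least significant digit upward
  loop
    code := substr(alphabet, (n % 62)::int + 1, 1) || code;
    n := n / 62;
    exit when n = 0;
  end loop;
  return code;
end;
$$;

-- Usage: neighboring integers produce codes that aren't in creation order
select i, obfuscate_id_sketch(i) as code
from generate_series(1, 5) as i;
</code></pre></div></div>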

<p>The obfuscated id is stored in a generated column. Reviewing the generated values, we can see they’re similar but aren’t ordered by their creation order.</p>

<p>The values in insertion order were <code class="language-plaintext highlighter-rouge">01Y9I</code>, <code class="language-plaintext highlighter-rouge">01Y9L</code>, then <code class="language-plaintext highlighter-rouge">01Y9K</code>.</p>

<p>With alphabetical order, the last two would be flipped: <code class="language-plaintext highlighter-rouge">01Y9I</code> first, then <code class="language-plaintext highlighter-rouge">01Y9K</code> second, then <code class="language-plaintext highlighter-rouge">01Y9L</code> third, sorting on the fifth character.</p>

<p>If I wanted to use this approach for all tables, I’d try a centralized table that was polymorphic, storing a record for each table that’s using a code (and a foreign key constraint).</p>

<p>That way I’d know where the code was used.</p>

<p>Why else might we want to skip UUIDs?</p>

<h2 id="reasons-against-uuids-in-general-they-consume-a-lot-of-space">Reasons against UUIDs in general: they consume a lot of space</h2>
<p>UUIDs are 16 bytes (128 bits) per value, which is double the space of bigint (8 bytes), or quadruple the space of 4-byte integers. This extra space adds up once many tables have millions of rows, and copies of a database are being moved around as backups and restores.</p>
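
<p>The per-value sizes can be checked with the built-in <code class="language-plaintext highlighter-rouge">pg_column_size</code> function. This small sketch only measures the values themselves, not row or index overhead:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select
  pg_column_size(gen_random_uuid()) as uuid_bytes,   -- 16
  pg_column_size(1::bigint) as bigint_bytes,         -- 8
  pg_column_size(1::integer) as integer_bytes;       -- 4
</code></pre></div></div>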

<p>A bigger impact on performance, though, comes from the poor characteristics of writing and reading random data in indexes.</p>

<h2 id="reasons-against-uuid-v4s-add-insert-latency-due-to-index-page-splits-fragmentation">Reasons against: UUID v4s add insert latency due to index page splits, fragmentation</h2>
<p>For random UUID v4s, Postgres incurs more latency for every insert operation.</p>

<p>For integer primary key rows, their values are maintained in index pages with “append-mostly” operations on “leaf nodes,” since their values are orderable, and since B-Tree indexes store entries in sorted order.</p>

<p>For UUID v4s, primary key values in B-Tree indexes are problematic.</p>

<p>Inserts are not appended to the rightmost leaf page. They are placed into a random page, which could be mid-page or an already-full page, causing a page split that would have been unnecessary with an integer.</p>

<p>PlanetScale has a nice visualization of index page splits and rebalancing.<sup id="fnref:ps" role="doc-noteref"><a href="#fn:ps" class="footnote" rel="footnote">8</a></sup></p>

<p>Unnecessary splits and rebalancing add space consumption and processing latency to write operations. This extra IO shows up in Write Ahead Log (WAL) generation as well.</p>

<p><a href="https://buildkite.com/resources/blog/goodbye-integers-hello-uuids/">Buildkite reported a 50% reduction in write IO</a> for the WAL by moving to time-ordered UUIDs.</p>

<p>Given fixed size pages, we want high density within the pages. Later on we’ll use <em>pageinspect</em> to check the average leaf density between integer and UUID to help compare the two.</p>

<h2 id="excessive-io-for-lookups-even-with-orderable-uuids">Excessive IO for lookups even with orderable UUIDs</h2>
<p>B-Tree page layout means you can fit fewer UUIDs per 8KB page. Since we have the limitation of fixed page sizes, we at least want them to be as densely packed as possible.</p>

<p>Since UUID indexes are ~40% larger in leaf pages than bigint (int8) for the same logical number of rows, they can’t be as densely packed with values. As Lukas says, “<em>All in all, the physical data structure matters as much as your server configuration to achieve the best I/O performance in Postgres</em>.”<sup id="fnref:pga5" role="doc-noteref"><a href="#fn:pga5" class="footnote" rel="footnote">9</a></sup></p>

<p>This means that for individual lookups, range scans, or updates, we will incur ~40% more I/O on UUID indexes, as more pages are scanned. Remember that even to access one row, Postgres accesses the whole page where the row lives and copies it into a shared memory buffer.</p>

<p>Let’s insert and query data and take a look at numbers between these data types.</p>

<h2 id="working-with-integers-uuid-v4-and-uuid-v7">Working with integers, UUID v4, and UUID v7</h2>
<p>Let’s create integer, UUID v4, and UUID v7 fields, index them, and load them into the buffer cache with <em>pg_prewarm</em>.</p>

<p>I will use the schema examples from the Cybertec post <a href="https://www.cybertec-postgresql.com/en/unexpected-downsides-of-uuid-keys-in-postgresql/">Unexpected downsides of UUID keys in PostgreSQL</a> by Ants Aasma.</p>

<p>View <a href="https://github.com/andyatkinson/pg_scripts/pull/20">andyatkinson/pg_scripts PR #20</a>.</p>

<p>On my Mac, I compiled the <code class="language-plaintext highlighter-rouge">pg_uuidv7</code> extension. Once compiled and enabled for Postgres, I could use the extension functions to generate UUID V7 values.</p>

<p>Another extension <code class="language-plaintext highlighter-rouge">pg_prewarm</code> is used. It’s a module included with Postgres, so it just needs to be enabled per database where it’s used.</p>

<p>The difference in latency and the enormous difference in buffers from the post was reproducible in my testing.</p>

<blockquote>
  <p>“Holy behemoth buffer count batman”
<small>- Ants Aasma</small></p>
</blockquote>

<p>Cybertec post results:</p>
<ul>
  <li>27,332 buffer hits, index only scan on the <code class="language-plaintext highlighter-rouge">bigint</code> column</li>
  <li>8,562,960 buffer hits, index only scan on the UUID V4 column</li>
</ul>

<p>Since these are buffer <em>hits</em> we’re accessing them from memory, which is faster than disk. We can focus then on only the difference in latency based on the data types.</p>

<p>How many more pages are accessed for the UUID index? 8,535,628 (8.5 million!) more 8KB pages were accessed, a 31,229.4% increase. In terms of data volume, that is:</p>
<ul>
  <li>68,285,024 KB, or ~68.3 GB, more data that’s accessed</li>
</ul>

<p>Calculating a low and high estimate of access speeds for memory:</p>
<ul>
  <li>Low estimate: 20 GB/s</li>
  <li>High estimate: 80 GB/s</li>
</ul>

<p>Accessing 68.3 GB of data from memory (<code class="language-plaintext highlighter-rouge">shared_buffers</code> in PostgreSQL) would add:</p>
<ul>
  <li>~3.4 seconds of latency (low speed)</li>
  <li>~0.85 seconds of latency (high speed)</li>
</ul>

<p>That’s between ~0.85 and ~3.4 seconds of additional latency solely based on the data type. Here we used 10 million rows and performed 1 million updates, but the latencies will get worse as data and query volumes increase.</p>
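
<p>The arithmetic behind this estimate can be reproduced in a single query. This is a back-of-the-envelope sketch using the two buffer counts quoted above, assuming 8KB pages, decimal gigabytes, and the same 20 GB/s and 80 GB/s memory speeds.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WITH hits AS (
  -- Buffer hit counts quoted from the Cybertec post
  SELECT 27332::numeric AS bigint_hits, 8562960::numeric AS uuid_v4_hits
), extra AS (
  SELECT uuid_v4_hits - bigint_hits       AS extra_pages,
         (uuid_v4_hits - bigint_hits) * 8 AS extra_kb      -- 8KB per page
  FROM hits
)
SELECT extra_pages,
       extra_kb,
       round(extra_kb / 1000000, 1)      AS extra_gb,
       round(extra_kb / 1000000 / 20, 2) AS seconds_at_20_gb_per_s,
       round(extra_kb / 1000000 / 80, 2) AS seconds_at_80_gb_per_s
FROM extra;
</code></pre></div></div>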

<h2 id="inspecting-density-with-the-pageinspect-extension">Inspecting density with the pageinspect extension</h2>
<p>We can inspect the average fill percentage (density) of leaf pages using the <em>pageinspect</em> extension.</p>

<p>The <code class="language-plaintext highlighter-rouge">uuid_experiments/page_density.sql</code> (<a href="https://github.com/andyatkinson/pg_scripts/pull/20">andyatkinson/pg_scripts PR #20</a>) query in the repo gets the indexes for the integer and v4 and v7 uuid columns, their total page counts, their page stats, and the number of leaf pages.</p>

<p>Using the leaf pages, the query calculates an average fill percentage.</p>
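
<p>The repo query isn’t reproduced here, but a simplified version for a single index might look like the following. It assumes the index name <code class="language-plaintext highlighter-rouge">records_uuid_v4_idx</code> from the results below, and it defines fill percentage as used space divided by page size, which may differ from the calculation in the repo.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Block 0 is the B-Tree metapage, so start from block 1.
SELECT round(avg(100.0 * (s.page_size - s.free_size) / s.page_size), 2)
         AS avg_leaf_fill_percent
FROM generate_series(
       1,
       pg_relation_size('records_uuid_v4_idx')
         / current_setting('block_size')::int - 1
     ) AS blkno
CROSS JOIN LATERAL bt_page_stats('records_uuid_v4_idx', blkno::int) AS s
WHERE s.type = 'l';  -- leaf pages only
</code></pre></div></div>

<p>This visits every page of the index, so on a large index it’s best run on a test copy and not on a busy primary.</p>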

<p>After performing the 1 million updates on the 10 million rows mentioned in the example, I got these results from that query:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">idxname</span>             <span class="o">|</span> <span class="n">avg_leaf_fill_percent</span>
<span class="c1">---------------------+-----------------------</span>
 <span class="n">records_id_idx</span>      <span class="o">|</span>                 <span class="mi">97</span><span class="p">.</span><span class="mi">64</span>
 <span class="n">records_uuid_v4_idx</span> <span class="o">|</span>                 <span class="mi">79</span><span class="p">.</span><span class="mi">06</span>
 <span class="n">records_uuid_v7_idx</span> <span class="o">|</span>                 <span class="mi">90</span><span class="p">.</span><span class="mi">09</span>
<span class="p">(</span><span class="mi">3</span> <span class="k">rows</span><span class="p">)</span>
</code></pre></div></div>

<p>This shows the <code class="language-plaintext highlighter-rouge">integer</code> index had an average fill percentage of nearly 98%, the UUID v7 index was around 90%, and the UUID v4 index was around 79%.</p>

<h2 id="uuid-downsides-worse-cache-hit-ratio">UUID Downsides: Worse cache hit ratio</h2>
<p>The Postgres buffer cache is a critical part of good performance.</p>

<p>For good performance, we want our queries to produce cache “hits” as much as possible.</p>

<p>The buffer cache has limited space. Usually 25-40% of system memory is allocated to it, and the total database size including table and index data is usually much larger than that amount of memory. That means we’ll have trade-offs, as all data will not fit into system memory. This is where the challenges come in!</p>

<p>When pages are accessed they’re copied into the buffer cache as buffers. When write operations happen, buffers are dirtied before being flushed.<sup id="fnref:string" role="doc-noteref"><a href="#fn:string" class="footnote" rel="footnote">10</a></sup></p>

<p>Since the UUIDs are randomly located, additional buffers will need to be copied to the cache compared to ordered integers. Buffers that are still needed might be evicted to make space, decreasing hit rates.</p>

<h2 id="mitigations-rebuilding-indexes-with-uuid-values">Mitigations: Rebuilding indexes with UUID values</h2>
<p>Since the tables and indexes are more likely to be fragmented, it makes sense to rebuild the tables and indexes periodically.</p>

<p>Rebuilding tables can be done using pg_repack, pg_squeeze, or <code class="language-plaintext highlighter-rouge">VACUUM FULL</code> if you can afford to perform the operation offline.</p>

<p>Indexes can be rebuilt online using <code class="language-plaintext highlighter-rouge">REINDEX CONCURRENTLY</code>.</p>
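
<p>For example, assuming an index named <code class="language-plaintext highlighter-rouge">records_uuid_v4_idx</code> on a table named <code class="language-plaintext highlighter-rouge">records</code> (both names are placeholders), an online rebuild could look like this:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Rebuild a single index without blocking reads and writes
REINDEX INDEX CONCURRENTLY records_uuid_v4_idx;

-- Or rebuild every index on a table
REINDEX TABLE CONCURRENTLY records;
</code></pre></div></div>

<p>The concurrent form takes longer and needs extra disk space while the replacement index is built, and it can’t run inside a transaction block.</p>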

<p>While the data is newly laid out in pages after a rebuild, the random values still won’t have correlation, and so the index won’t be meaningfully smaller. The space formerly occupied by deleted rows will be reclaimed for reuse though.</p>

<h2 id="mitigation-shared-buffers-and-work_mem-memory-sizing">Mitigation: Shared buffers and work_mem memory sizing</h2>
<p>If possible, size your primary instance to have at least 4x as much memory as the size of your database. In other words, if your database is 25GB, try to run a 128GB memory instance.</p>

<p>This gives around 32GB to 50GB of memory for buffer cache (<code class="language-plaintext highlighter-rouge">shared_buffers</code>) which is hopefully enough to store all accessed pages and index entries.</p>

<p>Use <em>pg_buffercache</em><sup id="fnref:pgbc" role="doc-noteref"><a href="#fn:pgbc" class="footnote" rel="footnote">11</a></sup> to inspect the contents, and <em>pg_prewarm</em><sup id="fnref:pgpre" role="doc-noteref"><a href="#fn:pgpre" class="footnote" rel="footnote">12</a></sup> to populate tables into it.</p>
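
<p>As a sketch of how the two modules fit together, the statements below warm one index and then list the relations occupying the most buffers in the current database. The index name is a placeholder, and the second query follows the pattern shown in the <em>pg_buffercache</em> documentation.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE EXTENSION IF NOT EXISTS pg_prewarm;
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Load an index into shared_buffers; returns the number of blocks loaded
SELECT pg_prewarm('records_uuid_v4_idx');

-- Top 10 relations by number of buffers held in the cache
SELECT n.nspname, c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace
GROUP BY n.nspname, c.relname
ORDER BY buffers DESC
LIMIT 10;
</code></pre></div></div>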

<p>One tactic I’ve used when sorting UUID v4 random values is to provide more memory to sort operations.</p>

<p>To do that in Postgres, we can change the <code class="language-plaintext highlighter-rouge">work_mem</code> setting. This setting can be changed for the whole database, a session, or even for individual queries.</p>

<p>Check out <a href="https://www.pgmustard.com/blog/work-mem">Configuring work_mem in Postgres</a> on PgMustard for an example of setting this in a session.</p>
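
<p>To illustrate the three scopes, here is a minimal sketch. The database name <code class="language-plaintext highlighter-rouge">app_db</code>, the memory values, and the <code class="language-plaintext highlighter-rouge">records</code> table with a <code class="language-plaintext highlighter-rouge">uuid_v4</code> column are assumptions, not recommendations.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Whole database: applies to new sessions connecting to app_db
ALTER DATABASE app_db SET work_mem = '32MB';

-- Current session only
SET work_mem = '64MB';

-- One transaction only: SET LOCAL reverts at COMMIT or ROLLBACK
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT * FROM records ORDER BY uuid_v4 LIMIT 100;
COMMIT;
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">work_mem</code> can be allocated per sort or hash operation, keeping large values scoped to the specific queries that need them limits the risk of memory pressure.</p>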

<h2 id="mitigation-in-rails-uuid-and-implicit-order-column-active-record">Mitigation in Rails: UUID and implicit order column Active Record</h2>
<p>Since Rails 6, we can control implicit_order_column.<sup id="fnref:bb" role="doc-noteref"><a href="#fn:bb" class="footnote" rel="footnote">13</a></sup> The <a href="https://github.com/djezzzl/database_consistency/issues/197">database_consistency gem even has a checker</a> for folks using UUID primary keys.</p>

<p>When ORDER BY is generated in queries implicitly, it may be worth ordering on a different high cardinality field that’s indexed, like a <code class="language-plaintext highlighter-rouge">created_at</code> timestamp field.</p>

<h2 id="mitigating-poor-performance-by-clustering-on-orderable-field">Mitigating poor performance by clustering on orderable field</h2>
<p>Clustering on a column that’s high cardinality and indexed could be a mitigation option.</p>

<p>For example, imagine your table with a UUID primary key has a <code class="language-plaintext highlighter-rouge">created_at</code> timestamp column that’s indexed with <code class="language-plaintext highlighter-rouge">idx_on_tbl_created_at</code>, and you cluster the table on that index.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CLUSTER</span> <span class="n">table_with_uuid_ok</span> <span class="k">USING</span> <span class="n">idx_on_tbl_created_at</span><span class="p">;</span>
</code></pre></div></div>
<p>I rarely see CLUSTER used, though, as it takes an <a href="https://pglocks.org/?pgcommand=CLUSTER">access exclusive</a> lock. CLUSTER is also a one-time operation that would need to be repeated regularly to maintain its benefits.</p>

<h2 id="recommendation-stick-with-sequences-integers-and-big-integers">Recommendation: Stick with sequences, integers, and big integers</h2>
<p>For new databases that <em>may</em> be small, with unknown growth, I recommend plain old integers and an identity column (backed by a sequence)<sup id="fnref:seq" role="doc-noteref"><a href="#fn:seq" class="footnote" rel="footnote">14</a></sup> for primary keys. These are signed 32 bit (4-byte) values. This provides about 2 billion positive unique values per table.</p>

<p>Many business apps will never reach 2 billion unique values per table, so this will be adequate for their entire life. I’ve also recommended always using bigint/int8 in other contexts.</p>

<p>I guess it comes down to what you know about your data size and how well you can project growth. There are plenty of low growth business apps out there, in constrained industries, with constrained sets of business users.</p>

<p>For Internet-facing consumer apps with expected high growth, like social media, click tracking, sensor data, telemetry collection types of apps, or when migrating an existing medium or large database with 100s of millions or billions of rows, then it makes sense to start with <code class="language-plaintext highlighter-rouge">bigint</code> (int8), 64-bit, 8-byte integer primary keys.</p>
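
<p>In DDL terms, the two choices differ only in the column type. The table and column names below are made up for illustration.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Small, low growth table: 4-byte integer identity primary key
CREATE TABLE customers (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text NOT NULL
);

-- High growth table: 8-byte bigint identity primary key
CREATE TABLE click_events (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  occurred_at timestamptz NOT NULL DEFAULT now()
);
</code></pre></div></div>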

<h2 id="uuid-v4-alternatives-use-time-ordered-uuids-like-version-7">UUID v4 alternatives: Use time-ordered UUIDs like Version 7</h2>
<p>Postgres 18 includes a built-in <code class="language-plaintext highlighter-rouge">uuidv7()</code> function. On earlier versions, generating UUID V7s in Postgres is possible using the <code class="language-plaintext highlighter-rouge">pg_uuidv7</code> extension.</p>

<p>If you have an existing UUID v4 filled database and can’t afford a costly migration to another primary key data type, then starting to populate new values using UUID v7 will help somewhat.</p>

<p>Fortunately the binary <code class="language-plaintext highlighter-rouge">uuid</code> data type in Postgres can be used whether you’re storing V4 or V7 UUID values.</p>
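
<p>Switching new rows over can be as small as changing the column default. This sketch assumes a hypothetical <code class="language-plaintext highlighter-rouge">accounts</code> table with a <code class="language-plaintext highlighter-rouge">uuid</code> primary key named <code class="language-plaintext highlighter-rouge">id</code>, and that the <code class="language-plaintext highlighter-rouge">pg_uuidv7</code> extension is installed on the server.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE EXTENSION IF NOT EXISTS pg_uuidv7;

-- Existing V4 values stay as they are; new rows get time-ordered V7 values
ALTER TABLE accounts
  ALTER COLUMN id SET DEFAULT uuid_generate_v7();

-- On Postgres 18 and later, the built-in uuidv7() could be used instead:
-- ALTER TABLE accounts ALTER COLUMN id SET DEFAULT uuidv7();
</code></pre></div></div>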

<p>Another alternative that relies on an extension is <em>sequential_uuids</em>.<sup id="fnref:sequ" role="doc-noteref"><a href="#fn:sequ" class="footnote" rel="footnote">15</a></sup></p>

<h2 id="summary">Summary</h2>
<ul>
  <li>UUID v4s increase latency for lookups, as they can’t take advantage of fast ordered lookups in B-Tree indexes</li>
  <li>For new databases, don’t use <code class="language-plaintext highlighter-rouge">gen_random_uuid()</code> for primary key types, which generates random UUID v4 values</li>
  <li>UUIDs consume twice the space of <code class="language-plaintext highlighter-rouge">bigint</code></li>
  <li>UUID v4 values are not meant to be secure per the UUID RFC</li>
  <li>UUID v4s are random. For good performance, the whole index must be in buffer cache for index scans, which is increasingly unlikely for bigger data.</li>
  <li>UUID v4s cause more page splits, which increase write IO, fragmentation, and WAL volume</li>
  <li>For non-guessable, obfuscated pseudo-random codes, we can generate those from integers, which could be an alternative to using UUIDs</li>
  <li>If you must use UUIDs, use time-orderable UUIDs like UUID v7</li>
</ul>

<p>Do you see any errors or have any suggested improvements? Please <a href="/contact">contact me</a>. Thanks for reading!</p>

<h2 id="learn-more">Learn More</h2>
<ul>
  <li>Franck Pachot for AWS Heroes has an interesting take on <a href="https://dev.to/aws-heroes/uuid-in-postgresql-3n53">UUID in PostgreSQL</a></li>
  <li>Brandur has a great post: <a href="https://brandur.org/nanoglyphs/026-ids">Identity Crisis: Sequence v. UUID as Primary Key</a></li>
  <li>5mins of Postgres: <a href="https://pganalyze.com/blog/5mins-postgres-uuid-vs-serial-primary-keys">UUIDs vs Serial for Primary Keys - what’s the right choice?</a></li>
  <li><a href="https://github.com/andyatkinson/pg_scripts/pull/20">andyatkinson/pg_scripts PR #20</a></li>
</ul>

<h2 id="updates">Updates</h2>
<ul>
  <li>2025-12-15: Appeared on <a href="https://news.ycombinator.com/item?id=46272487">front page of Hacker News</a>. Updating the “Randomness is the issue” section for improved clarity.</li>
</ul>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:rfc" role="doc-endnote">
      <p><a href="https://datatracker.ietf.org/doc/html/rfc4122#section-4.4">https://datatracker.ietf.org/doc/html/rfc4122#section-4.4</a> <a href="#fnref:rfc" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:rfc:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:rfc:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:gen" role="doc-endnote">
      <p><a href="https://www.postgresql.org/docs/current/functions-uuid.html">https://www.postgresql.org/docs/current/functions-uuid.html</a> <a href="#fnref:gen" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:ms" role="doc-endnote">
      <p><a href="https://stackoverflow.com/a/6953207/126688">https://stackoverflow.com/a/6953207/126688</a> <a href="#fnref:ms" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:land" role="doc-endnote">
      <p><a href="https://www.thenile.dev/blog/uuidv7">https://www.thenile.dev/blog/uuidv7</a> <a href="#fnref:land" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:hn" role="doc-endnote">
      <p><a href="https://news.ycombinator.com/item?id=36429986">https://news.ycombinator.com/item?id=36429986</a> <a href="#fnref:hn" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:wiki" role="doc-endnote">
      <p><a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">https://en.wikipedia.org/wiki/Universally_unique_identifier</a> <a href="#fnref:wiki" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:alpha" role="doc-endnote">
      <p><a href="https://andyatkinson.com/generating-short-alphanumeric-public-id-postgres">https://andyatkinson.com/generating-short-alphanumeric-public-id-postgres</a> <a href="#fnref:alpha" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:ps" role="doc-endnote">
      <p><a href="https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql">https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql</a> <a href="#fnref:ps" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pga5" role="doc-endnote">
      <p><a href="https://pganalyze.com/blog/5mins-postgres-io-basics">https://pganalyze.com/blog/5mins-postgres-io-basics</a> <a href="#fnref:pga5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:string" role="doc-endnote">
      <p><a href="https://stringintech.github.io/blog/p/postgresql-buffer-cache-a-practical-guide/">https://stringintech.github.io/blog/p/postgresql-buffer-cache-a-practical-guide/</a> <a href="#fnref:string" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pgbc" role="doc-endnote">
      <p><a href="https://www.postgresql.org/docs/current/pgbuffercache.html">https://www.postgresql.org/docs/current/pgbuffercache.html</a> <a href="#fnref:pgbc" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pgpre" role="doc-endnote">
      <p><a href="https://www.postgresql.org/docs/current/pgprewarm.html">https://www.postgresql.org/docs/current/pgprewarm.html</a> <a href="#fnref:pgpre" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:bb" role="doc-endnote">
      <p><a href="https://www.bigbinary.com/blog/rails-6-adds-implicit_order_column">https://www.bigbinary.com/blog/rails-6-adds-implicit_order_column</a> <a href="#fnref:bb" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:seq" role="doc-endnote">
      <p><a href="https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/">https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/</a> <a href="#fnref:seq" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:sequ" role="doc-endnote">
      <p><a href="https://pgxn.org/dist/sequential_uuids">https://pgxn.org/dist/sequential_uuids</a> <a href="#fnref:sequ" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Andrew Atkinson</name></author><category term="PostgreSQL" /><category term="Databases" /><category term="Ruby on Rails" /><summary type="html"><![CDATA[Introduction Over the last decade, when working on databases with UUID Version 41 as the primary key data type, these databases have usually had bad performance and excessive IO. https://datatracker.ietf.org/doc/html/rfc4122#section-4.4 &#8617;]]></summary></entry><entry><title type="html">CORE Database Schema Design: Constraint-driven, Optimized, Responsive, and Efficient</title><link href="https://andyatkinson.com/constraint-driven-optimized-responsive-efficient-core-db-design" rel="alternate" type="text/html" title="CORE Database Schema Design: Constraint-driven, Optimized, Responsive, and Efficient" /><published>2025-06-09T00:00:00+00:00</published><updated>2025-06-09T00:00:00+00:00</updated><id>https://andyatkinson.com/constraint-driven-optimized-responsive-efficient-core-db-design</id><content type="html" xml:base="https://andyatkinson.com/constraint-driven-optimized-responsive-efficient-core-db-design"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>In this post, we’ll cover some database design principles and package them up into a catchy mnemonic acronym.</p>

<p>Software engineering is loaded with acronyms like this. For example, <a href="https://en.wikipedia.org/wiki/SOLID">SOLID principles</a> describe 5 principles, Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion, that promote good object-oriented design.</p>

<p>Databases are loaded with acronyms, for example “ACID” for the properties of a transaction, but I wasn’t familiar with one the schema designer could keep in mind while they’re working.</p>

<p>Thus, the motivation for this acronym was to help the schema designer, by packaging up some principles of good design practices for database schema design. It’s not based in research or academia though, so don’t take this too seriously. That said, I’d love your feedback!</p>

<p>Let’s get into it.</p>

<h2 id="picking-a-mnemonic-acronym">Picking a mnemonic acronym</h2>
<p>In picking an acronym, I wanted it to be short and have each letter describe a word that’s useful, practical, and grounded in experience. I preferred a real word for memorability!</p>

<p>The result was “<strong>CORE</strong>.” Let’s explore each letter and the word behind it.</p>

<h2 id="constraint-driven">Constraint-Driven</h2>
<p>The first word (technically two) is “constraint-driven.” Relational databases offer rigid structures, but also the ability to be changed while online, which is a form of flexibility in their evolution. We evolve their structure through <a href="https://en.wikipedia.org/wiki/Data_definition_language">DDL</a>. They use <a href="https://www.postgresql.org/docs/current/datatype.html">data types</a> and <a href="https://www.postgresql.org/docs/current/ddl-constraints.html">constraints</a> to make changes, as entities and relationships evolve.</p>

<p>Constraint-driven refers to leveraging all the constraint objects available, designing for our needs today, but also in a more general sense applying constraints (restrictions) to designs in the pursuit of data consistency and quality.</p>

<p>Let’s look at some examples. Choose the appropriate data types, like a numeric data type and not a character data type when storing a number. Use <code class="language-plaintext highlighter-rouge">NOT NULL</code> for columns by default. Create foreign key constraints for table relationships by default.</p>

<p>Validate expected data inputs using check constraints. For small databases, use <code class="language-plaintext highlighter-rouge">integer</code> primary keys. If tables get huge later, no problem, we can migrate the data into a bigger more suitable structure.</p>
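
<p>Put together, a constraint-driven starting point might look like the sketch below. The <code class="language-plaintext highlighter-rouge">authors</code> and <code class="language-plaintext highlighter-rouge">books</code> tables and their columns are invented for illustration.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE TABLE authors (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text NOT NULL
);

CREATE TABLE books (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  -- Foreign key constraint for the relationship, NOT NULL by default
  author_id integer NOT NULL REFERENCES authors (id),
  title text NOT NULL,
  -- Numeric type for a number, validated with a check constraint
  price_cents integer NOT NULL CHECK (price_cents &gt;= 0),
  -- Nullable only where "unknown" is a legitimate state
  published_on date
);
</code></pre></div></div>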

<p>The mindset is to prefer rigidity initially, design for today, then leverage the flexibility available to evolve later, as opposed to designing for a hypothetical future state.</p>

<h2 id="optimized">Optimized</h2>
<p>Databases present loads of optimization opportunities. Relational data is initially stored in a normalized form to eliminate duplication, but later <em>denormalizations</em> can be performed when read access is more important.</p>

<p>When our use cases are not known at the outset, plan to iterate on the design, changing the structure to better support the use cases that emerge. This will mean evolving the schema design.</p>

<p>This applies to tables, columns, constraints, indexes, parameters, queries, and anything that can be optimized to better support real use cases.</p>

<p>Queries are restructured and indexes are added to reduce data access. Strive for highly selective data access (a small proportion of rows) on high cardinality (uniqueness) data to reduce latency.</p>

<p>Critical background processes like <a href="https://www.postgresql.org/docs/current/sql-vacuum.html">VACUUM</a> get optimized too. Resources (workers, memory, parallelization) are increased proportionally.</p>

<h2 id="responsive">Responsive</h2>
<p>When problems emerge like column or row level unexpected data, missing referential integrity, or query performance problems, engineers inspect logs, catalog statistics, and parameters, from the core engine and third party extensions to diagnose issues.</p>

<p>When DDL changes are ready, the engineer applies them in a non-blocking way, in multiple steps as needed. Operations are performed “online” by default when practical.</p>
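
<p>Two common multi-step patterns in Postgres are sketched here, assuming a hypothetical <code class="language-plaintext highlighter-rouge">books</code> table with <code class="language-plaintext highlighter-rouge">author_id</code> and <code class="language-plaintext highlighter-rouge">price_cents</code> columns. The index and constraint names are placeholders.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Build an index without blocking writes (can't run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_books_author_id ON books (author_id);

-- Step 1: add the constraint without scanning existing rows
ALTER TABLE books
  ADD CONSTRAINT books_price_cents_check
  CHECK (price_cents &gt;= 0) NOT VALID;

-- Step 2: validate existing rows later, under a weaker lock
ALTER TABLE books VALIDATE CONSTRAINT books_price_cents_check;
</code></pre></div></div>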

<p>DDL changes are in a source code file, reviewed, tracked, and a copy of the schema design is kept in sync across environments.</p>

<p>Parameter (GUC) tuning (Postgres: <code class="language-plaintext highlighter-rouge">work_mem</code>, etc.) happens in a trackable way. Parameters are tuned online when possible, and scoped narrowly, to optimize their values for real queries and use cases.</p>

<h2 id="efficient">Efficient</h2>
<p>It’s relatively costly to store data in the database, compared with file storage! The data consumes limited space and accessing data unnecessarily adds latency.</p>

<p>Data that’s stored is queried later or it’s archived.</p>

<p>To minimize space consumption and latency, tables, columns, constraints, and indexes are removed continually by default, when they no longer are required, to reduce system complexity.</p>
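
<p>One way to find removal candidates is to look for indexes with no recorded scans. This is a starting point and not a verdict: the counters only cover the period since statistics were last reset, they’re tracked per instance (so replicas need checking too), and indexes backing primary key or unique constraints still do work even when never scanned.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Indexes with zero scans, largest first
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
</code></pre></div></div>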

<p>Server software is upgraded at least annually so that performance and security benefits can be leveraged.</p>

<p>Huge tables are split into smaller tables using table partitioning for more predictable administration.</p>
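
<p>As a minimal sketch of declarative range partitioning, assuming a hypothetical <code class="language-plaintext highlighter-rouge">events</code> table partitioned by month on <code class="language-plaintext highlighter-rouge">created_at</code>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE TABLE events (
  id bigint NOT NULL,
  created_at timestamptz NOT NULL,
  payload jsonb,
  -- The primary key must include the partition key column
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per month; the upper bound is exclusive
CREATE TABLE events_2025_06 PARTITION OF events
  FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');

CREATE TABLE events_2025_07 PARTITION OF events
  FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');
</code></pre></div></div>

<p>Old data can then be removed by detaching or dropping a whole partition, which is much cheaper than deleting rows from one huge table.</p>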

<h2 id="core-database-design">CORE Database Design</h2>
<p>There’s lots more to evolving a database schema design, but these principles are a few I keep in mind.</p>

<p>Did you notice anything missing? Do you have other feedback? Please <a href="/contact">contact me</a> with your thoughts.</p>

<h2 id="thank-you">Thank You</h2>
<p>Over the years, I’ve learned a lot from <a href="https://postgres.fm">Postgres.fm</a> hosts <a href="https://postgres.ai">Nikolay</a> and <a href="https://www.pgmustard.com">Michael</a>, and other community leaders like <a href="https://pganalyze.com">Lukas</a> and <a href="https://dev.to/franckpachot">Franck</a>, as they’ve shaped my database design choices.</p>

<p>I’m grateful to them for sharing their knowledge and experience with the community.</p>

<p>Thanks for reading!</p>]]></content><author><name>Andrew Atkinson</name></author><category term="PostgreSQL" /><category term="Databases" /><summary type="html"><![CDATA[Introduction In this post, we’ll cover some database design principles and package them up into a catchy mnemonic acronym.]]></summary></entry><entry><title type="html">Tip: Put your Rails app on a SQL query diet</title><link href="https://andyatkinson.com/tip-track-sql-queries-quantity-ruby-rails-postgresql" rel="alternate" type="text/html" title="Tip: Put your Rails app on a SQL query diet" /><published>2025-05-29T17:29:00+00:00</published><updated>2025-05-29T17:29:00+00:00</updated><id>https://andyatkinson.com/tip-track-quantity-of-queries-postgresql-ruby-rails</id><content type="html" xml:base="https://andyatkinson.com/tip-track-sql-queries-quantity-ruby-rails-postgresql"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Much of the time taken processing HTTP requests in web apps is SQL queries. To minimize that, we want to avoid unnecessary and duplicate queries, and generally perform as few queries as possible.</p>

<p>Think of the work that needs to happen for <em>every</em> query. The database engine parses it, creates a query execution plan, executes it, and then sends the response to the client.</p>

<p>When the response reaches the client, there’s even more work to do. The response is transformed into application objects in memory.</p>

<p>How do we see how many queries are being created for our app actions?</p>

<h2 id="count-the-queries">Count the queries</h2>
<p>When doing backend work in a web app like Rails, monitor the number of queries being created directly, by the ORM, or by libraries. ORMs like Active Record can generate more than one query from a given line of code. Libraries can generate queries that are problematic and may be unnecessary.</p>

<p>Over time, developers may duplicate queries unknowingly. These are all real causes of unnecessary queries from my work experience.</p>

<p>Why are excessive queries a problem?</p>

<h2 id="why-reduce-the-number-of-queries">Why reduce the number of queries?</h2>
<p>Besides the work of parsing, planning, executing, and serializing the response, the client is subject to a hard upper limit on the number of connections it can open to the database server.</p>

<p>In Postgres that’s configured as <code class="language-plaintext highlighter-rouge">max_connections</code>. The application will have a variable number of open connections based on use, and its configuration of processes, threads and its connection pool. Keeping the query count low helps avoid exceeding the upper limit.</p>
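
<p>To see how close an instance is to that limit, compare the setting with the current number of client connections. This is a point-in-time check, so it’s most useful when sampled under typical load.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Configured upper limit
SHOW max_connections;

-- Connections currently open by clients, grouped by state
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
</code></pre></div></div>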

<p>What about memory use?</p>

<h2 id="what-about-app-server-memory">What about app server memory?</h2>
<p>With Ruby on Rails, the cost of repeated queries is shifted because the <a href="https://guides.rubyonrails.org/caching_with_rails.html#sql-caching">SQL Cache</a> is enabled by default, which stores and serves results for matching repeated queries, at the cost of some memory use.</p>

<p>As an aside, from <a href="https://www.shakacode.com/blog/rails-make-active-records-query-cache-an-lru">Rails 7.1 the SQL Cache uses a least recently used (LRU) algorithm</a>. We can also configure the max number of queries to cache, 100 by default, to control how much memory is used.</p>

<h2 id="counting-queries-prior-to-rails-72">Counting queries prior to Rails 7.2</h2>
<p>Prior to Rails 7.2, I recommend adding the <a href="https://github.com/rubysamurai/query_count"><strong>query_count</strong></a> gem, which does one simple thing: it shows the count of SQL queries processed for an action.</p>

<p>The count is in the Rails log file like this: <code class="language-plaintext highlighter-rouge">SQL Queries: 100 (50 cached)</code>. In this case, 100 queries were performed and 50 used the SQL Cache.</p>

<h2 id="built-in-from-rails-72-onward">Built-in from Rails 7.2 onward</h2>
<p>From Rails 7.2 onward, the count of queries is now built in, so <a href="https://github.com/rubysamurai/query_count/issues/2">query_count is no longer needed</a>.</p>

<p>Rails 7.2 onward looks like this: <code class="language-plaintext highlighter-rouge">ActiveRecord: 105.5ms (10 queries, 1 cached)</code>. Here 10 queries ran, and 1 used the SQL Cache.</p>
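<p>If you want to track these counts over time, the summary can be pulled out of a log line with a regular expression. This is a minimal sketch in plain Ruby, assuming the log fragment has exactly the shape shown above; the <code class="language-plaintext highlighter-rouge">QUERY_SUMMARY</code> pattern and <code class="language-plaintext highlighter-rouge">parse_query_summary</code> method are hypothetical names, not part of Rails.</p>

```ruby
# Hypothetical helper (not part of Rails): extracts the query counts from a
# log fragment shaped like "ActiveRecord: 105.5ms (10 queries, 1 cached)".
QUERY_SUMMARY = /ActiveRecord: (?<ms>[\d.]+)ms \((?<queries>\d+) quer(?:y|ies), (?<cached>\d+) cached\)/

def parse_query_summary(line)
  match = QUERY_SUMMARY.match(line)
  return nil unless match

  {
    duration_ms: match[:ms].to_f,
    queries: match[:queries].to_i,
    cached: match[:cached].to_i
  }
end

summary = parse_query_summary("ActiveRecord: 105.5ms (10 queries, 1 cached)")
puts summary[:queries] # total queries for the action
puts summary[:cached]  # how many were served by the SQL Cache
```

<p>Lines that don’t match the assumed shape return <code class="language-plaintext highlighter-rouge">nil</code>, so the helper can be run over a whole log file and the misses skipped.</p>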

<h2 id="repeated-queries">Repeated queries</h2>
<p>While the SQL Cache saves the roundtrip for a repeated query, ideally we want to eliminate the repeated query. It’s worth hunting for it and considering refactoring or restructuring data access.</p>

<p>Another tactic is using memoization to store results for the duration of processing one controller action. Read more about that: <a href="https://www.honeybadger.io/blog/ruby-rails-memoization/">Speeding up Rails with Memoization</a>.</p>
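<p>As a rough illustration of memoization, here’s a minimal sketch in plain Ruby. The <code class="language-plaintext highlighter-rouge">BookstoreDashboard</code> class and its methods are hypothetical, and <code class="language-plaintext highlighter-rouge">load_featured_books</code> stands in for a real database query.</p>

```ruby
# Hypothetical stand-in for a controller or service object that needs the
# same data in several places while processing one request.
class BookstoreDashboard
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  def featured_books
    # ||= evaluates the right-hand side only while @featured_books is nil
    # or false, so the "query" runs once per object.
    @featured_books ||= load_featured_books
  end

  private

  # Stands in for a database query, such as an Active Record call.
  def load_featured_books
    @query_count += 1
    ["Book A", "Book B"]
  end
end

dashboard = BookstoreDashboard.new
3.times { dashboard.featured_books }
puts dashboard.query_count
```

<p>One caveat with <code class="language-plaintext highlighter-rouge">||=</code>: a result of <code class="language-plaintext highlighter-rouge">nil</code> or <code class="language-plaintext highlighter-rouge">false</code> isn’t remembered, so the load runs again on the next call.</p>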

<p>How do I get started?</p>

<h2 id="finding-the-source-code-location-of-the-queries">Finding the source code location of the queries</h2>
<p>To get started, identify some slow API endpoints in production, run them locally in development, and begin monitoring their quantity of SQL queries. To trace each query back to the code that issued it, see <a href="https://andyatkinson.com/source-code-line-numbers-ruby-on-rails-marginalia-query-logs">Source code locations for database queries in Rails with Marginalia and Query Logs</a>.</p>

<p>Determine how to factor out data access that can be shared.</p>

<h2 id="how-many-queries-are-a-lot">How many queries are “a lot?”</h2>
<p>It’s hard to give a generic number. However, duplicate queries are a category to remove.</p>

<p>Let’s say you’ve got a Book model for your bookstore app. Scan your Rails log file for a pattern like this:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Book</span> <span class="k">Load</span> <span class="p">(</span><span class="mi">4</span><span class="p">.</span><span class="mi">3</span><span class="n">ms</span><span class="p">)</span> <span class="err">…</span>
<span class="n">Book</span> <span class="k">Load</span> <span class="p">(</span><span class="mi">5</span><span class="p">.</span><span class="mi">0</span><span class="n">ms</span><span class="p">)</span> <span class="err">…</span>
<span class="n">Book</span> <span class="k">Load</span> <span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="n">ms</span><span class="p">)</span> <span class="err">…</span>
<span class="n">Book</span> <span class="k">Load</span> <span class="p">(</span><span class="mi">2</span><span class="p">.</span><span class="mi">3</span><span class="n">ms</span><span class="p">)</span> <span class="err">…</span>
</code></pre></div></div>

<p>If you see that sort of pattern, track down the source locations, and eliminate any repeated loads. Let’s assume this is not an <a href="https://guides.rubyonrails.org/active_record_querying.html#n-1-queries-problem">N + 1 queries problem</a>, but repeated access to the same data from different source code locations.</p>

<p>You may be able to factor out and consolidate a data load. You may be able to use an existing loaded collection for an existence check, or use memoization to use previously calculated results.</p>
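<p>To make the existence check idea concrete, here’s a minimal sketch in plain Ruby. The <code class="language-plaintext highlighter-rouge">Book</code> struct and the sample rows are hypothetical stand-ins for records that were already loaded by an earlier query.</p>

```ruby
require "set"

# Hypothetical stand-in for Active Record rows that were already loaded once.
Book = Struct.new(:id, :author_id)
loaded_books = [Book.new(1, 10), Book.new(2, 10), Book.new(3, 12)]

# Build an in-memory index from the rows we already have ...
author_ids_with_books = loaded_books.map(&:author_id).to_set

# ... then answer existence checks without another database round trip.
puts author_ids_with_books.include?(10)
puts author_ids_with_books.include?(11)
```

<p>This only gives correct answers when the loaded collection covers every row the check cares about, so it fits best when the full set is needed for the response anyway.</p>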

<p>Using these tactics, I’ve reduced controller actions with 250+ SQL queries (a ton!) to 50 or fewer (still a lot), by going through these steps. Monitor the log, find source locations for first party code, ORM generated queries, query code from libraries (gems), Rails controller action “before filters,” and other sources, then eliminate and consolidate.</p>

<p>When faced with a lot of queries, I find it helpful to study the bare minimum of what’s needed by the client, working outside in, then look to see if it’s possible to reduce the tables, rows, and columns to only what’s needed.</p>

<h2 id="wrap-up">Wrap Up</h2>
<ul>
  <li>Track the count of SQL queries performed in different versions of Rails</li>
  <li>Remove unnecessary queries so they don’t use limited system resources</li>
  <li>Eliminate repeated queries to keep the count as low as possible</li>
  <li>Only access data that’s needed for client application use cases</li>
</ul>]]></content><author><name>Andrew Atkinson</name></author><category term="Ruby on Rails" /><category term="PostgreSQL" /><summary type="html"><![CDATA[Introduction Much of the time taken processing HTTP requests in web apps is SQL queries. To minimize that, we want to avoid unnecessary and duplicate queries, and generally perform as few queries as possible.]]></summary></entry><entry><title type="html">Big Problems From Big IN lists with Ruby on Rails and PostgreSQL</title><link href="https://andyatkinson.com/big-problems-big-in-clauses-postgresql-ruby-on-rails" rel="alternate" type="text/html" title="Big Problems From Big IN lists with Ruby on Rails and PostgreSQL" /><published>2025-05-23T14:30:00+00:00</published><updated>2025-05-23T14:30:00+00:00</updated><id>https://andyatkinson.com/big-problems-big-in-clauses-postgresql-ruby-on-rails-orm</id><content type="html" xml:base="https://andyatkinson.com/big-problems-big-in-clauses-postgresql-ruby-on-rails"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>If you’ve created web apps with relational databases and ORMs like Active Record (part of Ruby on Rails), you’ve probably experienced database performance problems after a certain size of data and query volume.</p>

<p>In this post, we’re going to look at a specific type of problematic query pattern that’s somewhat common.</p>

<p>We’ll refer to this pattern as “Big <code class="language-plaintext highlighter-rouge">IN</code>s,” which are queries with an <code class="language-plaintext highlighter-rouge">IN</code> clause that has a big list of values. As data grows, the length of the list of values will grow. These queries tend to perform poorly for big lists, causing user experience problems or even partial outages.</p>

<p>We’ll dig into the origins of this pattern, why the performance of it is poor, and explore some alternatives that you can use in your projects.</p>

<h2 id="in-clauses-with-a-big-list-of-values">IN clauses with a big list of values</h2>
<p>The technical term for the list of values is a <em>parenthesized list of scalar expressions</em>.</p>

<p>For example in the SQL query below, the <code class="language-plaintext highlighter-rouge">IN</code> clause portion is <code class="language-plaintext highlighter-rouge">WHERE author_id IN (1,2,3)</code> and the list of scalar expressions is <code class="language-plaintext highlighter-rouge">(1,2,3)</code>.</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">books</span>
<span class="k">WHERE</span> <span class="n">author_id</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span>
</code></pre></div></div>

<p>The purpose of this clause is to perform filtering. Looking at a query execution plan in Postgres, we’ll see something like this fragment below:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Filter</span><span class="p">:</span> <span class="p">(</span><span class="n">author_id</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="s1">'{1,2,3}'</span><span class="p">::</span><span class="nb">integer</span><span class="p">[]))</span>
</code></pre></div></div>

<p>This of course filters the full set of books down to ones that match on <code class="language-plaintext highlighter-rouge">author_id</code>.</p>

<p>Filtering is a typical database operation. Why are these slow?</p>

<h2 id="parsing-planning-and-executing">Parsing, planning, and executing</h2>
<p>Remember that our queries are parsed, planned, and executed. The values in a big list are treated like constants, and don’t have associated statistics.</p>

<p>Queries with big lists of values take more time to parse and use more memory.</p>

<p>Without pre-collected table statistics for planning decisions, PostgreSQL is more likely to mis-estimate cardinality and row selectivity.</p>

<p>This can mean the planner chooses a sequential scan over an index scan, causing a big slowdown.</p>

<p>How do we create this pattern?</p>

<h2 id="creating-this-pattern-directly">Creating this pattern directly</h2>
<p>In Active Record, a developer might create this query pattern by using <code class="language-plaintext highlighter-rouge">pluck(:id)</code> to collect some ids in a list, then pass that list as an argument to another query.</p>

<p>Here’s an example of that:</p>
<div class="language-rb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">author_ids</span> <span class="o">=</span> <span class="no">Author</span><span class="p">.</span>
  <span class="nf">where</span><span class="p">(</span><span class="s2">"created_at &gt;= ?"</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="nf">year</span><span class="p">.</span><span class="nf">ago</span><span class="p">).</span>
  <span class="nf">pluck</span><span class="p">(</span><span class="ss">:id</span><span class="p">)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">author_ids</code> are then supplied as the argument when querying <code class="language-plaintext highlighter-rouge">books</code> by the <code class="language-plaintext highlighter-rouge">author_id</code> foreign key:</p>
<div class="language-rb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Book</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">author_id: </span><span class="n">author_ids</span><span class="p">)</span>
</code></pre></div></div>

<p>Another scenario is when this query is created from ORM methods. What does that look like?</p>

<h2 id="active-record-orm-methods-that-create-this-pattern">Active Record ORM methods that create this pattern</h2>
<p>This query pattern can happen when using eager loading methods like <code class="language-plaintext highlighter-rouge">includes()</code> or <code class="language-plaintext highlighter-rouge">preload()</code>.</p>

<p>This <a href="https://www.crunchydata.com/blog/real-world-performance-gains-with-postgres-17-btree-bulk-scans">Crunchy Data post</a> mentions how eager loading methods produce <code class="language-plaintext highlighter-rouge">IN</code> clause SQL queries.</p>

<p>The post links to the <a href="https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations">Eager Loading Associations documentation</a> which has examples in Active Record and the resulting SQL that we’ll use here.</p>

<p>Let’s first discuss N+1 with these examples.</p>

<h2 id="fixing-n1s">Fixing N+1s</h2>
<p>Let’s study the examples here. Here’s some Active Record for books and authors:</p>
<div class="language-rb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># N+1</span>
<span class="n">books</span> <span class="o">=</span> <span class="no">Book</span><span class="p">.</span><span class="nf">limit</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>

<span class="n">books</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">book</span><span class="o">|</span>
   <span class="nb">puts</span> <span class="n">book</span><span class="p">.</span><span class="nf">author</span><span class="p">.</span><span class="nf">last_name</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The issue above is the undesirable N+1 query pattern, where a table is repeatedly queried in a loop, instead of bulk loading all of the desired authors.</p>

<p>To fix the N+1, we’ll add the <code class="language-plaintext highlighter-rouge">includes(:author)</code> eager loading method to the code above.</p>

<p>That looks like this:</p>
<pre><code>
books = Book.<strong style="background-color:yellow;">includes(:author)</strong>.limit(10) 👈

books.each do |book|
   puts book.author.last_name
end
</code></pre>

<p>We’ve now eliminated the N+1 queries, but we’ve opened ourselves up to a new possible problem.</p>

<h2 id="eager-loading-with-includes-or-preload">Eager loading with includes or preload</h2>
<p>While the <code class="language-plaintext highlighter-rouge">includes(:author)</code> fixed the N+1 queries, Active Record is now creating two queries, with the second one having an <code class="language-plaintext highlighter-rouge">IN</code> clause.</p>

<p>Here’s the example from above as SQL:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">books</span><span class="p">.</span><span class="o">*</span> <span class="k">FROM</span> <span class="n">books</span> <span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>

<span class="k">SELECT</span> <span class="n">authors</span><span class="p">.</span><span class="o">*</span> <span class="k">FROM</span> <span class="n">authors</span>
  <span class="k">WHERE</span> <span class="n">authors</span><span class="p">.</span><span class="n">id</span> <span class="k">IN</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="mi">7</span><span class="p">,</span><span class="mi">8</span><span class="p">,</span><span class="mi">9</span><span class="p">,</span><span class="mi">10</span><span class="p">);</span>
</code></pre></div></div>

<p>Here we only have 10 values for the <code class="language-plaintext highlighter-rouge">IN</code> clause, so performance will be fine. However, once we’ve got hundreds or thousands of values, we will run into the problems described above.</p>

<p>Performance will tank if the <code class="language-plaintext highlighter-rouge">authors.id</code> primary key index isn’t used for this filtering operation.</p>

<p>Are there alternatives for eager loading?</p>

<h2 id="eager-loading-using-eager_load">Eager loading using eager_load</h2>
<p>Besides <code class="language-plaintext highlighter-rouge">includes()</code> and <code class="language-plaintext highlighter-rouge">preload()</code> which create two queries with the second having an <code class="language-plaintext highlighter-rouge">IN</code> clause, there’s another way to do eager loading in Active Record.</p>

<p>An alternative method <code class="language-plaintext highlighter-rouge">eager_load</code> works a little bit differently. It produces a single SQL query that uses a <code class="language-plaintext highlighter-rouge">LEFT OUTER JOIN</code>.</p>

<p>Here’s an example of <code class="language-plaintext highlighter-rouge">eager_load</code> from the Active Record documentation:</p>
<div class="language-rb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">books</span> <span class="o">=</span> <span class="no">Book</span><span class="p">.</span><span class="nf">eager_load</span><span class="p">(</span><span class="ss">:author</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>

<span class="n">books</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">book</span><span class="o">|</span>
  <span class="nb">puts</span> <span class="n">book</span><span class="p">.</span><span class="nf">author</span><span class="p">.</span><span class="nf">last_name</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The following single SQL query is produced. Note that it has no <code class="language-plaintext highlighter-rouge">IN</code> clause.</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
    <span class="nv">"books"</span><span class="p">.</span><span class="nv">"id"</span> <span class="k">AS</span> <span class="n">t0_r0</span><span class="p">,</span>
    <span class="nv">"books"</span><span class="p">.</span><span class="nv">"title"</span> <span class="k">AS</span> <span class="n">t0_r1</span>
<span class="k">FROM</span>
    <span class="nv">"books"</span> <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="nv">"authors"</span>
    <span class="k">ON</span> <span class="nv">"authors"</span><span class="p">.</span><span class="nv">"id"</span> <span class="o">=</span> <span class="nv">"books"</span><span class="p">.</span><span class="nv">"author_id"</span>
<span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</code></pre></div></div>

<p>Since we’re now using a join operation, we’ve got statistics available from both tables. This makes it much more likely PostgreSQL can correctly estimate selectivity and cardinality.</p>

<p>PostgreSQL also doesn’t need to parse and store a large list of constant values.</p>

<p>While <code class="language-plaintext highlighter-rouge">IN</code> clauses might perform fine with smaller inputs, e.g. 100 values or fewer,<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> for large lists we should try to restructure the query to use a join operation instead.</p>

<p>Besides restructuring the queries into joins, are there other alternatives?</p>

<h2 id="alternative-approaches-using-any-or-some">Alternative approaches using ANY or SOME</h2>
<p>Crunchy Data’s post <a href="https://www.crunchydata.com/blog/postgres-query-boost-using-any-instead-of-in">Postgres Query Boost: Using ANY Instead of IN</a> describes how <code class="language-plaintext highlighter-rouge">IN</code> is more restrictive on the input.</p>

<p>A more usable alternative to <code class="language-plaintext highlighter-rouge">IN</code> can be <code class="language-plaintext highlighter-rouge">ANY</code> or <code class="language-plaintext highlighter-rouge">SOME</code>, which have more flexibility in handling the list of values.</p>

<p>Here’s a CTE example using <code class="language-plaintext highlighter-rouge">ANY</code>:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">WITH</span> <span class="n">author_ids</span> <span class="k">AS</span> <span class="p">(</span>
  <span class="k">SELECT</span> <span class="n">id</span> <span class="k">FROM</span> <span class="n">authors</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="n">title</span>
<span class="k">FROM</span> <span class="n">books</span>
<span class="k">WHERE</span> <span class="n">author_id</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span>
      <span class="k">SELECT</span> <span class="n">id</span>
      <span class="k">FROM</span> <span class="n">author_ids</span><span class="p">);</span>
</code></pre></div></div>

<p>However, <code class="language-plaintext highlighter-rouge">ANY</code> is not generated by Active Record. What if we want to generate these queries using Active Record?</p>

<p>One option is to use the <code class="language-plaintext highlighter-rouge">any</code> method provided by the <a href="https://github.com/GeorgeKaraszi/ActiveRecordExtended">ActiveRecordExtended</a> gem.</p>

<p>Let’s look at another alternative approach using a <code class="language-plaintext highlighter-rouge">VALUES</code> clause.</p>

<h2 id="a-values-clause">A VALUES clause</h2>
<p>In the comments on the Rails PR linked later in this post, Vlad and Sean discussed an alternative for <code class="language-plaintext highlighter-rouge">IN</code> using a <code class="language-plaintext highlighter-rouge">VALUES</code> clause.</p>

<p>Let’s look at an example with a CTE and <code class="language-plaintext highlighter-rouge">VALUES</code> clause:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">WITH</span> <span class="n">ids</span><span class="p">(</span><span class="n">author_id</span><span class="p">)</span> <span class="k">AS</span> <span class="p">(</span>
  <span class="k">VALUES</span><span class="p">(</span><span class="mi">1</span><span class="p">),(</span><span class="mi">2</span><span class="p">),(</span><span class="mi">3</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">SELECT</span> <span class="n">title</span>
<span class="k">FROM</span> <span class="n">books</span>
<span class="k">JOIN</span> <span class="n">ids</span> <span class="k">USING</span> <span class="p">(</span><span class="n">author_id</span><span class="p">);</span>
</code></pre></div></div>

<p>Or we can write this as a subquery:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">title</span>
<span class="k">FROM</span> <span class="n">books</span>
<span class="k">WHERE</span> <span class="n">author_id</span> <span class="k">IN</span> <span class="p">(</span>
  <span class="k">SELECT</span> <span class="n">id</span>
  <span class="k">FROM</span> <span class="p">(</span><span class="k">VALUES</span><span class="p">(</span><span class="mi">1</span><span class="p">),(</span><span class="mi">2</span><span class="p">),(</span><span class="mi">3</span><span class="p">))</span> <span class="k">AS</span> <span class="n">v</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>This is better because the <code class="language-plaintext highlighter-rouge">IN</code> list is a big list of scalar expressions, whereas the <code class="language-plaintext highlighter-rouge">VALUES</code> clause is treated like a relation (or table). This can help with join strategy selection.</p>
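<p>If you’re generating this SQL from application code, the subquery form shown above could be built from a list of ids roughly like this. It’s a minimal sketch in plain Ruby, not an Active Record feature; <code class="language-plaintext highlighter-rouge">books_by_author_values_sql</code> is a hypothetical helper name.</p>

```ruby
# Hypothetical helper: builds the VALUES subquery form from a list of ids.
def books_by_author_values_sql(author_ids)
  raise ArgumentError, "author_ids must not be empty" if author_ids.empty?

  # Integer() raises for nil and for strings that don't parse as integers,
  # so only integer text is ever interpolated into the SQL string.
  rows = author_ids.map { |id| "(#{Integer(id)})" }.join(",")

  "SELECT title FROM books " \
    "WHERE author_id IN (SELECT id FROM (VALUES #{rows}) AS v(id))"
end

puts books_by_author_values_sql([1, 2, 3])
```

<p>The <code class="language-plaintext highlighter-rouge">Integer()</code> conversion is there because the ids are interpolated into the SQL text rather than bound as parameters.</p>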

<h2 id="a-temporary-table-of-ids">A temporary table of ids</h2>
<p>Yet another option for big lists of values is to put these into a temporary table for the session. The temporary table can even index the ids.</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TEMP</span> <span class="k">TABLE</span> <span class="n">temp_ids</span> <span class="p">(</span><span class="n">author_id</span> <span class="nb">int</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">temp_ids</span><span class="p">(</span><span class="n">author_id</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">),(</span><span class="mi">2</span><span class="p">),(</span><span class="mi">3</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="k">ON</span> <span class="n">temp_ids</span><span class="p">(</span><span class="n">author_id</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="n">title</span>
<span class="k">FROM</span> <span class="n">books</span> <span class="n">b</span>
<span class="k">JOIN</span> <span class="n">temp_ids</span> <span class="n">t</span> <span class="k">ON</span> <span class="n">t</span><span class="p">.</span><span class="n">author_id</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">author_id</span><span class="p">;</span>
</code></pre></div></div>

<h2 id="using-any-and-an-array-of-values">Using ANY and an ARRAY of values</h2>
<p>Another form is using <code class="language-plaintext highlighter-rouge">ANY</code> with an ARRAY:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">title</span>
<span class="k">FROM</span> <span class="n">books</span>
<span class="k">WHERE</span> <span class="n">author_id</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="n">ARRAY</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]);</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">ANY</code> form can perform better. With an <code class="language-plaintext highlighter-rouge">IN</code> list, the values are parsed like a chain of OR operations, with the planner handling one branch at a time.</p>

<p><code class="language-plaintext highlighter-rouge">ANY</code> is treated like a single functional expression.</p>

<p>This form also supports prepared statements. With prepared statements, the statement is parsed and planned once and then can be reused.</p>

<p>Here’s an example of fetching books by author:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PREPARE</span> <span class="n">get_books_by_author</span><span class="p">(</span><span class="nb">int</span><span class="p">[])</span> <span class="k">AS</span>
<span class="k">SELECT</span> <span class="n">title</span>
<span class="k">FROM</span> <span class="n">books</span>
<span class="k">WHERE</span> <span class="n">author_id</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">);</span>

<span class="k">EXECUTE</span> <span class="n">get_books_by_author</span><span class="p">(</span><span class="n">ARRAY</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">]);</span>
</code></pre></div></div>
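<p>From application code, the same idea means sending one array parameter instead of one placeholder per id. Here’s a minimal sketch in plain Ruby that formats ids as a PostgreSQL array literal; <code class="language-plaintext highlighter-rouge">pg_int_array_literal</code> is a hypothetical helper, and how the parameter gets bound depends on your database driver.</p>

```ruby
# Hypothetical helper: formats integer ids as a PostgreSQL array literal
# such as "{1,2,3}", which can be sent as a single text bind parameter.
def pg_int_array_literal(ids)
  inner = ids.map { |id| Integer(id) }.join(",")
  "{#{inner}}"
end

# The SQL text stays identical no matter how many ids are supplied.
sql = "SELECT title FROM books WHERE author_id = ANY ($1::int[])"
params = [pg_int_array_literal([1, 2, 3, 4, 5])]

puts sql
puts params.first

# Assumption: with the pg gem and an open connection `conn`, these could be
# passed along as conn.exec_params(sql, params).
```

<p>Because the statement text no longer changes with the length of the list, it’s a better fit for prepared statement reuse than an <code class="language-plaintext highlighter-rouge">IN</code> list with a varying number of placeholders.</p>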

<h2 id="testing-the-alternative-query-structures">Testing the alternative query structures</h2>
<p>Unfortunately generic guidelines here won’t guarantee success in your specific database. Row counts, data distributions, cardinality, or correlation are just some of the factors that affect query execution.</p>

<p>My recommended process is to test on production-like data, work in the SQL layer, then try out restructured queries using these tactics, and study their query execution plans collected using <code class="language-plaintext highlighter-rouge">EXPLAIN (ANALYZE, BUFFERS)</code>.</p>

<p>Query plan collection and analysis is outside the scope of this post, but in brief, you’ll want to compare the plans and look to access fewer buffers, at lower costs, with fewer rows evaluated, fewer loops, for more efficient execution.</p>

<p>If you’re working in Active Record, you’d then translate your SQL back into the Active Record source code location where the queries were generated.</p>

<p>How do we find problematic <code class="language-plaintext highlighter-rouge">IN</code> queries that ran earlier in Postgres?</p>

<h2 id="finding-in-clause-queries-in-pg_stat_statements">Finding IN clause queries in pg_stat_statements</h2>
<p>To find out if your query stats include the problematic <code class="language-plaintext highlighter-rouge">IN</code> queries, let’s search the results of <code class="language-plaintext highlighter-rouge">pg_stat_statements</code> by querying the <code class="language-plaintext highlighter-rouge">query</code> field.</p>

<p>Unfortunately these don’t always group up well, so there can be duplicates or near-duplicates. You may have lots of PGSS results to sift through.</p>

<p>Here’s a basic query to filter on <code class="language-plaintext highlighter-rouge">query</code> for <code class="language-plaintext highlighter-rouge">'%IN \(%'</code>:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
    <span class="n">query</span>
<span class="k">FROM</span>
    <span class="n">pg_stat_statements</span>
<span class="k">WHERE</span>
    <span class="n">query</span> <span class="k">LIKE</span> <span class="s1">'%IN </span><span class="se">\(</span><span class="s1">%'</span><span class="p">;</span>
</code></pre></div></div>
<p>See the <a href="https://github.com/andyatkinson/pg_scripts/pull/16">linked PR</a> for a reproduction set of commands to create these tables, queries, and then inspect the query statistics using PGSS.</p>

<p>While you can find and restructure your queries towards more efficient patterns, are there any changes coming to Postgres itself to better handle these?</p>

<h2 id="improvements-in-postgres-17">Improvements in Postgres 17</h2>
<p>As part of the PostgreSQL 17 release in 2024, the developers made improvements to more efficiently work with scalar expressions and indexes, resulting in fewer repeated scans, and thus faster execution.</p>

<p>This reduces latency by reducing IO, and the benefits are available to all Postgres users without the need to change their SQL queries or ORM code!</p>

<h2 id="grouping-similar-query-groups-in-pg_stat_statements">Grouping similar query groups in pg_stat_statements</h2>
<p>There are more usability improvements coming for Postgres users, pg_stat_statements, and <code class="language-plaintext highlighter-rouge">IN</code> clause queries.</p>

<p>One problem with these has been that similar entries aren’t collapsed together when they have different numbers of scalar array expressions.</p>

<p>For example <code class="language-plaintext highlighter-rouge">IN ('1')</code> was not grouped with <code class="language-plaintext highlighter-rouge">IN ('1','2')</code>. Having the statistics for nearly identical entries split across multiple results makes them less useful.</p>
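<p>Until those fixes are in place, one workaround is to group the entries yourself after exporting the query text. This is a minimal sketch in plain Ruby, assuming the query strings have already been fetched from pg_stat_statements; <code class="language-plaintext highlighter-rouge">squash_in_lists</code> and the sample queries are hypothetical.</p>

```ruby
# Hypothetical normalizer: collapses the contents of simple IN (...) lists so
# that queries differing only in list length share one key. Lists that contain
# nested parentheses are left untouched.
def squash_in_lists(sql)
  sql.gsub(/\bIN\s*\([^()]*\)/i, "IN (...)")
end

# Sample query texts, as they might appear with placeholders.
queries = [
  "SELECT * FROM books WHERE author_id IN ($1)",
  "SELECT * FROM books WHERE author_id IN ($1, $2)",
  "SELECT * FROM books WHERE author_id IN ($1, $2, $3)",
  "SELECT * FROM authors WHERE id IN ($1, $2)"
]

# tally counts how many original entries collapse into each normalized form.
grouped = queries.map { |q| squash_in_lists(q) }.tally
grouped.each { |sql, count| puts "#{count} #{sql}" }
```

<p>The same grouping key could be used to sum per-entry statistics such as call counts, which gives a closer picture of how often the pattern runs overall.</p>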

<p>Fortunately, fixes are coming. On the Ruby on Rails side, Sean Linsley is working on a fix by replacing the use of <code class="language-plaintext highlighter-rouge">IN</code> with <code class="language-plaintext highlighter-rouge">ANY</code> which solves the grouping problem.</p>

<p>Here’s the PR: <a href="https://github.com/rails/rails/pull/49388#issuecomment-2680362607">https://github.com/rails/rails/pull/49388#issuecomment-2680362607</a></p>

<p>On the PostgreSQL side, there are fixes coming for PostgreSQL 18.</p>

<h2 id="improvements-in-postgresql-18">Improvements in PostgreSQL 18</h2>
<p>Related improvements are coming to PostgreSQL 18 for 2025.</p>

<p>This commit<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> implements the automatic conversion of <code class="language-plaintext highlighter-rouge">x IN (VALUES ...)</code> into ScalarArrayOpExpr.</p>

<p>Another noteworthy commit is: “Squash query list jumbling” from Álvaro Herrera.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p>pg_stat_statements produces multiple entries for queries like <code class="language-plaintext highlighter-rouge">SELECT something FROM table WHERE col IN (1, 2, 3, ...)</code> depending on the number of parameters, because every element of ArrayExpr is individually jumbled.
Most of the time that’s undesirable, especially if the list becomes too large.</p>

<p>This commit<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> mentions the original design was for a GUC query_id_squash_values, but that was removed in favor of making this the default behavior.</p>

<h2 id="conclusion">Conclusion</h2>
<p>In this post, we looked at a problematic query pattern, big <code class="language-plaintext highlighter-rouge">IN</code> lists. You may have instances of this pattern in your codebase from direct means or from using some ORM methods.</p>

<p>This type of query performs poorly for big lists of values, as they take more resources to parse, plan, and execute. There are fewer indexing options compared with an alternative structured as a join operation. Join queries provide two sets of table statistics from both tables being joined, that help with query planning.</p>

<p>We learned how to find instances of these using pg_stat_statements for PostgreSQL. We then considered several alternatives.</p>

<p>Our main tactic is to convert these queries to joins when possible. Outside of that, we could consider using the <code class="language-plaintext highlighter-rouge">ANY</code> operator with an array of values, a <code class="language-plaintext highlighter-rouge">VALUES</code> clause, or a prepared statement.</p>

<p>The next time you see big <code class="language-plaintext highlighter-rouge">IN</code> lists causing database performance problems, hopefully you feel more prepared to restructure and optimize them!</p>

<p>Thanks for reading this post. I’d love to hear about any tips or tricks you have for these types of queries!</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c0962a113d1f2f94cb7222a7ca025a67e9ce3860">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c0962a113d1f2f94cb7222a7ca025a67e9ce3860</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=62d712ecfd940f60e68bde5b6972b6859937c412">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=62d712ecfd940f60e68bde5b6972b6859937c412</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fbd53dea5d513a78ca04834101ca1aa73b63e59">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fbd53dea5d513a78ca04834101ca1aa73b63e59</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Andrew Atkinson</name></author><category term="PostgreSQL" /><category term="Ruby on Rails" /><summary type="html"><![CDATA[Introduction If you’ve created web apps with relational databases and ORMs like Active Record (part of Ruby on Rails), you’ve probably experienced database performance problems after a certain size of data and query volume.]]></summary></entry><entry><title type="html">Short alphanumeric pseudo random identifiers in Postgres</title><link href="https://andyatkinson.com/generating-short-alphanumeric-public-id-postgres" rel="alternate" type="text/html" title="Short alphanumeric pseudo random identifiers in Postgres" /><published>2025-05-20T16:00:00+00:00</published><updated>2025-05-20T16:00:00+00:00</updated><id>https://andyatkinson.com/postgres-short-alphanumeric-public-id</id><content type="html" xml:base="https://andyatkinson.com/generating-short-alphanumeric-public-id-postgres"><![CDATA[<h2 id="introduction">Introduction</h2>
<p>In this post, we’ll cover a way to generate short, alphanumeric, pseudo random identifiers using native Postgres tactics.</p>

<p>These identifiers can be used for things like transactions or reservations, where users need to read and share them easily. This approach is an alternative to using long, random generated values like <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> values, which have downsides for usability and performance.</p>

<p>We’ll call the identifier a <code class="language-plaintext highlighter-rouge">public_id</code> and store it in a column with that name. Here are some example values:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">public_id</span>
<span class="k">FROM</span> <span class="n">transactions</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">random</span><span class="p">()</span>
<span class="k">LIMIT</span> <span class="mi">3</span><span class="p">;</span>

 <span class="n">public_id</span>
<span class="c1">-----------</span>
 <span class="mi">0359</span><span class="n">Y</span>
 <span class="mi">08</span><span class="n">nAS</span>
 <span class="mi">096</span><span class="n">WV</span>
</code></pre></div></div>

<h2 id="natural-and-surrogate-keys">Natural and Surrogate Keys</h2>
<p>In database design, we can use natural or surrogate keys to identify rows. We won’t cover the differences here as that’s out of scope.</p>

<p>For our <code class="language-plaintext highlighter-rouge">public_id</code> identifier, we’re going to generate it from a conventional surrogate <code class="language-plaintext highlighter-rouge">integer</code> primary key called <code class="language-plaintext highlighter-rouge">id</code>. We aren’t using natural keys here.</p>

<p>The <code class="language-plaintext highlighter-rouge">public_id</code> is intended for use outside the database, while the <code class="language-plaintext highlighter-rouge">id</code> <code class="language-plaintext highlighter-rouge">integer</code> primary key is used inside the database to be referenced by foreign key columns on other tables.</p>

<p>While <code class="language-plaintext highlighter-rouge">public_id</code> is short, which minimizes space and speeds up access, the main reason for it is usability.</p>

<p>With that said, the target for total space consumption was to be fewer bytes than a 16-byte UUID. This was achieved with an <code class="language-plaintext highlighter-rouge">integer</code> primary key and this additional 5 character generated value, targeting a smaller database where this provides plenty of unique values now and into the future.</p>

<p>Let’s get into the design details.</p>

<h2 id="design-properties">Design Properties</h2>
<p>Here were the desired design properties:</p>

<ul>
  <li>A fixed size, 5 characters in length, regardless of the size of the input integer (and within the range of the <code class="language-plaintext highlighter-rouge">integer</code> data type)</li>
  <li>Fewer bytes of space than a <code class="language-plaintext highlighter-rouge">uuid</code> data type</li>
  <li>An obfuscated value, pseudo random, not easily guessable. While not easily guessable, this is not meant to be “secure”</li>
  <li>Reversibility back into the original integer</li>
  <li>Only native Postgres capabilities, no extensions. Because the logic lives within Postgres, the client web app can be written in any language</li>
  <li>Non math-heavy implementation</li>
</ul>

<p>Additional details:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">public_id</code> is stored using <code class="language-plaintext highlighter-rouge">text</code>, not <code class="language-plaintext highlighter-rouge">char(5)</code>, following recommendations for best practices</li>
  <li>PL/PgSQL functions, native Postgres data types and constraints are used, like UNIQUE, NOT NULL, and CHECK, and a stored generated column.</li>
  <li>Converts integers to bits, uses exclusive-or (XOR) bitwise operation and modulo operations.</li>
</ul>

<h2 id="limitations">Limitations</h2>
<ul>
  <li>Did not set out to support case insensitivity now, possible future enhancement</li>
  <li>Did not try to exclude similar-looking characters (see: <a href="https://www.crockford.com/base32.html">Base32 Crockford</a> below), possible future enhancement</li>
</ul>

<h2 id="plpgsql-functions">PL/PgSQL Functions</h2>
<p>Here are the functions used:</p>

<p>This function obfuscates the integer value using exclusive or (XOR) obfuscation.</p>
<ul>
  <li>Uses a Hexadecimal key <code class="language-plaintext highlighter-rouge">0x5A3C1</code> (make this any key you want)</li>
  <li>Sets a max value for the data type range <code class="language-plaintext highlighter-rouge">62^5</code>, which is just under 1 billion possible values. This was enough for this system and into the future, but a bigger system would want to use <code class="language-plaintext highlighter-rouge">bigint</code></li>
  <li>Converts integer bytes into bits</li>
</ul>

<p>Main entrypoint function:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">obfuscate_id(id INTEGER)</code></li>
</ul>

<p>This converts the obfuscated value into the <code class="language-plaintext highlighter-rouge">public_id</code> alphanumeric value, used within <code class="language-plaintext highlighter-rouge">obfuscate_id()</code>.</p>

<p>This is “base 62”: 26 uppercase letters, 26 lowercase letters, and 10 digits (0-9).</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">to_base62_fixed(val BIGINT, width INT DEFAULT 5)</code></li>
</ul>

<p>Reverses the <code class="language-plaintext highlighter-rouge">public_id</code> back into the original integer.</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">deobfuscate_id(public_id TEXT)</code></li>
</ul>

<p>Used within <code class="language-plaintext highlighter-rouge">deobfuscate_id()</code>:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">from_base62_fixed(str TEXT)</code></li>
</ul>
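<p>The four functions above can be sketched in Ruby to show how the pieces fit together. This is an illustration of the described scheme, not the PL/pgSQL source from the PR. The alphabet order (digits, then uppercase, then lowercase) and the out-of-range check are assumptions. The alphabet order was picked because it is consistent with the sample output in this post.</p>

```ruby
# Ruby sketch of the scheme described above; not the author's PL/pgSQL source.
# Assumed alphabet order: 0-9, A-Z, a-z (62 symbols).
ALPHABET = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a).join.freeze
XOR_KEY  = 0x5A3C1     # key quoted in the post
WIDTH    = 5
MAX_VAL  = 62**WIDTH   # 916_132_832

# Encode a non-negative integer as a zero-padded base-62 string.
def to_base62_fixed(val, width = WIDTH)
  digits = Array.new(width) do
    val, remainder = val.divmod(62)   # least significant digit first
    ALPHABET[remainder]
  end
  digits.reverse.join
end

# Decode a base-62 string back into an integer.
def from_base62_fixed(str)
  str.each_char.reduce(0) do |acc, ch|
    index = ALPHABET.index(ch)
    raise ArgumentError, "invalid character: #{ch.inspect}" if index.nil?
    acc * 62 + index
  end
end

def obfuscate_id(id)
  mixed = id ^ XOR_KEY
  # Assumption: reject values that would not fit in 5 base-62 digits.
  raise ArgumentError, "id out of range" unless (0...MAX_VAL).cover?(mixed)
  to_base62_fixed(mixed)
end

# XOR with the same key undoes the obfuscation.
def deobfuscate_id(public_id)
  from_base62_fixed(public_id) ^ XOR_KEY
end

puts obfuscate_id(1)           # 01Y9I
puts deobfuscate_id("01Y9I")   # 1
```

<p>XOR is its own inverse, which is what makes the value reversible with only the key: applying the same key a second time returns the original integer.</p>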

<p>For a length of 5 with this system, we can create up to ~916 million (just under 1 billion) unique values. This was sufficiently large for the original use case.</p>

<p>For use cases requiring more values, by storing 6 characters for <code class="language-plaintext highlighter-rouge">public_id</code> then up to ~56 billion values could be generated, based on a <code class="language-plaintext highlighter-rouge">bigint</code> primary key.</p>
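<p>The capacity figures are powers of 62, which a few lines of Ruby can confirm. The comparison with the upper bound of the Postgres <code class="language-plaintext highlighter-rouge">integer</code> type is an added observation: an <code class="language-plaintext highlighter-rouge">integer</code> key can grow past what 5 characters can represent, so very large ids would need a wider <code class="language-plaintext highlighter-rouge">public_id</code>.</p>

```ruby
INTEGER_MAX = 2**31 - 1   # upper bound of the Postgres integer type

capacity_5 = 62**5        # distinct 5-character base-62 values
capacity_6 = 62**6        # distinct 6-character base-62 values

puts capacity_5                 # 916132832
puts capacity_6                 # 56800235584
puts INTEGER_MAX > capacity_5   # true
```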

<h2 id="table-design">Table Design</h2>
<p>Let’s create a sample <code class="language-plaintext highlighter-rouge">transactions</code> table with an <code class="language-plaintext highlighter-rouge">integer</code> primary key with a generated identity column.</p>

<p>Besides the use in the identity column, we’ll again use the <code class="language-plaintext highlighter-rouge">GENERATED</code> keyword to create a <code class="language-plaintext highlighter-rouge">STORED</code> column for the <code class="language-plaintext highlighter-rouge">public_id</code>.</p>

<p>The <code class="language-plaintext highlighter-rouge">public_id</code> column uses the <code class="language-plaintext highlighter-rouge">id</code> column as input, obfuscates it, encodes it to base 62, producing a 5 character value.</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">DROP</span> <span class="k">TABLE</span> <span class="n">IF</span> <span class="k">EXISTS</span> <span class="n">transactions</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">transactions</span> <span class="p">(</span>
  <span class="n">id</span> <span class="nb">INTEGER</span> <span class="k">GENERATED</span> <span class="n">ALWAYS</span> <span class="k">AS</span> <span class="k">IDENTITY</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span>  <span class="c1">-- 4-byte integer ID</span>
  <span class="n">public_id</span> <span class="nb">text</span> <span class="k">GENERATED</span> <span class="n">ALWAYS</span> <span class="k">AS</span> <span class="p">(</span><span class="n">obfuscate_id</span><span class="p">(</span><span class="n">id</span><span class="p">))</span> <span class="n">STORED</span> <span class="k">UNIQUE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> <span class="c1">-- 5-character obfuscated Base62 value</span>
  <span class="n">amount</span> <span class="nb">NUMERIC</span><span class="p">,</span>
  <span class="n">description</span> <span class="nb">TEXT</span>
<span class="p">);</span>
</code></pre></div></div>

<p>How do we guarantee <code class="language-plaintext highlighter-rouge">public_id</code> conforms to our expected data properties? Constraints!</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">public_id</code> gets a <code class="language-plaintext highlighter-rouge">UNIQUE</code> constraint and <code class="language-plaintext highlighter-rouge">NOT NULL</code>, so we know we have a unique value</li>
  <li>A <code class="language-plaintext highlighter-rouge">CHECK</code> constraint is added to validate the length</li>
</ul>

<p>For an existing system, we could add a unique index <code class="language-plaintext highlighter-rouge">CONCURRENTLY</code> first as follows:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">UNIQUE</span> <span class="k">INDEX</span> <span class="n">CONCURRENTLY</span> <span class="n">IF</span> <span class="k">NOT</span> <span class="k">EXISTS</span> <span class="n">idx_uniq_pub_id</span> <span class="k">ON</span> <span class="n">transactions</span> <span class="p">(</span><span class="n">public_id</span><span class="p">);</span>
</code></pre></div></div>

<p>Then we can add the unique constraint using the unique index, along with the <code class="language-plaintext highlighter-rouge">CHECK</code> constraint:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">transactions</span>
    <span class="k">ADD</span> <span class="k">CONSTRAINT</span> <span class="n">uniq_pub_id</span> <span class="k">UNIQUE</span> <span class="k">USING</span> <span class="k">INDEX</span> <span class="n">idx_uniq_pub_id</span><span class="p">,</span> <span class="c1">-- depends on index above</span>
    <span class="k">ADD</span> <span class="k">CONSTRAINT</span> <span class="n">public_id_length</span> <span class="k">CHECK</span> <span class="p">(</span><span class="k">LENGTH</span><span class="p">(</span><span class="n">public_id</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">5</span><span class="p">);</span>
</code></pre></div></div>

<h2 id="insert-data">Insert Data</h2>
<p>Let’s insert data into the <code class="language-plaintext highlighter-rouge">transactions</code> table:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">transactions</span> <span class="p">(</span><span class="n">amount</span><span class="p">,</span> <span class="n">description</span><span class="p">)</span> <span class="k">VALUES</span>
  <span class="p">(</span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="p">,</span> <span class="s1">'First transaction'</span><span class="p">),</span>
  <span class="p">(</span><span class="mi">50</span><span class="p">.</span><span class="mi">00</span><span class="p">,</span> <span class="s1">'Second transaction'</span><span class="p">),</span>
  <span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">25</span><span class="p">,</span> <span class="s1">'Third transaction'</span><span class="p">);</span>
</code></pre></div></div>

<p>Let’s query the data, and also make sure it’s reversed (using the <code class="language-plaintext highlighter-rouge">deobfuscate_id(public_id TEXT)</code> function) properly:</p>

<h2 id="access-the-rows">Access the rows</h2>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
    <span class="n">id</span><span class="p">,</span>
    <span class="n">public_id</span><span class="p">,</span>
    <span class="n">deobfuscate_id</span><span class="p">(</span><span class="n">public_id</span><span class="p">)</span> <span class="k">AS</span> <span class="n">reversed_id</span><span class="p">,</span>
    <span class="n">description</span>
<span class="k">FROM</span>
    <span class="n">transactions</span><span class="p">;</span>


 <span class="n">id</span> <span class="o">|</span> <span class="n">public_id</span> <span class="o">|</span> <span class="n">reversed_id</span> <span class="o">|</span>    <span class="n">description</span>
<span class="c1">----+-----------+-------------+--------------------</span>
  <span class="mi">1</span> <span class="o">|</span> <span class="mi">01</span><span class="n">Y9I</span>     <span class="o">|</span>           <span class="mi">1</span> <span class="o">|</span> <span class="k">First</span> <span class="n">transaction</span>
  <span class="mi">2</span> <span class="o">|</span> <span class="mi">01</span><span class="n">Y9L</span>     <span class="o">|</span>           <span class="mi">2</span> <span class="o">|</span> <span class="k">Second</span> <span class="n">transaction</span>
  <span class="mi">3</span> <span class="o">|</span> <span class="mi">01</span><span class="n">Y9K</span>     <span class="o">|</span>           <span class="mi">3</span> <span class="o">|</span> <span class="n">Third</span> <span class="n">transaction</span>
</code></pre></div></div>

<h2 id="additional-time-spent-on-inserts">Additional time spent on inserts</h2>
<p>Let’s compare the time spent inserting 1 million rows into an equivalent <code class="language-plaintext highlighter-rouge">transactions</code> table without the <code class="language-plaintext highlighter-rouge">public_id</code> column or value generation.</p>

<p>That took an average of 2037.906 milliseconds, or around 2 seconds on my machine.</p>

<p>Inserting 1 million rows with the <code class="language-plaintext highlighter-rouge">public_id</code> took an average of 6954.070 milliseconds, or around 7 seconds, which is about 3.41x slower. Note that these times were with the indexes and constraints in place on the <code class="language-plaintext highlighter-rouge">transactions</code> table in the second example, but not the first, meaning their presence contributed to the total time.</p>

<p>Summary: Creating this identifier made the write operations 3.4x slower for me locally, which was an acceptable amount of overhead for the intended use case.</p>
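<p>For reference, the ratio and the added cost per row follow directly from the two averages quoted above. The per-row figure is a derived number, not a separate measurement.</p>

```ruby
ROWS = 1_000_000

baseline_ms  = 2037.906   # average without public_id
public_id_ms = 6954.070   # average with public_id, index, and constraints

ratio = public_id_ms / baseline_ms
extra_per_row_us = (public_id_ms - baseline_ms) * 1000 / ROWS

puts ratio.round(2)             # 3.41
puts extra_per_row_us.round(2)  # 4.92 (microseconds per row)
```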

<h2 id="performance">Performance</h2>
<p>Compared with random values, the pseudo random <code class="language-plaintext highlighter-rouge">public_id</code> remains orderable, which means that lookups for individual rows or ranges of rows can use indexes, running fast and reliably even as row counts grow.</p>

<p>We can add a unique index on the <code class="language-plaintext highlighter-rouge">public_id</code> column like this:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">UNIQUE</span> <span class="k">INDEX</span> <span class="n">CONCURRENTLY</span>
<span class="n">IF</span> <span class="k">NOT</span> <span class="k">EXISTS</span>
<span class="n">idx_uniq_pub_id</span> <span class="k">ON</span> <span class="n">transactions</span> <span class="p">(</span><span class="n">public_id</span><span class="p">);</span>
</code></pre></div></div>

<p>We can verify that individual lookups or range scans use this index by inspecting query execution plans for this table.</p>

<h2 id="plpgsql-source-code">PL/pgSQL Source Code</h2>
<p><a href="https://github.com/andyatkinson/pg_scripts/pull/15">https://github.com/andyatkinson/pg_scripts/pull/15</a></p>

<h2 id="feedback">Feedback</h2>
<p>Feedback on this approach is welcomed! Please use my contact form to provide feedback or leave comments on the PR.</p>

<p>Future enhancements to this could include unit tests using <a href="https://pgtap.org">pgTAP</a> for the functions, packaging them into an extension, or supporting more features like case insensitivity or a modified input alphabet.</p>

<p>Thanks for reading!</p>

<h2 id="alternatives">Alternatives</h2>
<ul>
  <li><a href="https://www.crockford.com/base32.html">Base32 Crockford</a> - An emphasis on ease of use for humans: removing similar looking characters, case insensitivity.</li>
  <li><a href="https://blog.lawrencejones.dev/ulid/">ULID</a> - Also 128 bits/16 bytes like UUIDs, so I had ruled these out for space consumption, and they’re slightly less “usable”</li>
  <li><a href="https://planetscale.com/blog/why-we-chose-nanoids-for-planetscales-api">NanoIDs at PlanetScale</a> - I like aspects of NanoID. This is random generation though like UUID vs. encoding a unique integer.</li>
</ul>]]></content><author><name>Andrew Atkinson</name></author><category term="PostgreSQL" /><summary type="html"><![CDATA[Introduction In this post, we’ll cover a way to generate short, alphanumeric, pseudo random identifiers using native Postgres tactics.]]></summary></entry><entry><title type="html">Source code locations for database queries in Rails with Marginalia and Query Logs</title><link href="https://andyatkinson.com/source-code-line-numbers-ruby-on-rails-marginalia-query-logs" rel="alternate" type="text/html" title="Source code locations for database queries in Rails with Marginalia and Query Logs" /><published>2025-04-29T00:00:00+00:00</published><updated>2025-04-29T00:00:00+00:00</updated><id>https://andyatkinson.com/source-code-line-numbers-ruby-on-rails-marginalia-query-logs</id><content type="html" xml:base="https://andyatkinson.com/source-code-line-numbers-ruby-on-rails-marginalia-query-logs"><![CDATA[<h2 id="intro">Intro</h2>
<p>Back in 2022, we covered how to log database query generation information from a web app using <code class="language-plaintext highlighter-rouge">pg_stat_statements</code> for Postgres.
<a href="https://andyatkinson.com/blog/2022/10/07/pgsqlphriday-2-truths-lie">https://andyatkinson.com/blog/2022/10/07/pgsqlphriday-2-truths-lie</a></p>

<p>The application context annotations can look like this. They’ve been re-formatted for printing:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>application=Rideshare
controller=trip_requests
action=create
</code></pre></div></div>

<p>I use <code class="language-plaintext highlighter-rouge">pg_stat_statements</code> to identify costly queries generated in the web application, often ORM queries (the ORM is Active Record in Ruby on Rails), with the goal of working on efficiency and performance improvements.</p>

<p>The annotations above are included in the <code class="language-plaintext highlighter-rouge">query</code> field and formatted as SQL-compatible comments.</p>

<p>Application context usually includes the app name and app concepts like MVC controller names, action names, or even more precise info which we’ll cover next.</p>

<p>How can we make these even more useful?</p>

<h2 id="whats-the-mechanism-to-generate-these-annotations">What’s the mechanism to generate these annotations?</h2>
<p>For Ruby on Rails, we’ve used the <a href="https://github.com/basecamp/marginalia">Marginalia</a> Ruby gem to create these annotations.</p>

<p>Besides the context above, a super useful option is the <code class="language-plaintext highlighter-rouge">:line</code> option which captures the source code file and line number.</p>

<p>Given how dynamic Ruby code can be, including changes that can happen at runtime, the <code class="language-plaintext highlighter-rouge">:line</code> level logging takes these annotations from “nice to have” to “critical” to find opportunities for improvements.</p>

<p>What’s more, besides Marginalia, we now have a second option that’s built into Ruby on Rails.</p>

<h2 id="whats-been-added-since-then">What’s been added since then?</h2>
<p>In Rails 7.0, Ruby on Rails gained similar functionality to Marginalia directly in the framework.</p>

<p>While nice to have directly in the framework, the initial version didn’t have the source code line-level capability.</p>

<p>That changed in the last year! Starting from PR 50969 to Rails linked below, for Rails 7.2.0 and 8.0.2, the <code class="language-plaintext highlighter-rouge">source_location</code> option was added to <a href="https://api.rubyonrails.org/classes/ActiveRecord/QueryLogs.html">Active Record Query Logs</a>, equivalent to the <code class="language-plaintext highlighter-rouge">:line</code> option in Marginalia.</p>

<p>PR: Support <code class="language-plaintext highlighter-rouge">:source_location</code> tag option for query log tags by <a href="https://github.com/fatkodima">fatkodima</a>
<a href="https://github.com/rails/rails/pull/50969#issuecomment-2797357558">https://github.com/rails/rails/pull/50969#issuecomment-2797357558</a></p>

<p>An example of <code class="language-plaintext highlighter-rouge">:source_location</code> in action looks like this:</p>

<pre><code>
application=Rideshare
controller=trip_requests
➡️ <strong>source_location=app/services/trip_creator.rb:26:<br />in `best_available_driver'</strong>
action=create
</code></pre>

<p>Nice, now we’ve got the file path, line number, and Ruby method. In this example, we can get to work optimizing the <code class="language-plaintext highlighter-rouge">best_available_driver</code> method.</p>
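<p>If you want to work with these values programmatically, for example to group costly queries by file, a small parser is enough. This is a sketch that only handles the <code class="language-plaintext highlighter-rouge">path:line:in `method'</code> shape shown above. The helper name is made up, and other Ruby versions may format the method part differently.</p>

```ruby
# Pattern for "path:line:in `method'", the shape shown in the example above.
SOURCE_LOCATION = /\A(.+?):(\d+):in [`'](.+)'\z/

# Hypothetical helper: split a source_location value into its parts.
# Returns nil when the value does not match the expected shape.
def parse_source_location(value)
  match = SOURCE_LOCATION.match(value)
  return nil if match.nil?

  { file: match[1], line: match[2].to_i, method_name: match[3] }
end

location = parse_source_location("app/services/trip_creator.rb:26:in `best_available_driver'")
puts location[:file]         # app/services/trip_creator.rb
puts location[:line]         # 26
puts location[:method_name]  # best_available_driver
```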

<p>Dima described how the Marginalia <code class="language-plaintext highlighter-rouge">:line</code> option was costly in production and even managed to improve that with the Query Logs change.</p>

<h2 id="safe-logging-locally-or-in-production">Safe Logging Locally or in Production</h2>
<p>If you’re unsure about source code line logging in production, but want to get started using it, a great place to start is using it in your local development environment.</p>

<p>To avoid enabling the option for all environments, we’ll use an environment variable that’s enabled only for local development.</p>

<p>Here’s a real example I use for Marginalia:
<code class="language-plaintext highlighter-rouge">MARGINALIA_LINE_NUMBER_ENABLED=true</code></p>

<p>In <code class="language-plaintext highlighter-rouge">config/initializers/marginalia.rb</code>:</p>
<div class="language-rb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Marginalia</span><span class="o">::</span><span class="no">Comment</span><span class="p">.</span><span class="nf">components</span> <span class="o">=</span> <span class="p">[</span>
  <span class="ss">:application</span><span class="p">,</span>
  <span class="ss">:controller</span><span class="p">,</span>
  <span class="ss">:action</span>
<span class="p">]</span>

<span class="k">if</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'MARGINALIA_LINE_NUMBER_ENABLED'</span><span class="p">]</span>
  <span class="no">Marginalia</span><span class="o">::</span><span class="no">Comment</span><span class="p">.</span><span class="nf">components</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="ss">:line</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>For Query Logs, in <code class="language-plaintext highlighter-rouge">config/application.rb</code> (adjust to be for the environments you prefer), the equivalent could look like this:</p>
<div class="language-rb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">query_log_tags_enabled</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">query_log_tags</span> <span class="o">=</span> <span class="p">[</span>
  <span class="ss">:application</span><span class="p">,</span>
  <span class="ss">:controller</span><span class="p">,</span>
  <span class="ss">:action</span><span class="p">,</span>
  <span class="ss">:source_location</span>
<span class="p">]</span>
</code></pre></div></div>

<p>The configuration above was tested with Rails 7.2.2.</p>
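<p>The same environment variable gating can be applied to Query Logs. One detail worth knowing: in Ruby any non-nil string is truthy, so a check like <code class="language-plaintext highlighter-rouge">if ENV['SOME_FLAG']</code> also passes when the variable is set to <code class="language-plaintext highlighter-rouge">false</code>. The sketch below uses a stricter check. The helper names and the <code class="language-plaintext highlighter-rouge">QUERY_LOGS_SOURCE_LOCATION_ENABLED</code> variable are made up for illustration and are not part of Rails.</p>

```ruby
TRUTHY_VALUES = %w[1 true yes on].freeze

# Hypothetical helper: only explicit truthy strings enable the flag,
# so setting the variable to "false" leaves the feature off.
def env_flag_enabled?(env, name)
  TRUTHY_VALUES.include?(env.fetch(name, "").strip.downcase)
end

# QUERY_LOGS_SOURCE_LOCATION_ENABLED is a made-up variable name.
def query_log_tags(env = ENV)
  tags = [:application, :controller, :action]
  tags.push(:source_location) if env_flag_enabled?(env, "QUERY_LOGS_SOURCE_LOCATION_ENABLED")
  tags
end

# Plain hashes stand in for ENV here so the behavior is easy to see.
p query_log_tags({ "QUERY_LOGS_SOURCE_LOCATION_ENABLED" => "true" })
p query_log_tags({ "QUERY_LOGS_SOURCE_LOCATION_ENABLED" => "false" })
```

<p>In an app, the result could be assigned with <code class="language-plaintext highlighter-rouge">config.active_record.query_log_tags = query_log_tags</code>, leaving <code class="language-plaintext highlighter-rouge">:source_location</code> off unless the variable is explicitly enabled.</p>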

<p>If your team uses Query Logs <code class="language-plaintext highlighter-rouge">:source_location</code> in development or production, I’d love to know!</p>

<h2 id="wrap-up">Wrap Up</h2>
<p>Having source code line-level logging for query statistics is critical information that allows backend engineers to quickly zero in on where to fix database performance issues.</p>

<p>With this combo of information, engineers can identify the most heavy queries, then go backwards into the source code to know where to redesign, refactor, or restructure, or even remove costly queries.</p>

<h2 id="whats-next">What’s next?</h2>
<p>The <code class="language-plaintext highlighter-rouge">pg_stat_statements</code> extension is critical for this workflow, but it’s not without opportunities for improvement.</p>

<p>One issue that <code class="language-plaintext highlighter-rouge">pg_stat_statements</code> has is that many of the entries can be duplicates or near-duplicates, making it tougher to sift through.</p>

<p>Fortunately, fixes are coming for that too! Stay tuned for a future post where we’ll cover upcoming improvements in future versions of Postgres that will help de-duplicate <code class="language-plaintext highlighter-rouge">pg_stat_statements</code> entries, as well as options to achieve that with Ruby on Rails even for older versions of Postgres.</p>

<p>Consider subscribing to my newsletter, where I send out occasional issues linking to blog posts, conferences, industry news, and more, so you don’t miss that post!</p>

<p>Thanks for reading.</p>]]></content><author><name>Andrew Atkinson</name></author><category term="Ruby on Rails" /><category term="Ruby" /><category term="PostgreSQL" /><summary type="html"><![CDATA[Intro Back in 2022, we covered how to log database query generation information from a web app using pg_stat_statements for Postgres. https://andyatkinson.com/blog/2022/10/07/pgsqlphriday-2-truths-lie]]></summary></entry><entry><title type="html">🎙️ Talking Postgres Podcast: Helping Rails developers learn Postgres with Andrew Atkinson</title><link href="https://andyatkinson.com/helping-rails-developers-learn-postgres-with-andrew-atkinson" rel="alternate" type="text/html" title="🎙️ Talking Postgres Podcast: Helping Rails developers learn Postgres with Andrew Atkinson" /><published>2025-04-07T00:00:00+00:00</published><updated>2025-04-07T00:00:00+00:00</updated><id>https://andyatkinson.com/helping-rails-developers-learn-postgres-with-andrew-atkinson</id><content type="html" xml:base="https://andyatkinson.com/helping-rails-developers-learn-postgres-with-andrew-atkinson"><![CDATA[<p>Back in November, I met with Claire Giordano, host of the Talking Postgres podcast, who asked a ton of great questions about my experience writing a Postgres book aimed at Ruby on Rails web developers.</p>

<h2 id="some-questions">Some questions</h2>
<p>Claire had a lot of thoughtful questions. Here are a few:</p>

<ul>
  <li>Why write a technical book? Was there some moment or spark?</li>
  <li>Why write a book about Postgres?</li>
  <li>Why Ruby on Rails?</li>
</ul>

<h2 id="fun-topics">Fun topics</h2>
<p>Claire also brought up a lot of fun points and reactions. Here’s a sample:</p>

<ul>
  <li>The importance of planting seeds and encouraging others with ambitious projects</li>
  <li>Would I consider writing a book for Django and Python for Postgres?</li>
  <li>Where does the book fit in the landscape?</li>
  <li>How long did it take to write this book?</li>
  <li>Did I ever want to quit writing, even for a moment?</li>
  <li>Did I have a party when the book was fully complete?</li>
  <li>I talked about “little parties” with Rails developer communities at events like Rails World and Sin City Ruby</li>
  <li>What was my experience like in working with other publishers?</li>
  <li>I shared my deep appreciation for the efforts of the technical reviewers of the book!</li>
  <li>We talked about cheese! 🧀 (stories and connections with Postgres icons David Rowley and Melanie Plageman)</li>
  <li>What was my favorite chapter?</li>
  <li>Is there a frequently asked question I get about databases from Rails developers?</li>
  <li>For my consulting services, do clients hire me for my Rails expertise or my Postgres expertise?</li>
</ul>

<p><a href="https://www.goodreads.com/quotes/320581">Quote</a> mentioned by Claire:</p>

<blockquote>
  <p>Writing is thinking. To write well is to think clearly. That’s why it’s so hard.
<br />
—David McCullough</p>
</blockquote>

<!-- Callout box -->
<section>
<div style="border-radius:0.8em;background-color:#eee;padding:1em;margin:1em;color:#000;">
<h2>Podcast</h2>
<p>👉 <a href="https://talkingpostgres.com/episodes/helping-rails-developers-learn-postgres-with-andrew-atkinson/transcript">Helping Rails developers learn Postgres with Andrew Atkinson</a></p>
</div>
</section>

<p>It was a real honor to be a guest on this prestigious podcast. I’m lucky to call Claire a friend as well! Thank you for the opportunity Claire, Aaron, and team!</p>

<p>Check out more episodes of <a href="https://talkingpostgres.com">Talking Postgres</a>!</p>]]></content><author><name>Andrew Atkinson</name></author><category term="Podcasts" /><category term="PostgreSQL" /><category term="Ruby on Rails" /><summary type="html"><![CDATA[Back in November, I met with Claire Giordano, host of the Talking Postgres podcast, who asked a ton of great questions about my experience writing a Postgres book aimed at Ruby on Rails web developers.]]></summary></entry><entry><title type="html">Django and Postgres for the Busy Rails Developer</title><link href="https://andyatkinson.com/django-python-postgres-busy-rails-developer" rel="alternate" type="text/html" title="Django and Postgres for the Busy Rails Developer" /><published>2024-12-10T00:00:00+00:00</published><updated>2024-12-10T00:00:00+00:00</updated><id>https://andyatkinson.com/django-python-postgres-busy-rails-developer</id><content type="html" xml:base="https://andyatkinson.com/django-python-postgres-busy-rails-developer"><![CDATA[<p>About 10 years ago I wrote a post <a href="/blog/2014/01/02/postgres-for-the-busy-mysql-developer">PostgreSQL for the Busy MySQL Developer</a>, as part of switching from MySQL to Postgres for my personal and professional projects wherever I could.</p>

<p>Recently I had the chance to work with Python, Django, and Postgres as a long-time and busy Rails developer.</p>

<p>There were some things I thought were really nice. So am I switching?</p>

<p>The team I worked with was experienced with Django, so I was curious to learn from them which libraries and tools are popular, and how to write idiomatic code.</p>

<p>In this post I’ll briefly cover the database parts of Django using Postgres (of course!), highlight libraries and tools, and compare aspects to the Ruby on Rails framework. You’ll find a small Django repo towards the end as well.</p>

<h2 id="ruby-versus-python">Ruby versus Python</h2>
<p>Ruby and Python are both general purpose programming languages. On the similarity side, they can both be used to write script style code, or organize code into classes using object oriented paradigms.</p>

<p>In local development, Python execution felt perhaps faster than Ruby. However, I’ve noticed that new apps are always fast to work with, given how little code is being loaded and executed.</p>

<h2 id="language-runtime-management">Language runtime management</h2>
<p>As developers, we typically need to run multiple versions of Ruby, Python, Node, and other runtimes, to support different codebases, and to avoid modifying our system installation.</p>

<p>In Ruby I use <a href="https://github.com/rbenv/rbenv">rbenv</a> to manage multiple versions of Ruby, and to avoid using the version of Ruby that was installed by macOS, which is usually outdated compared with the version I want for a new app.</p>

<p>In Python, I used <a href="https://github.com/pyenv/pyenv">pyenv</a> to accomplish the same thing, which seemed quite similar in use.</p>

<p>Both have concepts of a local and global version, and roughly similar commands to install and change versions.</p>

<h2 id="library-management">Library management</h2>
<p>In Ruby on Rails, <a href="https://bundler.io">Bundler</a> has been the de facto standard forever, as a way to pull in Ruby library code and make sure it’s loaded and accessible in the Rails application.</p>

<p>In Python, the team selected the <a href="https://python-poetry.org">poetry</a> dependency management tool.</p>

<p>Commands are similar to Bundler commands, for example <code class="language-plaintext highlighter-rouge">poetry install</code> is about the same as <code class="language-plaintext highlighter-rouge">bundle install</code>.</p>

<p>Dependencies can be expressed in a <code class="language-plaintext highlighter-rouge">pyproject.toml</code> file and poetry creates a lock file with specific library versions. <a href="https://toml.io/en/">TOML</a> and YAML are similar.</p>

<h2 id="linting-and-formatting">Linting and formatting</h2>
<p>In Ruby on Rails, although I personally resisted rule detection etc. for years, <a href="https://github.com/rubocop/rubocop">Rubocop</a> has become the standard, even being built in to the most recent Rails version 8.</p>

<p>Rubocop has configurable rules that can automatically reformat code or lint code for issues.</p>

<p>Formatters like <a href="https://github.com/standardrb/standard">standardrb</a> are commonly used as well.</p>

<p>For the Django app the team selected <a href="https://github.com/astral-sh/ruff">ruff</a>, which performed formatting of code and linting for issues like missing imports.</p>

<p>I found ruff fast and easy to use and genuinely helpful.</p>

<p>For example, sometimes I’d fire up a Django shell, having skipped running ruff, only to realize there are issues it would have caught.</p>

<p>On this small codebase, ruff ran instantly, so it was a no-brainer to run regularly, or even include in my code editor.</p>

<h2 id="postgres-adapter">Postgres adapter</h2>
<p>In Rails and Django, SQLite is the default database, however I wanted to use Postgres.</p>

<p>In Ruby, we have the <a href="https://github.com/ged/ruby-pg">pg gem</a> which connects the application to Postgres as a driver. The driver works at a lower level than the application, handling things like sending requests over the network, mapping Postgres query results into Ruby data types, and much more.</p>

<p>In Python, we used the <a href="https://pypi.org/project/psycopg2/">psycopg2 library</a> and I found it pretty easy to use.</p>

<p>Besides being used by the framework ORM, I created a wrapper class using psycopg2 to use for sending SQL queries outside of models.</p>

<p>For example, we inspected Postgres system catalog views to capture certain data as part of the product features.</p>
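
<p>A wrapper like that could look roughly like the sketch below. This is not the class from the project: the <code class="language-plaintext highlighter-rouge">SqlRunner</code> name and its <code class="language-plaintext highlighter-rouge">fetch_all</code> method are made up for illustration, and it assumes only the standard Python DB-API that psycopg2 implements, so the connection is passed in as a callable.</p>

```python
from contextlib import closing


class SqlRunner:
    """Hypothetical helper for running raw SQL outside of the ORM.

    Works with any DB-API 2.0 driver; psycopg2 is one such driver.
    """

    def __init__(self, connect):
        # `connect` is a zero-argument callable that returns a new connection,
        # for example: lambda: psycopg2.connect(dbname="books_dev", user="booksapp")
        self._connect = connect

    def fetch_all(self, sql, params=None):
        """Run a query and return each row as a dict keyed by column name."""
        conn = self._connect()
        try:
            # closing() makes sure the cursor is closed even if the query fails
            with closing(conn.cursor()) as cur:
                if params is None:
                    cur.execute(sql)
                else:
                    cur.execute(sql, params)
                columns = [col[0] for col in cur.description]
                return [dict(zip(columns, row)) for row in cur.fetchall()]
        finally:
            conn.close()
```

<p>With a psycopg2 connection, a call such as <code class="language-plaintext highlighter-rouge">runner.fetch_all("SELECT relname FROM pg_stat_user_tables")</code> would return one dict per table. Opening a connection per call keeps the sketch simple; a real app would likely reuse or pool connections.</p>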

<h2 id="migrations-in-rails">Migrations in Rails</h2>
<p>Both Ruby on Rails and Django have the concept of <a href="https://guides.rubyonrails.org/active_record_migrations.html">Migrations</a>, which are Ruby or Python code files that describe a database structure change, and have a version.</p>

<p>From the Ruby or Python code files, SQL DDL (or DML) statements are generated which are run against the configured database.</p>

<p>For example, to add a table in Rails typically a developer uses the <code class="language-plaintext highlighter-rouge">create_table</code> Ruby helper as opposed to writing a <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> SQL statement.</p>

<p>Adding or dropping an index or modifying a column type are other types of DDL statements that typically are performed via migrations.</p>
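
<p>To make the idea of “code in, DDL out” concrete, here’s a toy sketch in Python. It is not how Rails or Django build their SQL internally, and the <code class="language-plaintext highlighter-rouge">create_table_sql</code> function is a made-up name. It only shows the general shape: a structured description of a table goes in, and a <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> statement comes out.</p>

```python
def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from (name, type, nullable) tuples.

    A toy illustration only; real frameworks also handle quoting rules,
    defaults, constraints, and database-specific types.
    """
    parts = []
    for name, sql_type, nullable in columns:
        null_clause = "" if nullable else " NOT NULL"
        parts.append(f'"{name}" {sql_type}{null_clause}')
    body = ", ".join(parts)
    return f'CREATE TABLE "{table}" ({body});'


print(create_table_sql("books_author", [
    ("first_name", "varchar(200)", False),
    ("last_name", "varchar(200)", False),
]))
```

<p>The frameworks add a lot on top of this, like versioning, rollbacks, and tracking which migrations have run.</p>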

<h2 id="migrations-in-django">Migrations in Django</h2>
<p>The Django approach has noteworthy differences and a slightly different workflow that I enjoyed more in some ways.</p>

<p>For example, changes are started in a <code class="language-plaintext highlighter-rouge">models.py</code> file, which contains all the application models (multiple models in a single file), and the database layer details about each model attribute.</p>

<p>This means that we specify database data types for columns, whether fields are unique, indexed, and more in the models file.</p>

<p>The interesting difference compared with Rails is that the next step in Django is to run <code class="language-plaintext highlighter-rouge">makemigrations</code>, which <em>generates</em> Python migration files.</p>

<p>This is different from Rails, where Rails developers would first generate a migration file to place changes into.</p>

<p>In Django the generated migration file can be inspected or simply applied using the <code class="language-plaintext highlighter-rouge">migrate</code> command. This command is nearly identical to the Rails equivalent command <code class="language-plaintext highlighter-rouge">db:migrate</code>.</p>

<p>For a new project where we were rapidly iterating on the models and their attributes, I preferred the way Django worked to how Rails works, or found it at least as productive.</p>

<h2 id="command-line-vibes">Command line vibes</h2>
<p>Here are some commands like running <code class="language-plaintext highlighter-rouge">poetry install</code>, or running <code class="language-plaintext highlighter-rouge">manage.py</code> commands like <code class="language-plaintext highlighter-rouge">shell</code> or <code class="language-plaintext highlighter-rouge">makemigrations</code>, to give you a flavor.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">poetry</span> <span class="n">install</span>
<span class="n">python</span> <span class="n">manage</span><span class="p">.</span><span class="n">py</span> <span class="n">dbshell</span> <span class="c1"># psql in postgres
</span><span class="n">python</span> <span class="n">manage</span><span class="p">.</span><span class="n">py</span> <span class="n">shell</span>   <span class="c1"># Django shell
</span><span class="n">python</span> <span class="n">manage</span><span class="p">.</span><span class="n">py</span> <span class="n">makemigrations</span>  <span class="c1"># Generates Python migration files
</span><span class="n">python</span> <span class="n">manage</span><span class="p">.</span><span class="n">py</span> <span class="n">migrate</span> <span class="c1"># Runs migration files
</span></code></pre></div></div>

<h2 id="interactive-console-repl">Interactive console (REPL)</h2>
<p>Both Django and Rails use interpreted languages, Python and Ruby respectively, that each support an interactive execution environment.</p>

<p>This environment is called a <em>read, eval, print loop</em> or REPL for short.</p>

<p>In Rails, the Ruby REPL “irb” is launched and Rails application code is loaded automatically when running the <a href="https://guides.rubyonrails.org/command_line.html">rails console</a> command.</p>

<p>In Django the equivalent command is running <a href="https://docs.djangoproject.com/en/5.1/ref/django-admin/#shell">shell</a>, however application code needs to be imported before it can be used, using <code class="language-plaintext highlighter-rouge">import</code> statements.</p>

<p>Both frameworks also support opening a database client, by running <code class="language-plaintext highlighter-rouge">dbconsole</code> in Rails or <code class="language-plaintext highlighter-rouge">dbshell</code> in Django.</p>

<p>When Postgres is configured, these both open a psql session.</p>

<h2 id="projects-and-apps">Projects and Apps</h2>
<p>In Django, projects and applications are separate concepts.</p>

<p>In my experimental project I made a “booksproject” project and a “books” app.</p>

<p>Check out the <a href="https://github.com/andyatkinson/booksproject">booksproject repo</a>.</p>

<h2 id="postgres-details">Postgres details</h2>
<p>The books app models are Author, Publisher, and Books.</p>

<p>The tables for those models are contained in a custom schema <code class="language-plaintext highlighter-rouge">booksapp</code>, and Django is configured to access it.</p>

<p>The application connects to Postgres as the <code class="language-plaintext highlighter-rouge">booksapp</code> user and the dev database is called <code class="language-plaintext highlighter-rouge">books_dev</code>.</p>

<h2 id="no-migration-safety-concept">No migration safety concept</h2>
<p>There’s no concept of what I’d call “safety” for migrations for either framework out of the box.</p>

<p>For example, operations like adding indexes in Postgres don’t use the <code class="language-plaintext highlighter-rouge">CONCURRENTLY</code> keyword by default.</p>

<p>We can add safety using additional libraries like <a href="https://github.com/ankane/strong_migrations">Strong Migrations</a> in Ruby.</p>

<p>At a smaller scale of data and query volume, even unsafe operations will be fine. With that said, I think some visibility into blocking database operations, and how to perform them using safe alternatives is valuable.</p>
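
<p>As a rough idea of what that visibility could look like, the sketch below scans generated SQL for index creation that doesn’t use <code class="language-plaintext highlighter-rouge">CONCURRENTLY</code>. The <code class="language-plaintext highlighter-rouge">blocking_statements</code> function is a made-up name, the check is a simple regular expression, and it is far less thorough than what a library like Strong Migrations does.</p>

```python
import re

# CREATE [UNIQUE] INDEX where the next word is not CONCURRENTLY.
# The trailing \S stops the whitespace match from backtracking past extra spaces.
UNSAFE_INDEX = re.compile(
    r"\bCREATE\s+(UNIQUE\s+)?INDEX\s+(?!CONCURRENTLY\b)\S",
    re.IGNORECASE,
)


def blocking_statements(sql_script):
    """Return the statements that create an index without CONCURRENTLY.

    Splitting on ";" is naive (it ignores string literals and comments),
    which is acceptable for a sketch but not for production use.
    """
    statements = [s.strip() for s in sql_script.split(";") if s.strip()]
    return [s for s in statements if UNSAFE_INDEX.search(s)]
```

<p>Output from a command that prints migration SQL could be fed into a check like this before deploying. On the Django side, the <code class="language-plaintext highlighter-rouge">django.contrib.postgres.operations</code> module includes an <code class="language-plaintext highlighter-rouge">AddIndexConcurrently</code> operation for non-atomic migrations, which is worth a look.</p>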

<h2 id="adding-a-constraint">Adding a constraint</h2>
<p>In the models file, add <code class="language-plaintext highlighter-rouge">unique=True</code> to a field definition to add a unique constraint (via a unique index). After running <code class="language-plaintext highlighter-rouge">makemigrations</code> a migration for a unique index will be created.</p>

<p>In Active Record we’d generate the migration file first, then fill in the create statement adding a unique index.</p>

<h2 id="django-models">Django models</h2>
<p>When querying a model like Book we’d use <code class="language-plaintext highlighter-rouge">objects</code>, the model’s manager, whose methods return a QuerySet object with zero or more books.</p>

<p>The <code class="language-plaintext highlighter-rouge">filter()</code> method will generate a SQL query with a <code class="language-plaintext highlighter-rouge">WHERE</code> clause to filter down the rows, or all rows can be accessed using <code class="language-plaintext highlighter-rouge">all()</code>.</p>

<p>For example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Model</span><span class="p">.</span><span class="n">objects</span><span class="p">.</span><span class="nb">filter</span><span class="p">()</span>
<span class="n">Model</span><span class="p">.</span><span class="n">objects</span><span class="p">.</span><span class="n">first</span><span class="p">()</span>
<span class="n">Model</span><span class="p">.</span><span class="n">objects</span><span class="p">.</span><span class="nb">all</span><span class="p">()</span>
</code></pre></div></div>

<p>Python is whitespace sensitive for blocks. Inside parentheses the indentation isn’t significant, but by convention we’d indent the attributes in <code class="language-plaintext highlighter-rouge">create()</code> below by 4 spaces:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Thing</span><span class="p">.</span><span class="n">objects</span><span class="p">.</span><span class="n">create</span><span class="p">(</span>
    <span class="n">attr1</span><span class="o">=</span><span class="n">val</span><span class="p">,</span>
    <span class="n">attr2</span><span class="o">=</span><span class="n">val</span>
<span class="p">)</span>
</code></pre></div></div>

<h2 id="previewing-ddl">Previewing DDL</h2>
<p>The generated SQL DDL isn’t displayed when running <code class="language-plaintext highlighter-rouge">migrate</code> by default.</p>

<p>Unlike Rails, Django provides a mechanism to preview it.</p>

<p>To do that run the <code class="language-plaintext highlighter-rouge">sqlmigrate</code> command instead of <code class="language-plaintext highlighter-rouge">migrate</code>.</p>

<p>For example, to print the 0001 migration DDL:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python manage.py sqlmigrate books 0001
</code></pre></div></div>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">BEGIN</span><span class="p">;</span>
<span class="c1">--</span>
<span class="c1">-- Create model Author</span>
<span class="c1">--</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="nv">"books_author"</span> <span class="p">(</span><span class="nv">"id"</span> <span class="nb">bigint</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="k">GENERATED</span> <span class="k">BY</span> <span class="k">DEFAULT</span> <span class="k">AS</span> <span class="k">IDENTITY</span><span class="p">,</span> <span class="nv">"first_name"</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">200</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span> <span class="nv">"last_name"</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">200</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">);</span>
<span class="k">COMMIT</span><span class="p">;</span>
</code></pre></div></div>

<p>Note that Django uses an identity column for the primary key, and as of Rails 8 Active Record does not.</p>

<h2 id="resources">Resources</h2>
<p>For the basics of the Author, Publisher, and Books models, or the Postgres configuration including a custom schema and user, check out the <a href="https://github.com/andyatkinson/booksproject">booksproject</a> repo.</p>

<p>To collect random Django tips I’ve created a <a href="/django-tips">django-tips</a> page. This page can be used in a similar way as my <a href="/rails-tips">rails-tips</a> and <a href="/postgresql-tips">postgresql-tips</a> pages, mostly as a reference for myself and possibly as a useful resource for others.</p>

<h2 id="wrap-up">Wrap Up</h2>
<p>Do you have any similarities and differences between Django and Rails to share? I’d love to hear from you.</p>

<p>😅 And no, I’m not “switching” from Rails and Ruby, but I did enjoy working with Python, Django, and Postgres!</p>

<p>Thanks for reading.</p>]]></content><author><name>Andrew Atkinson</name></author><category term="Python" /><category term="Django" /><category term="PostgreSQL" /><summary type="html"><![CDATA[About 10 years ago I wrote a post PostgreSQL for the Busy MySQL Developer, as part of switching from MySQL to Postgres for my personal and professional projects wherever I could.]]></summary></entry><entry><title type="html">Rails World 2024 Conference Recap</title><link href="https://andyatkinson.com/rails-world-2024-conference-recap" rel="alternate" type="text/html" title="Rails World 2024 Conference Recap" /><published>2024-10-17T00:00:00+00:00</published><updated>2024-10-17T00:00:00+00:00</updated><id>https://andyatkinson.com/rails-world-2024-conference-recap</id><content type="html" xml:base="https://andyatkinson.com/rails-world-2024-conference-recap"><![CDATA[<p>This is Part 1 of my recap of Rails World 2024, a phrenetic two-day conference in Toronto, Canada, September 2024, with 1000+ attendees.</p>

<p>In this post, I’ll describe some sessions, but mostly they’re saved for part 2, once I watch all the sessions I missed now that the full <a href="https://www.youtube.com/watch?v=-cEn_83zRFw&amp;list=PLHFP2OPUpCeb182aDN5cKZTuyjn3Tdbqx">Rails World 2024 Playlist is on YouTube</a>.</p>

<p>As a book author and consultant, the focus for Rails World for me was on meeting people, raising awareness about my book, and generally chatting about how people are using Postgres and Rails. As a long-time Ruby community member, it was great to catch up with a lot of industry friends.</p>

<p>As a first time Rails World visitor, I was really impressed with the energy and vibes.</p>

<p>Let’s get into it.</p>

<h2 id="arrival-wednesday">Arrival: Wednesday</h2>

<p>🇨🇦 I landed in Toronto after a short two-hour flight from Minneapolis. I was feeling excited to promote my book, meet attendees, and give away 8 copies over the next few days. I was feeling grateful!</p>

<p><img src="/assets/images/rw2024/rw-1.jpeg" alt="Rails World Conference 2024" />
<small>Landed in Toronto, let’s go!</small></p>

<p>To celebrate my book launch and successfully striking out on my own as an independent Postgres and Rails consultant,
I hosted a happy hour gathering of <a href="https://lu.ma/0f7ly7g5">Postgres Fans</a>. We had a good turnout, conversations, and new and strengthened connections. We didn’t talk about Postgres much, but the event was a success! 🙌</p>

<p><img src="/assets/images/rw2024/rw-2.jpeg" alt="Rails World Conference 2024" />
<small>Party time, Postgres fans Happy Hour at The Queen &amp; Beaver Public House</small></p>

<p>This was my first time hosting an event like this. I was able to sponsor a round of drinks and appetizers thanks to the success in book sales and my consulting business.</p>

<p>It felt great to bring folks together and I appreciated everyone coming out!</p>

<p>Afterwards, most of the group walked over to the Shopify pre-registration event to get badges, hang out, and grab dinner.</p>

<p><img src="/assets/images/rw2024/rw-7.jpeg" alt="Rails World Conference 2024" />
<small>Pre-registration party by Shopify, with John and Jesper</small></p>

<h2 id="conference-day-1-thursday">Conference Day 1: Thursday</h2>

<p>In the kick-off, I appreciated the overview of the Rails Foundation activities, and was excited to see my GitHub handle up on screen briefly along with some friends! In the last year, I made some contributions to the Rails Guides, particularly in the Active Record area. It was a nice surprise to be mentioned!</p>

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Have to zoom way in 🔍 but honored to see my GitHub handle listed as a contributor to Rails Guides, an important effort lead by awesome people like <a href="https://twitter.com/Ridhwana_K?ref_src=twsrc%5Etfw">@Ridhwana_K</a> <a href="https://t.co/LMrcF307ZN">pic.twitter.com/LMrcF307ZN</a></p>&mdash; Andrew Atkinson (@andatki) <a href="https://twitter.com/andatki/status/1839632502723326236?ref_src=twsrc%5Etfw">September 27, 2024</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

<h2 id="session-dhh-keynote">Session: DHH Keynote</h2>

<p>David’s keynote was good. It was a bit of storytelling and individual positions and perspectives, with occasional splashes of tech stuff mixed in. Entertaining! I always appreciate the live demos too.</p>

<p>David announced Rails 8 Beta 1 is now shipping, and even 8.1 was on the horizon. The technical work over the last year made this possible. Earlier this summer, I wrote for AppSignal about new <a href="https://blog.appsignal.com/2024/07/24/whats-coming-in-ruby-on-rails-7-2-database-features-in-active-record.html?utm_source=blogpost&amp;utm_medium=twitter&amp;utm_campaign=2024-07-24">database related functionality in Ruby on Rails 7.2</a>.</p>

<p>I left the keynote with a call to action to be more competent in our roles as software engineers. Although it wasn’t said exactly, I wondered how much of this message was in response to the rise of AI-assisted programming, which has raised the floor perhaps on what’s expected from a modern professional software engineer.</p>

<h2 id="session-postgres-at-instacart">Session: Postgres at Instacart</h2>

<p>I was happy this session made it onto the agenda, given that <a href="https://railsdeveloper.com/survey/2024/#databases">PostgreSQL is the most popular database used by Rails developers</a>, and this was the only Postgres-related session at Rails World.</p>

<p>Attendance was good. Mostafa took on a challenging task, covering advanced concepts in a short period of time, and did a great job. A highlight for me was meeting Mostafa afterwards. I didn’t know Mostafa was a contributor to PgCat, which was cool to find out. Mostafa talked about different approaches they’d used at Instacart over the years from vertically scaling, to replication and routing read queries to replicas, and forms of database sharding.</p>

<p>Note from me: Remember that Active Record natively supports <a href="https://guides.rubyonrails.org/active_record_multiple_databases.html#activating-automatic-shard-switching">automatic Read and Write query routing</a>, and <a href="https://guides.rubyonrails.org/active_record_multiple_databases.html#horizontal-sharding">Horizontal Sharding</a>.</p>

<p>Mostafa covered the <a href="https://github.com/instacart/makara">Makara gem</a>, and briefly the benefits of the <a href="https://github.com/postgresml/pgcat">PgCat connection pooler</a> over <a href="https://www.pgbouncer.org">pgbouncer</a>.</p>

<p><img src="/assets/images/rw2024/rw-3.jpeg" alt="Rails World Conference 2024" />
<small>Mostafa from Instacart. Scaling Postgres, replication, sharding, PgCat.</small></p>

<h2 id="session-fireside-chat-dhh-matz-tobi">Session: Fireside Chat DHH, Matz, Tobi</h2>

<p>This was interesting to see. I knew Matz didn’t normally attend Ruby on Rails conferences, so it was exciting to see him attending. I should have taken the opportunity to ask for a picture, but did see plenty of others walking around getting pictures with Matz. Great for the community to see the heroes in person!</p>

<h2 id="event-rails-startups-with-evil-martians-and-whop">Event: Rails Startups With Evil Martians and Whop</h2>

<p>Whop and Evil Martians sponsored a social event called Rails Startups. This was a fun event!</p>

<p>I appreciated being invited by Irina and had the chance to meet Jack from <a href="https://whop.com">Whop</a> and a bunch of other Ruby community folks like <a href="https://kirshatrov.com">Kir</a>, Miles, Yaroslav from <a href="https://superails.com">SupeRails</a>, and <a href="https://www.johnnunemaker.com">John</a>.</p>

<h2 id="conference-day-2-friday">Conference Day 2: Friday</h2>

<p>On Day 2, the highlight and emphasis for me was the book signing, handing out books, and talking Postgres + Rails!</p>

<p><img src="/assets/images/rw2024/rw-14.jpeg" alt="Rails World Conference 2024" />
<small>Book Signing at Rails World 2024</small></p>

<p>Although it was meant to be a 1 hour session from 12pm - 1pm over the lunch period, I was able to meet a lot of people before and after, and chat about the book, Postgres, and Rails. I hung out for much longer, missing even more sessions (but totally worth it)!</p>

<p>Since Day 2 was filled with meeting people, I’d like to leave you with more pictures. I got more selfies and pictures here than I’d normally do, and I’m really thankful for that. The pictures help me remember all these amazing people in the community, and it’s an honor to have spent a bit of time with them.</p>

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I got a signed copy of High Performance PostgreSQL for Rails from <a href="https://twitter.com/andatki?ref_src=twsrc%5Etfw">@andatki</a> himself! I had read early versions of the book and can comfortably place this book in the “classics” shelf of my library. This is a must-read! <a href="https://twitter.com/hashtag/RailsWorld?src=hash&amp;ref_src=twsrc%5Etfw">#RailsWorld</a> <a href="https://t.co/zUY35TvEiA">pic.twitter.com/zUY35TvEiA</a></p>&mdash; Emmanuel Hayford (@siaw23) <a href="https://twitter.com/siaw23/status/1841488680759804329?ref_src=twsrc%5Etfw">October 2, 2024</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

<p><img src="/assets/images/rw2024/rw-4.jpeg" alt="Rails World Conference 2024" />
<small>Book giveaway winners, Wojciech, Matt</small></p>

<p><img src="/assets/images/rw2024/rw-5.jpeg" alt="Rails World Conference 2024" />
<small>Book giveaway winners, Allison</small></p>

<p><img src="/assets/images/rw2024/rw-6.jpeg" alt="Rails World Conference 2024" />
<small>Book giveaway winners, Bart</small></p>

<p><img src="/assets/images/rw2024/rw-8.jpeg" alt="Rails World Conference 2024" />
<small>Book winners: Ridhwana</small></p>

<h2 id="more-pics-and-selfies">More Pics and Selfies</h2>

<p><img src="/assets/images/rw2024/rw-9.jpeg" alt="Rails World Conference 2024" />
<small>With Jesper</small></p>

<p><img src="/assets/images/rw2024/rw-10.jpeg" alt="Rails World Conference 2024" />
<small>With Emmanuel</small></p>

<p><img src="/assets/images/rw2024/rw-11.jpeg" alt="Rails World Conference 2024" />
<small>With Dan</small></p>

<p><img src="/assets/images/rw2024/rw-12.jpeg" alt="Rails World Conference 2024" />
<small>With Julia and Jorge</small></p>

<p>While waiting for a car to the airport, I had a serendipitous opportunity to join Craig Kerstiens from <a href="https://www.crunchydata.com">Crunchy Data</a> for a shared ride. Obviously we talked about Postgres!</p>

<p><img src="/assets/images/rw2024/rw-13.jpeg" alt="Rails World Conference 2024" />
<small>With Craig</small></p>

<p>Rails World 2024 was a blast. Thank you to everyone involved in putting this event on. If we met at Rails World, I hope I’ve reached out to you on social media by now, but if not, please drop me a line. Let’s stay in touch.</p>

<h2 id="capping-off-ruby-events-in-2024">Capping off Ruby events in 2024</h2>

<p>I’ve been privileged to attend a lot of Ruby events and be a guest on podcasts this year, to promote my consulting services and my book. Rails World sort of capped it off for me, as I don’t have plans for any more Ruby events in 2024. I’m thinking about 2025 though!</p>

<p>2024 was my first year striking out on my own as an independent consultant. Thank you to all the clients, colleagues, and friends that have helped me make it work. I’ve got a lot to be thankful for, and I’m excited to keep it going in 2025!</p>

<p>I’d like to briefly look back at the year by sharing some links to past podcasts, conferences, and other content. Thanks for taking a look.</p>

<ul>
  <li><a href="/blog/2024/08/13/madison-plus-ruby-conference-recap">Madison+ Ruby 2024 Conference Recap</a></li>
  <li><a href="/blog/2024/07/13/SaaS-on-Rails-on-PostgreSQL-POSETTE-2024-andrew-atkinson">SaaS on Rails on PostgreSQL — POSETTE 2024</a></li>
  <li><a href="/blog/2024/06/10/indierails-podcast-andrew-atkinson-postgres">🎙️ IndieRails Podcast — Andrew Atkinson - The Postgres Specialist</a></li>
  <li><a href="/blog/2024/05/28/top-5-postgresql-surprises-from-rails-developers">Top Five PostgreSQL Surprises from Rails Devs</a></li>
  <li><a href="/blog/2024/05/21/shipit-podcast-changelog-andrew-atkinson">🎙️ Ship It! Podcast — PostgreSQL with Andrew Atkinson</a></li>
  <li><a href="/blog/2024/05/17/railsconf-conference-2024-detroit">RailsConf 2024 Conference — The Long Goodbye</a></li>
  <li><a href="/blog/2024/07/01/mastering-postgresql-phil-smy-andrew-atkinson">Mastering PostgreSQL for Rails: An Interview with Andy Atkinson</a></li>
  <li><a href="/blog/2024/03/25/sin-city-ruby-2024">Sin City Ruby 2024</a></li>
  <li><a href="/blog/2024/03/07/postgresfm-podcast-rails-plus-postgres">Rails + Postgres Postgres.FM 086 — Extended blog post edition! 🎙️</a></li>
  <li><a href="/blog/2024/02/19/maintainable-podcast-robby-russell-andrew-atkinson-maintainable-databases">Maintainable Podcast — Maintainable…Databases? 🎙️</a></li>
  <li><a href="/blog/2024/01/05/Remote-Ruby-unleashing-power-postgresql-andrew-atkinson">Remote Ruby — Unleashing the Power of Postgres with Andrew Atkinson 🎙️</a></li>
  <li><a href="/blog/2024/01/05/Rails-Changelog-Podcast-014-PostgreSQL-Rails-andrew-atkinson">The Rails Changelog — #014: PostgreSQL for Rails Developers with Andrew Atkinson 🎙️</a></li>
</ul>]]></content><author><name>Andrew Atkinson</name></author><category term="Ruby on Rails" /><category term="Ruby" /><category term="PostgreSQL" /><category term="Conferences" /><summary type="html"><![CDATA[This is Part 1 of my recap of Rails World 2024, a phrenetic two-day conference in Toronto, Canada, September 2024, with 1000+ attendees.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://andyatkinson.com/assets/images/rw2024/rw-10.jpeg" /><media:content medium="image" url="https://andyatkinson.com/assets/images/rw2024/rw-10.jpeg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>