PostgreSQL | The Fourth Party

I use psql on a daily basis for quick queries, reminding myself of a specific table structure, or analyzing performance, and a dozen other things. These are a few of my psql commands to make the experience more powerful, more legible, or get to what I need faster.

\? help with psql commands myawesomedb=# \?
First, the shortcut to psql help. This will list all of the commands available. I try to pick out one or two useful ones and focus on applying them for a week or two. This helps me remember them and gradually plug them into my workflows.

\x [on|off|auto] toggle expanded output (currently off) myawesomedb=# \x
Probably my most used command, expanded output lists columns in rows, instead of typical side by side column output. For tables with a large volume of columns, this can make a result set far more legible.

\timing [on|off] toggle timing of commands (currently off) myawesomedb=# \timing
Especially handy when benchmarking a query, turning on timing will output the runtime of every command run in your psql console. I use this most when I’m toggling indices, or experimenting with fine-tuning (vacuuming, caching, or otherwise).

\c[onnect] [DBNAME|- USER|- HOST|- PORT|-] connect to new database (currently "myawesomedb") myawesomedb=# \c laurens-other-db
Connect to another database. Useful for flipping back and forth between a test database and the development database, for instance. Also saves the extra step of quitting the psql console, and restarting a console session with a new database name.

\l[+] [PATTERN] list databases \dn[S+] [PATTERN] list schemas \dt[S+] [PATTERN] list tables myawesomedb=# \l+ myawesomedb=# \dn+ myawesomedb=# \dt+
List all the things, but also include their size! Note that this is a representation of how much space the table or database is currently taking up, which may be inflated if your database is in need of a vacuum.

There are queries that will use postgres internal tables to report on relation sizes as well, but none are quite so neat as a adding a ‘+’ to any psql list command. In general, ‘+’ adds ‘additional details.’ For databases, the ‘+’ will add size, tablespace, and description. For tables, size and description will be added. And for schemas, access privileges and description are added. The ‘+’ results, though variable by command, are usually a helpful starting point before googling for a specific postgres internals query.

The ‘+’ also generally includes a description column. Numerous things within postgres can be commented upon (see comment documentation for the full list). The comments serve to document the database in place. However, it’s been my experience since these columns aren’t easily visible, they’re rarely checked. Consequently, they usually fall out of sync with the reality of the data underneath.

\q quit psql myawesomedb=# \q
And that’s it! Enjoy!

Recently, I’ve been working with a database that makes liberal uses of triggers to audit database changes. Each table has a trigger that records the action taken (INSERT, UPDATE, DELETE), the change in value, and the time at which the action occurred.Â The history recorded effectively paints a picture of database history, thereby requiring much less effort on the part of the application designer to persist data in any cohesive way.

Scanning through some of this historical data with a coworker, we discoveredÂ to our surprise that the timestamps were not monotonically increasing. In fact, updates to the same records looked to be confusingly out of order. Abstracting away the actual data, we were looking at a situation like so:

testdb=# SELECT id, old_val, new_val, created_at FROM history;
id | old_valÂ | new_valÂ | created_at
---+---------+---------+----------------------------
1Â  | aÂ   Â  Â  | b Â  Â Â  Â | 2015-05-11Â 12:00:00.000000
2Â  | bÂ  Â   Â  | cÂ  Â  Â  Â |Â 2015-05-11Â 12:00:30.000000
3Â  | cÂ  Â   Â  | d Â Â  Â  Â |Â 2015-05-11Â 12:00:15.000000

Logically, the progression of values still makes sense: a -> b -> c -> d. Verifying the original table, the value is persisted as d, suggesting the most recent update (by id value) is in fact the most accurate representation of the current state of the database. How then, do we generate out of order timestamps?

The Investigation

Transactions in Postgres assure data integrity by locking modified rows until that transaction has been completed. Let’s say I have a users table, and a user named ‘a’. I also have a trigger, which populates my history table, which in turn has a default created_at value of NOW():

testdb=# SELECT * FROM users;
id | name 
---+------
1  | a
(1 row)
testdb=# \d history
             Column             |            Type             |                      Modifiers
--------------------------------+-----------------------------+------------------------------------------------------
 id                             | integer                     | not null default nextval('history_id_seq'::regclass)
 old_val                        | character varying(255)      |
 new_val                        | character varying(255)      |
 created_at                     | timestamp with time zone    | not null default now()

Now, in two different windows, I can open simultaneous transactions, a few seconds apart:

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# START TRANSACTION;                    | testdb=#
testdb=# SELECT now();                         | testdb=#
              now                              | testdb=#
-------------------------------                | testdb=#
 2015-05-11 20:57:50.255651-04                 | testdb=#
(1 row)                                        | testdb=#
testdb=#                                       | testdb=# START TRANSACTION;
testdb=#                                       | testdb=# SELECT now();
testdb=#                                       |               now  
testdb=#                                       | -------------------------------
testdb=#                                       |  2015-05-11 20:58:00.466234-04
testdb=#                                       |  (1 row) 
testdb=#                                       | testdb=# START TRANSACTION;

Importantly, my 2nd session was started roughly 10 seconds after my 1st. If I wait a few seconds, and check now() again, I see the following:

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# SELECT now();                         | testdb=# SELECT now();
              now                              |               now  
-------------------------------                | -------------------------------
 2015-05-11 20:57:50.255651-04                 |  2015-05-11 20:58:00.466234-04
(1 row)                                        |  (1 row)

Once a transaction is open, Postgres fully stops time. Now, in my 2nd session, I’ll update the name of my user ‘a’. Since I’m still in a separate transactions, the name of my user will still appear as ‘a’ in my 1st session.

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=#                                       | testdb=# UPDATE users SET name='b' WHERE id=1;
testdb=#                                       | UPDATE 1
testdb=#                                       | testdb=# SELECT * FROM users;
testdb=#                                       | id | name 
testdb=#                                       | ---+------
testdb=#                                       | 1  | b
testdb=#                                       | (1 row)
testdb=# SELECT * FROM users;                  | testdb=#
id | name                                      | testdb=#
---+------                                     | testdb=#
1  | a                                         | testdb=#
(1 row)                                        | testdb=#

If I then attempt to update my user’s name in my 1st session, the update will hang, as the transaction will not be able to secure a lock on the row in question. While the 2nd transaction remains open, the 2nd session will continue to hold a lock on the updated row to prevent any loss or overwrite or changed yet uncommitted data.

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# UPDATE users SET name='c' WHERE id=1; | testdb=# 
                                               | testdb=# SELECT * FROM users;
                                               | id | name 
                                               | ---+------
                                               | 1  | b
                                               | (1 row)

Committing my 2nd session, will release the lock to my 1st, allowing the update to complete. However, until my 1st session is committed, reads outside of the transaction will not show the update from b -> c.

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# UPDATE users SET name='c' WHERE id=1; | testdb=# 
                                               | testdb=# COMMIT TRANSACTION;
UPDATE 1                                       | testdb=#
testdb=#                                       | testdb=# SELECT * FROM users;
testdb=#                                       | id | name 
testdb=#                                       | ---+------
testdb=#                                       | 1  | b
testdb=#                                       | (1 row)
testdb=# SELECT * FROM users;                  | testdb=#
id | name                                      | testdb=#
---+------                                     | testdb=#
1  | c                                         | testdb=#
(1 row)                                        | testdb=#
testdb=# COMMIT TRANSACTION;                   | testdb=# 
testdb=#                                       | testdb=# SELECT * FROM users;
testdb=#                                       | id | name 
testdb=#                                       | ---+------
testdb=#                                       | 1  | c
testdb=#                                       | (1 row)

As soon as each update is performed, the write to the history table is also prepped by the trigger. However, this write also isn’t executed until the transaction closes. Since the 2nd transaction closes first, it posts first, and carries with it the timestamp from the start of the transaction.

The Conclusion

The timeline of our investigation is as follows:

SESSION 1
start: 2015-05-11 20:57:50.255651-04                      close
<--------------------------------------------------------------->

SESSION 2
        start: 2015-05-11 20:58:00.466234-04       close
        <------------------------------------------------>

The end result is a history table in which the created_at timestamps are ‘out of order’:

testdb=# SELECT id, old_val, new_val, created_at FROM history;
id | old_valÂ | new_valÂ | created_at
---+---------+---------+----------------------------
1Â  | aÂ  Â   Â  | b Â  Â Â  Â | 2015-05-11 20:58:00.466234-04
2Â  | bÂ  Â   Â  | cÂ  Â  Â  Â |Â 2015-05-11 20:57:50.255651-04

Despite the default on the history table’s created_at column, the value of now() at the time the row is prepped is coming from the timestamp frozen at the start of the transaction. The timestamp() is not a reflection of time at the write time, but at the row prep time. The id, in contrast, is acquired at the time the row posts to the history table, and is taken from the next value available in the id sequence.¹

The ordering of the table by id, rather than the created_at value, therefore represents the most accurate history of how the data has changed.

* All above code examples are in psql syntax.
¹ Because sequence values must be immediately available to all transactions, incrementing a sequence cannot be undone. This prevents locking of a sequence value itself, as well as preventing a collision between a value updated within an open transaction and the state of the sequence after a closed transaction. See sequence documentation and transaction isolation documentation for further information.

The Fourth Party

Data, Privacy, and Code

Category / PostgreSQL

These Are a Few of My Favorite (psql) Things

Stopping Time In PostgreSQL

The Investigation

The Conclusion