Posts by Lauren Ellsworth

Software Developer at Flatiron Health, previously of Case Commons, Patch and outside.in. An edges of the stack developer, specializing in JS and CSS, SQL and Ops. A big fan of all things data, as well as dinosaurs and our 16th President.

These Are a Few of My Favorite (psql) Things

I use psql on a daily basis for quick queries, reminding myself of a specific table structure, or analyzing performance, and a dozen other things. These are a few of my psql commands to make the experience more powerful, more legible, or get to what I need faster.


\? help with psql commands
myawesomedb=# \?

First, the shortcut to psql help. This will list all of the commands available. I try to pick out one or two useful ones and focus on applying them for a week or two. This helps me remember them and gradually plug them into my workflows.



\x [on|off|auto] toggle expanded output (currently off)
myawesomedb=# \x

Probably my most used command, expanded output lists columns in rows, instead of typical side by side column output. For tables with a large volume of columns, this can make a result set far more legible.



\timing [on|off] toggle timing of commands (currently off)
myawesomedb=# \timing

Especially handy when benchmarking a query, turning on timing will output the runtime of every command run in your psql console. I use this most when I’m toggling indices, or experimenting with fine-tuning (vacuuming, caching, or otherwise).



\c[onnect] [DBNAME|- USER|- HOST|- PORT|-]
connect to new database (currently "myawesomedb")
myawesomedb=# \c laurens-other-db

Connect to another database. Useful for flipping back and forth between a test database and the development database, for instance. Also saves the extra step of quitting the psql console, and restarting a console session with a new database name.



\l[+] [PATTERN] list databases
\dn[S+] [PATTERN] list schemas
\dt[S+] [PATTERN] list tables
myawesomedb=# \l+
myawesomedb=# \dn+
myawesomedb=# \dt+

List all the things, but also include their size! Note that this is a representation of how much space the table or database is currently taking up, which may be inflated if your database is in need of a vacuum.

There are queries that will use postgres internal tables to report on relation sizes as well, but none are quite so neat as a adding a ‘+’ to any psql list command. In general, ‘+’ adds ‘additional details.’ For databases, the ‘+’ will add size, tablespace, and description. For tables, size and description will be added. And for schemas, access privileges and description are added. The ‘+’ results, though variable by command, are usually a helpful starting point before googling for a specific postgres internals query.

The ‘+’ also generally includes a description column. Numerous things within postgres can be commented upon (see comment documentation for the full list). The comments serve to document the database in place. However, it’s been my experience since these columns aren’t easily visible, they’re rarely checked. Consequently, they usually fall out of sync with the reality of the data underneath.



\q quit psql
myawesomedb=# \q

And that’s it! Enjoy!

Stopping Time In PostgreSQL

Recently, I’ve been working with a database that makes liberal uses of triggers to audit database changes. Each table has a trigger that records the action taken (INSERT, UPDATE, DELETE), the change in value, and the time at which the action occurred. The history recorded effectively paints a picture of database history, thereby requiring much less effort on the part of the application designer to persist data in any cohesive way.

Scanning through some of this historical data with a coworker, we discovered to our surprise that the timestamps were not monotonically increasing. In fact, updates to the same records looked to be confusingly out of order. Abstracting away the actual data, we were looking at a situation like so:

testdb=# SELECT id, old_val, new_val, created_at FROM history;
id | old_val | new_val | created_at
---+---------+---------+----------------------------
1  | a       | b       | 2015-05-11 12:00:00.000000
2  | b       | c       | 2015-05-11 12:00:30.000000
3  | c       | d       | 2015-05-11 12:00:15.000000

Logically, the progression of values still makes sense: a -> b -> c -> d. Verifying the original table, the value is persisted as d, suggesting the most recent update (by id value) is in fact the most accurate representation of the current state of the database. How then, do we generate out of order timestamps?

The Investigation

Transactions in Postgres assure data integrity by locking modified rows until that transaction has been completed. Let’s say I have a users table, and a user named ‘a’. I also have a trigger, which populates my history table, which in turn has a default created_at value of NOW():

testdb=# SELECT * FROM users;
id | name 
---+------
1  | a
(1 row)
testdb=# \d history
             Column             |            Type             |                      Modifiers
--------------------------------+-----------------------------+------------------------------------------------------
 id                             | integer                     | not null default nextval('history_id_seq'::regclass)
 old_val                        | character varying(255)      |
 new_val                        | character varying(255)      |
 created_at                     | timestamp with time zone    | not null default now()

Now, in two different windows, I can open simultaneous transactions, a few seconds apart:

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# START TRANSACTION;                    | testdb=#
testdb=# SELECT now();                         | testdb=#
              now                              | testdb=#
-------------------------------                | testdb=#
 2015-05-11 20:57:50.255651-04                 | testdb=#
(1 row)                                        | testdb=#
testdb=#                                       | testdb=# START TRANSACTION;
testdb=#                                       | testdb=# SELECT now();
testdb=#                                       |               now  
testdb=#                                       | -------------------------------
testdb=#                                       |  2015-05-11 20:58:00.466234-04
testdb=#                                       |  (1 row) 
testdb=#                                       | testdb=# START TRANSACTION;

Importantly, my 2nd session was started roughly 10 seconds after my 1st. If I wait a few seconds, and check now() again, I see the following:

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# SELECT now();                         | testdb=# SELECT now();
              now                              |               now  
-------------------------------                | -------------------------------
 2015-05-11 20:57:50.255651-04                 |  2015-05-11 20:58:00.466234-04
(1 row)                                        |  (1 row) 

Once a transaction is open, Postgres fully stops time. Now, in my 2nd session, I’ll update the name of my user ‘a’. Since I’m still in a separate transactions, the name of my user will still appear as ‘a’ in my 1st session.

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=#                                       | testdb=# UPDATE users SET name='b' WHERE id=1;
testdb=#                                       | UPDATE 1
testdb=#                                       | testdb=# SELECT * FROM users;
testdb=#                                       | id | name 
testdb=#                                       | ---+------
testdb=#                                       | 1  | b
testdb=#                                       | (1 row)
testdb=# SELECT * FROM users;                  | testdb=#
id | name                                      | testdb=#
---+------                                     | testdb=#
1  | a                                         | testdb=#
(1 row)                                        | testdb=#

If I then attempt to update my user’s name in my 1st session, the update will hang, as the transaction will not be able to secure a lock on the row in question. While the 2nd transaction remains open, the 2nd session will continue to hold a lock on the updated row to prevent any loss or overwrite or changed yet uncommitted data.

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# UPDATE users SET name='c' WHERE id=1; | testdb=# 
                                               | testdb=# SELECT * FROM users;
                                               | id | name 
                                               | ---+------
                                               | 1  | b
                                               | (1 row)

Committing my 2nd session, will release the lock to my 1st, allowing the update to complete. However, until my 1st session is committed, reads outside of the transaction will not show the update from b -> c.

testdb=# # SESSION 1                           | testdb=# # SESSION 2
testdb=# UPDATE users SET name='c' WHERE id=1; | testdb=# 
                                               | testdb=# COMMIT TRANSACTION;
UPDATE 1                                       | testdb=#
testdb=#                                       | testdb=# SELECT * FROM users;
testdb=#                                       | id | name 
testdb=#                                       | ---+------
testdb=#                                       | 1  | b
testdb=#                                       | (1 row)
testdb=# SELECT * FROM users;                  | testdb=#
id | name                                      | testdb=#
---+------                                     | testdb=#
1  | c                                         | testdb=#
(1 row)                                        | testdb=#
testdb=# COMMIT TRANSACTION;                   | testdb=# 
testdb=#                                       | testdb=# SELECT * FROM users;
testdb=#                                       | id | name 
testdb=#                                       | ---+------
testdb=#                                       | 1  | c
testdb=#                                       | (1 row)

As soon as each update is performed, the write to the history table is also prepped by the trigger. However, this write also isn’t executed until the transaction closes. Since the 2nd transaction closes first, it posts first, and carries with it the timestamp from the start of the transaction.

The Conclusion

The timeline of our investigation is as follows:

SESSION 1
start: 2015-05-11 20:57:50.255651-04                      close
<--------------------------------------------------------------->

SESSION 2
        start: 2015-05-11 20:58:00.466234-04       close
        <------------------------------------------------>

The end result is a history table in which the created_at timestamps are ‘out of order’:

testdb=# SELECT id, old_val, new_val, created_at FROM history;
id | old_val | new_val | created_at
---+---------+---------+----------------------------
1  | a       | b       | 2015-05-11 20:58:00.466234-04
2  | b       | c       | 2015-05-11 20:57:50.255651-04

Despite the default on the history table’s created_at column, the value of now() at the time the row is prepped is coming from the timestamp frozen at the start of the transaction. The timestamp() is not a reflection of time at the write time, but at the row prep time. The id, in contrast, is acquired at the time the row posts to the history table, and is taken from the next value available in the id sequence.1

The ordering of the table by id, rather than the created_at value, therefore represents the most accurate history of how the data has changed.

* All above code examples are in psql syntax.
1 Because sequence values must be immediately available to all transactions, incrementing a sequence cannot be undone. This prevents locking of a sequence value itself, as well as preventing a collision between a value updated within an open transaction and the state of the sequence after a closed transaction. See sequence documentation and transaction isolation documentation for further information.

Chef: Sensitive


template "all_my_secret_data.erb" do
  source "passwords_and_secrets.erb"
  sensitive: true
  action :create
end

A while back, while working on a Chef Cookbook involving SSL certificates (near or around the time of Heartbleed), I found myself updating sensitive information on our servers. The problem with updating these resources is that much of Chef’s behavior is logged to its own output logs and shown in plain text, right there on the screen. Passwords and certificates don’t really bear their full power when you keep them logged somewhere on your system, and a Chef log can become a treasure trove of user information and certificates in plain text.

Enter the sensitive attribute! Released in an early iteration of Chef 11.12, the sensitive resource attribute will hide the outputs of a file Chef is writing, while still providing sufficient logging to know the file is being written. Use it for anything you don’t want a logged, including crts, passwords, login information, etc.

Read about it in the Chef docs here.
See the source code for it here.

Enjoy your newly secret data!

The first result: Privacy Policy

Searching Google for

“Privacy Policy”

yields a couple of ads about how to make a privacy policy and manage your privacy, followed by Google’s own privacy policy as the top result. Logically this makes sense, but is it fair? Google will be the first to tell you that the first result is the most clicked on, and if you fall below the fold, you can basically forget about it. By surfacing their own policy, they’re influencing what people read about privacy, and what people understand.

So what do people take away from their policy?

“1. Use information to provide our users with valuable products and services.”

Ooph. That doesn’t sound so good. Taken literally, Google can use my information however they want. If they think a valuable product is telling me what my next door neighbor just searched for, that falls within the scope of #1. Reciprocally, they can use my data to help others. Does the fine print of their policy reflect this? Maybe, maybe not. But this is what the average user sees when they look for ‘privacy policy.’ How about #4?

4. Give users meaningful choices to protect their privacy.

Sounds okay to me. But this is Google, master of understanding and influencing my search and click behavior. Shouldn’t this choice link me directly to the tools for doing just that? Instead, I have to look to the left column and click on “Privacy Tools” in the hopes that this link will give me those choices.

In his book , Jonathan Zittrain refers to a fourth party, who provides content to a third party. We’ve heard of third parties – the people who iterate on the internet version of a big box store – Google, Apple, Facebook, Yahoo. Third party providers make great apps and great technology. This is The Fourth Party, a look at data, privacy and code, in an Internet age so great that there are parties to comment upon parties who develop upon parties.