Maintenance

PostgreSQL requires some routine maintenance to function properly. When PostgreSQL updates a record, it does not actually delete the old version. It keeps it because, when multiple processes are accessing the database at the same time, a single process has no way of knowing whether another process is still looking at the record it has modified. Therefore, any process or transaction that started looking at the table before the record was modified sees the old version, even if the record has already been changed. Over time these old versions accumulate and take up space.

This problem is very simple to solve. All that needs to be run is PostgreSQL's VACUUM command, which marks the space held by old tuples that no transaction can still see as reusable, so that later inserts and updates can overwrite it. It matters most in update/delete-heavy applications, and the Adium stuff isn't very update/delete heavy under normal use, but it's still a good idea to run it periodically (once a week or so).

Vacuum can be run in a few ways:

  • Through psql:
    vacuum;
    vacuum analyze;
  • Through the command line:
    vacuumdb
    vacuumdb -z

These can easily be added to a weekly cron job or automatic maintenance script.
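
As one possible way to set this up, here is a minimal sketch of a wrapper script for cron. The database name adium, the script path in the crontab comment, and the schedule are all assumptions for illustration; adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical weekly maintenance script (a sketch, not part of Adium).
# Assumed crontab entry, running every Sunday at 03:00:
#   0 3 * * 0 /path/to/weekly_vacuum.sh --run

# Database name is an assumption; override with DB_NAME=... if yours differs.
DB_NAME="${DB_NAME:-adium}"
# Allows pointing at a specific vacuumdb binary if it is not on cron's PATH.
VACUUMDB="${VACUUMDB:-vacuumdb}"

run_weekly_vacuum() {
    # -z runs ANALYZE along with VACUUM, which refreshes planner statistics.
    "$VACUUMDB" -z "$DB_NAME"
}

# Only touch the database when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    run_weekly_vacuum
fi
```

The script must run as a user that can connect to the database without a password prompt, since cron cannot answer one.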

Performance

If you are not getting the kind of performance you want from PostgreSQL, a few things might be to blame. PostgreSQL uses indexes to perform faster lookups on tables. These indexes are very similar to the index in a book, and serve the same purpose. PostgreSQL is very smart (sometimes too smart) about when not to use indexes. If your table isn't big enough, it won't take advantage of them at all.

If you want to see whether PostgreSQL is using an index for something, you can run a query like this one in psql:

select * from im.message_v where message_date > 'now'::date;

That query should return very quickly. If you want to see exactly how fast it runs:

explain analyze select * from messages where message_date > CURRENT_DATE;

The output I get is something like this, for a relatively typical day:

Index Scan using adium_msg_date_sender_recipient on messages (cost=0.00..160.60 rows=45 width=97) (actual time=0.11..3.55 rows=223 loops=1)
  Index Cond: (message_date > '2003-09-02 00:00:00'::timestamp without time zone)
  Total runtime: 4.31 msec
 (3 rows)

I've swapped message_v for messages here, because message_v has a nasty query plan, and it's very difficult to learn anything constructive from it.

The important part of the example above is the term "Index Scan", which indicates that the query is using the index (named adium_msg_date_sender_recipient) to find the rows with matching dates. This is good. Index scans are much faster than sequential scans on large tables.

If you're not seeing an index scan, there are a few possible causes:

  • Not enough data
    • If your table is small, there is no reason to use an index scan.
  • Not properly analyzed
    • If the estimated number of rows (rows= in the cost section) and the actual number of rows (rows= in the actual section) differ significantly (by a factor of 10 or so), rerun vacuum analyze and try again.
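
The estimated-versus-actual row comparison can be done by eye, but as a rough sketch it can also be scripted. The helper below is hypothetical (check_row_estimate is not a PostgreSQL tool): it reads explain analyze output on stdin, takes the first plan line that carries both rows= values, and reports whether they differ by a factor of 10 or more.

```shell
# Hypothetical helper: compares the planner's estimated row count with the
# actual row count on the first plan node of EXPLAIN ANALYZE output.
check_row_estimate() {
    awk '
        !seen && match($0, /rows=[0-9]+/) {
            # The first rows= on the line is the estimate ("rows=" is 5 chars).
            est  = substr($0, RSTART + 5, RLENGTH - 5) + 0
            rest = substr($0, RSTART + RLENGTH)
            # The second rows= on the same line is the actual count.
            if (match(rest, /rows=[0-9]+/)) {
                act = substr(rest, RSTART + 5, RLENGTH - 5) + 0
                lo = (est < act) ? est : act
                hi = (est < act) ? act : est
                if (lo == 0) lo = 1   # avoid dividing by zero
                verdict = (hi / lo >= 10) ? "reanalyze" : "ok"
                printf "estimated=%d actual=%d verdict=%s\n", est, act, verdict
                seen = 1
            }
        }
    '
}
```

Assuming a database named adium, it could be used like this:

psql -d adium -c "explain analyze select * from messages where message_date > CURRENT_DATE;" | check_row_estimate

For the plan shown earlier (45 estimated rows, 223 actual), this prints estimated=45 actual=223 verdict=ok, since the two differ by a factor of about 5.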