This post appeared originally in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite
Lately I have been working a bit with the monitoring platform Zabbix, and the instance in question is backed by the PostgreSQL RDBMS.
SQL and time-series
Apart from data regarding such as hosts, services and checks, a significant amount of the data in the RDMS is historical time-series data aggregated for over a year back in time.
There are several data-stores optimised for time-series data to choose from, as SQL is not the best choice for this, but none of them integrates nicely with Zabbix out of the box.
To keep the database from growing out of proportions, Zabbix comes with a housekeeper functionality that scans through the dataset and removes older data that is past various expiry dates. Doing so will give you tables with many rows that are deleted, and new data will eventually be placed into these vacant rows. Which makes the housekeeper a known source for generating performance challenges, as your time-series tables will be bloated and unsorted on disk.
To amend this issue it is not sufficient to run VACUUM alone, as it will only clean out the dead rows and mark them as reusable. But running VACUUM FULL or CLUSTER which could, alleviate the issue, will take out exclusive locks on your tables. Which is hardly desirable.
Remove bloat and sort
Enter pg_repack. It is an extension to PostgreSQL that let you remove bloat from tables and their indexes, while working online and not taking out exclusive locks. Optionally it can also restore the physical order on disk.
One caveat is though that it requires about double the size of the target table and its indexes, and the target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column.
Installation is normally only an apt or yum command away, followed by creating the extension:
psql -c "CREATE EXTENSION pg_repack" -d bloated_db
Similarly, it can be uninstalled with DROP EXTENSION pg_repack, followed by an appropriate apt or yum command.
The actual re-packing of a table can be initiated like this, for a table with a primary index:
/usr/pgsql-10/bin/pg_repack --table=your_table --jobs=20 bloated_db
Repacking of only the indexes of a table without a primary key:
/usr/pgsql-10/bin/pg_repack -x --table=your_other_table --jobs 20 bloated_db
Quoting the documentation, to perform a full-table repack, pg_repack will:
- Create a log table to record changes made to the original table.
- Add a trigger onto the original table, logging INSERTs, UPDATEs and DELETEs into our log table.
- Create a new table containing all the rows in the old table.
- Build indexes on this new table.
- Apply all changes which have accrued in the log table to the new table. 6p. Swap the tables, including indexes and toast tables, using the system catalogs.
- Drop the original table.
pg_repack will only hold an ACCESS EXCLUSIVE lock for a short period during initial setup (steps 1 and 2 above) and during the final swap-and-drop phase (steps 6 and 7). For the rest of its time, pg_repack only needs to hold an ACCESS SHARE lock on the original table, meaning INSERTs, UPDATEs, and DELETEs may proceed as usual.
Similarly, index only repacks will:
- Create new indexes on the table using CONCURRENTLY matching the definitions of the old indexes.
- Swap out the old for the new indexes in the catalogs.
- Drop the old indexes.
Creating indexes concurrently comes with a few caveats, please see the documentation for details.
Feeling adventurous?
A more modern approach to removing unused space from a table and optionally sort tuples according to a particular index is pg_squeeze. It’s just released for PostgreSQL 9.6 and in beta for 10, and uses transaction log files and logical decoding to capture possible data changes to the table being rebuilt.
I have not had the opportunity to test pg_squeeze yet. So, are you adventurous for Christmas?
Happy hacking!
Thoughts on the CrowdStrike Outage
Unless you’ve been living under a rock, you probably know that last Friday a global crash of computer systems caused by ‘CrowdStrike’ led to widespread chaos and mayhem: flights were cancelled, shops closed their doors, even some hospitals and pharmacies were affected. When things like this happen, I first have a smug feeling “this would never happen at our place”, then I start thinking. Could it?
Broken Software Updates
Our department do take responsibility for keeping quite a lot ... [continue reading]