Skip to main content

postgresql table paritioning

·3 mins

PostgreSQL offers several ways to partition tables, including by range, list, or hash. There are various partitioning strategies available. This post will focus on a practical example using date-based partitioning from a recent project of mine. Demonstrating the concepts and implementation of table partitioning in practice.

The Problem #

Let’s start with a simple table that stores messages:

CREATE TABLE messages (
    id UUID PRIMARY KEY,
    message JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

As this table grows, queries filtering on specific columns might become slower, and managing the data becomes more challenging. Partitioning can help address these concerns.

Implementing Partitioning #

For our example, we’ll partition the table by date using the PARTITION BY RANGE clause:

CREATE TABLE messages (
    id UUID NOT NULL,
    message JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (created_at, id)
) PARTITION BY RANGE (created_at);

Notice two important changes:

  1. The PRIMARY KEY now includes both created_at and id
  2. We’ve added the PARTITION BY RANGE clause

Creating Partitions #

In our example, each partition will represent a single day of data:

CREATE TABLE messages_y2024m12d13 PARTITION OF messages
    FOR VALUES FROM ('2024-12-13 00:00:00') TO ('2024-12-14 00:00:00');

CREATE TABLE messages_y2024m12d14 PARTITION OF messages
    FOR VALUES FROM ('2024-12-14 00:00:00') TO ('2024-12-15 00:00:00');

Working with Partitioned Tables #

When it comes to inserting data you don’t need to worry about inserting into a specific partition. All you need to do is insert into the parent table. Postgres handles the routing of data to the correct partition automatically based on your partition key.

-- Postgres automatically routes this insert to the correct partition
INSERT INTO messages (id, message, created_at)
VALUES (
    'some-uuid',
    '{"key": "value"}'::jsonb,
    '2024-12-13 14:30:00'::timestamptz
);

Automating Partition Management #

In a production environment, you’ll likely want to automate the creation of new partitions. Using Go you can create a function that helps create partitions:

func createPartition(db *sql.DB, date time.Time) error {
    tableName := fmt.Sprintf("messages_y%04dm%02dd%02d",
        date.Year(), date.Month(), date.Day())

    startDate := time.Date(date.Year(), date.Month(), date.Day(),
        0, 0, 0, 0, time.UTC)
    endDate := startDate.AddDate(0, 0, 1)

    query := fmt.Sprintf(`
        CREATE TABLE IF NOT EXISTS %s PARTITION OF messages
        FOR VALUES FROM ('%s') TO ('%s')`,
        tableName, startDate.Format("2006-01-02 15:04:05"),
        endDate.Format("2006-01-02 15:04:05"))

    _, err := db.Exec(query)
    return err
}

You could then automate the creating of partitions for future dates:

func ensurePartitions(db *sql.DB, daysAhead int) error {
    now := time.Now()
    for i := 0; i <= daysAhead; i++ {
        partitionDate := now.AddDate(0, 0, i)
        if err := createPartition(db, partitionDate); err != nil {
            return fmt.Errorf("failed to create partition for %s: %w",
                partitionDate.Format("2006-01-02"), err)
        }
    }
    return nil
}

Benefits of Partitioning #

  1. Queries can quickly find relevant data by skipping partitions that don’t match the search criteria, leading to better performance across large datasets
  2. Drop a partition. That’s all it takes to remove old data
  3. Database maintenance operations like index rebuilds and VACUUM run more efficiently on smaller, individual partitions, reducing the overall maintenance window for your database

Conclusion #

Table partitioning is a powerful feature in Postgres that can help manage large datasets efficiently. By using declarative partitioning and automating partition creation. You can maintain a clean and efficient database design while letting Postgres handle the complexity of routing data to the correct partitions.

Success with table partitioning depends on key practices. Always insert data through the parent table. Automate the creation of new partitions to prevent maintenance overhead becoming unmanageable. Partition keys should be included in the primary key for optimal performance. Plan your partition strategy to align with your specific query patterns. Following these fundamentals, you’ll be well-positioned to leverage the full benefits of Postgres partitioning.