Tautik Agrahari

Soft Delete: Why We Never Really Delete

Think about it - a user just deleted their important blog post by accident. They're panicking, and your database has no record it ever existed.

Here's something that will save you countless support calls: never hard delete user-generated content.

The fundamental question: "Do we really need to make this data disappear forever, or just hide it from users?"

What Is Soft Delete

Instead of:

-- Hard delete - gone forever
DELETE FROM posts WHERE id = 123;

Do this:

-- Soft delete - marked as deleted
UPDATE posts SET is_deleted = true, deleted_at = NOW() WHERE id = 123;

The Three Reasons Why

1. Users Make Mistakes

Google Drive gives you 30 days to recover deleted files. Gmail does the same. There's a reason every major platform does this.

With hard delete: "Sorry, it's gone forever." With soft delete: UPDATE posts SET is_deleted = false WHERE id = 123; - Hero status achieved.

2. Deleted Data Is Valuable

Users delete content for interesting reasons - misinformation, negative feedback, policy violations. That's signal, not noise. Your ML models benefit from understanding deletion patterns.

When regulators ask "show us all data for user X between dates Y and Z," you better have it. Even deleted data often has retention requirements.

The Hidden Performance Benefit

Here's what most people don't know: hard deletes are expensive in databases.

Why Hard Deletes Hurt

When you delete a record:

  1. Find the record
  2. Delete it
  3. Check if database page is too empty
  4. Merge with other pages if needed
  5. Update parent nodes
  6. Potentially restructure the entire tree

One delete = multiple expensive operations.

Why Soft Deletes Are Fast

UPDATE posts SET is_deleted = true WHERE id = ?;
  1. Find record
  2. Update one field
  3. Done

No tree rebalancing. No page merges. One operation.

The Clever Trick: Batch Your Hard Deletes

-- During off-peak hours
DELETE FROM posts 
WHERE is_deleted = true 
  AND deleted_at < NOW() - INTERVAL '30 days' 
LIMIT 1000;

Now you're paying the rebalancing cost once for 1000 deletions instead of 1000 times.

Simple Implementation

-- Add to existing table
ALTER TABLE posts ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMP NULL;

-- Index for performance  
CREATE INDEX idx_posts_active ON posts (is_deleted) WHERE is_deleted = FALSE;

Query pattern:

-- Always filter out deleted
SELECT * FROM posts WHERE user_id = ? AND is_deleted = FALSE;

When NOT to Soft Delete

Hard delete these:

Soft delete these:

Decision rule: "If this came back tomorrow, would anyone care?" If yes, soft delete. If no, hard delete.

Automated Cleanup

-- Nightly cleanup job
DELETE FROM posts 
WHERE is_deleted = true 
  AND deleted_at < NOW() - INTERVAL '30 days'
LIMIT 1000;

Set it and forget it. Best of both worlds - recoverability window plus eventual cleanup.

Key Takeaways

  1. Default to soft delete for user-generated content
  2. Batch your hard deletes during off-peak hours
  3. Always filter is_deleted = FALSE in queries
  4. Automate cleanup with scheduled jobs
  5. Don't soft delete everything - use your judgment

The bottom line: Soft delete acknowledges that users make mistakes, data has value, and databases prefer batch operations. Implement it once, benefit forever.

Your future self (and your users) will thank you.