Soft Delete: Why We Never Really Delete
Think about it - a user just deleted their important blog post by accident. They're panicking, and your database has no record it ever existed.
Here's something that will save you countless support calls: never hard delete user-generated content.
The fundamental question: "Do we really need to make this data disappear forever, or just hide it from users?"
What Is Soft Delete
Instead of:
-- Hard delete - gone forever
DELETE FROM posts WHERE id = 123;
Do this:
-- Soft delete - marked as deleted
UPDATE posts SET is_deleted = true, deleted_at = NOW() WHERE id = 123;
The Three Reasons Why
1. Users Make Mistakes
Google Drive gives you 30 days to recover deleted files. Gmail does the same. There's a reason every major platform does this.
With hard delete: "Sorry, it's gone forever."
With soft delete: UPDATE posts SET is_deleted = false WHERE id = 123;
- Hero status achieved.
2. Deleted Data Is Valuable
Users delete content for interesting reasons - misinformation, negative feedback, policy violations. That's signal, not noise. Your ML models benefit from understanding deletion patterns.
3. Legal Requirements
When regulators ask "show us all data for user X between dates Y and Z," you better have it. Even deleted data often has retention requirements.
The Hidden Performance Benefit
Here's what most people don't know: hard deletes are expensive in databases.
Why Hard Deletes Hurt
When you delete a record:
- Find the record
- Delete it
- Check if database page is too empty
- Merge with other pages if needed
- Update parent nodes
- Potentially restructure the entire tree
One delete = multiple expensive operations.
Why Soft Deletes Are Fast
UPDATE posts SET is_deleted = true WHERE id = ?;
- Find record
- Update one field
- Done
No tree rebalancing. No page merges. One operation.
The Clever Trick: Batch Your Hard Deletes
-- During off-peak hours
DELETE FROM posts
WHERE is_deleted = true
AND deleted_at < NOW() - INTERVAL '30 days'
LIMIT 1000;
Now you're paying the rebalancing cost once for 1000 deletions instead of 1000 times.
Simple Implementation
-- Add to existing table
ALTER TABLE posts ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMP NULL;
-- Index for performance
CREATE INDEX idx_posts_active ON posts (is_deleted) WHERE is_deleted = FALSE;
Query pattern:
-- Always filter out deleted
SELECT * FROM posts WHERE user_id = ? AND is_deleted = FALSE;
When NOT to Soft Delete
Hard delete these:
- Likes/reactions (user liked then unliked)
- Session data (temporary tokens)
- Analytics events (view counts)
- Cache entries
Soft delete these:
- User posts, comments, uploads
- User accounts
- Financial transactions
Decision rule: "If this came back tomorrow, would anyone care?" If yes, soft delete. If no, hard delete.
Automated Cleanup
-- Nightly cleanup job
DELETE FROM posts
WHERE is_deleted = true
AND deleted_at < NOW() - INTERVAL '30 days'
LIMIT 1000;
Set it and forget it. Best of both worlds - recoverability window plus eventual cleanup.
Key Takeaways
- Default to soft delete for user-generated content
- Batch your hard deletes during off-peak hours
- Always filter
is_deleted = FALSE
in queries - Automate cleanup with scheduled jobs
- Don't soft delete everything - use your judgment
The bottom line: Soft delete acknowledges that users make mistakes, data has value, and databases prefer batch operations. Implement it once, benefit forever.
Your future self (and your users) will thank you.