Tautik Agrahari

Understanding Proxies: Forward vs Reverse

Why should we care about proxies? Because they're everywhere in production systems, and misunderstanding them can lead to serious architectural problems. Think about it - every time you browse the web from a corporate network, access a load-balanced application, or query a distributed database, you're dealing with proxies whether you realize it or not.

What Is a Proxy?

A proxy is a machine, or a set of machines, that sits between two systems and acts as an intermediary, typically between a client and a server. Think of it as a middleman that can do various things to help both sides.

Proxy diagram

Proxy servers are installed to abstract out the complexities of untrusted environments. There are two main types, and understanding the difference is crucial for any system design.

Forward Proxy: Protecting and Controlling Clients

A forward proxy abstracts out the client by acting as a middleman. It sits between your clients and the external world, hiding your clients from the servers they're trying to reach.

Forward proxy diagram

Why do we need a forward proxy? There are three compelling reasons:

1. Security: Protecting Client Identity

The external network sees only the proxy's IP address. This protects the client's identity from the external world. When multiple users in an organization access external services, those services only see the proxy's IP, not individual client machines.

This can have interesting implications - if an external service implements IP-based rate limiting and blocks the proxy IP, it affects everyone in the organization, not just the user who triggered the rate limit.
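To make that side effect concrete, here is a minimal sketch of how an external service might count requests per source IP. The IpRateLimiter class, the limit of 3, and the proxy address are all assumptions made up for illustration, not any particular service's implementation.

```python
from collections import defaultdict


class IpRateLimiter:
    """Hypothetical limiter keyed by the source IP the external service sees."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.counts = defaultdict(int)

    def allow(self, source_ip):
        # Every request is counted against the address it arrived from.
        self.counts[source_ip] += 1
        return self.counts[source_ip] <= self.max_requests


PROXY_IP = "203.0.113.10"  # assumed public address of the organization's forward proxy

limiter = IpRateLimiter(max_requests=3)

# One heavy user burns through the quota via the proxy...
heavy_user = [limiter.allow(PROXY_IP) for _ in range(3)]
# ...so a different employee behind the same proxy is rejected on the first try.
other_user_allowed = limiter.allow(PROXY_IP)

print(heavy_user, other_user_allowed)
```

The service has no way to tell the two users apart, because both arrive from the same address.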

2. Policies: Restricting Access to Certain Websites

Organizations use forward proxies to restrict access to certain websites or tools. Educational institutions and corporates do this all the time. For example, social media sites or file-sharing platforms might be blocked through the proxy.

Even at the national level, this works the same way. In India, certain websites are blocked - TikTok, for example. The ISP-level firewall through which requests are routed has rules configured so that any request going to tiktok.com gets blocked.

Organizations implement similar controls. A company might block certain websites and require employees to submit a request to access them. The IT admin reviews the website, determines if it's appropriate for business use, and then whitelists it if approved.
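A rough sketch of that policy check, assuming the proxy keeps a blocklist plus a set of admin-approved exceptions. The domain names and the is_allowed helper are hypothetical; real proxies express this in their own configuration language.

```python
from urllib.parse import urlsplit

# Hypothetical policy: domains the organization blocks by default.
BLOCKED_DOMAINS = {"tiktok.com", "filesharing.example"}
# Hypothetical exceptions an IT admin has approved after review.
APPROVED_EXCEPTIONS = {"filesharing.example"}


def is_allowed(url):
    """Return True if the forward proxy should let this request through."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in BLOCKED_DOMAINS - APPROVED_EXCEPTIONS:
        # Match the domain itself and any subdomain (www.tiktok.com, m.tiktok.com).
        if host == domain or host.endswith("." + domain):
            return False
    return True


print(is_allowed("https://www.tiktok.com/feed"))       # blocked by policy
print(is_allowed("https://docs.python.org/3/"))        # not on the blocklist
print(is_allowed("https://filesharing.example/file"))  # whitelisted by the admin
```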

3. Caching: Storing Frequently Accessed Content

Because everything is going through this common proxy, you can cache frequently accessed content on the proxy itself.

A practical example: imagine an educational institution where students frequently access the Python documentation for programming assignments. Those pages can be cached on the proxy server. Pages that are already cached stay available even if the internet connection goes down, and they load very quickly because they're served locally from the proxy cache instead of being fetched from the internet each time.
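Here is a toy version of that behaviour. CachingProxy and fake_internet are stand-ins I'm inventing for the sketch; a real proxy would make actual HTTP requests and would also expire stale entries.

```python
class CachingProxy:
    """Toy cache: serve from memory when possible, otherwise ask upstream."""

    def __init__(self, fetch_upstream):
        self.fetch_upstream = fetch_upstream  # hypothetical function that hits the internet
        self.cache = {}

    def get(self, url):
        if url in self.cache:
            return self.cache[url]  # served locally, no internet round trip
        body = self.fetch_upstream(url)  # raises if the connection is down
        self.cache[url] = body
        return body


link = {"up": True}  # simulated state of the internet connection


def fake_internet(url):
    # Stand-in for a real HTTP fetch; fails while the link is "down".
    if not link["up"]:
        raise ConnectionError("internet is unreachable")
    return f"<html>contents of {url}</html>"


proxy = CachingProxy(fake_internet)
first = proxy.get("https://docs.python.org/3/tutorial/")   # fetched, then cached

link["up"] = False  # simulate the outage
second = proxy.get("https://docs.python.org/3/tutorial/")  # still served from the cache
```

Note that only pages someone already requested survive the outage; a page nobody has fetched yet still fails.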

Reverse Proxy: Abstracting Server Complexity

A reverse proxy abstracts out the complexity of downstream systems. Users connect to the reverse proxy, and it's the reverse proxy's responsibility to route each request to the corresponding backend service.

Reverse proxy diagram

Why do we need a reverse proxy? Here are four key reasons:

1. Load Balancing Across API Servers or DB Servers

One of the most common examples of reverse proxy is a load balancer. You connect to the load balancer, and depending on the load balancing algorithm, it chooses to forward the request to one of the servers. We've abstracted out the complexities of how many servers we have and what they do - that logic is handled by the reverse proxy.
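As an illustration, here is a minimal round-robin balancer. Round robin is just one possible algorithm, and the server addresses are made up for the sketch.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        self._servers = cycle(list(servers))

    def pick(self):
        return next(self._servers)


# Hypothetical backend pool; the client never sees these addresses.
balancer = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])

picks = [balancer.pick() for _ in range(4)]
print(picks)  # the fourth request wraps back to the first server
```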

2. Routing Incoming Requests to Appropriate Services

This is like an API gateway, where requests are routed based on the URL path. In the reverse proxy, you configure the routing logic that maps each request to the corresponding set of machines.
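A small sketch of what such routing logic could look like, using longest-prefix matching. The paths and backend names in ROUTES are invented for illustration.

```python
# Hypothetical routing table: path prefix -> pool of backend machines.
ROUTES = {
    "/api/users": ["users-1:8000", "users-2:8000"],
    "/api/orders": ["orders-1:8000"],
    "/": ["web-1:8000"],  # catch-all
}


def matches(prefix, path):
    # Match on whole path segments so "/api/users" does not capture "/api/usersx".
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def route(path):
    """Return the backend pool for a request path; the longest matching prefix wins."""
    candidates = [prefix for prefix in ROUTES if matches(prefix, path)]
    if not candidates:
        raise LookupError(f"no route for {path!r}")
    return ROUTES[max(candidates, key=len)]


print(route("/api/users/42"))
print(route("/api/orders"))
print(route("/about"))
```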

3. Caching Static Responses

Because everything is going through the reverse proxy, I can cache some of the static responses there and avoid sending those requests to the origin server. For example, if a blog post is very popular, I can cache its content on the reverse proxy itself. The next time a request for that post comes in, I can respond to the user directly without going to the API server for the response.

This saves bandwidth, CPU, and memory on my API server.
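To see the saving, here is a sketch of a response cache with a time-to-live sitting in front of an origin function. ResponseCache, the 60-second TTL, and the origin stand-in are assumptions for the example, not how any specific proxy implements caching.

```python
import time


class ResponseCache:
    """Caches origin responses for ttl_seconds before asking the origin again."""

    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # path -> (expires_at, body)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self.origin(path)  # cache miss or expired entry
        self.entries[path] = (now + self.ttl, body)
        return body


origin_calls = []


def origin(path):
    # Stand-in for the API server; records every request that reaches it.
    origin_calls.append(path)
    return f"rendered {path}"


cache = ResponseCache(origin, ttl_seconds=60)
for _ in range(1000):
    cache.get("/blog/popular-post")

print(len(origin_calls))  # only the first of the 1000 requests reached the origin
```

The TTL is the trade-off knob here: a longer value saves more origin work but serves older content.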

4. Abstraction of Infrastructure Elasticity

The best part is that because we're hiding and abstracting out the complexity of the infrastructure, we don't know if there are five servers or ten servers or fifteen servers behind the reverse proxy. For us, the reverse proxy becomes a single point of entry.

I will always make a call to the load balancer's domain name, and in turn it will call whichever server it chooses. It abstracts out those complexities for me, making the infrastructure elastic - I can add ten more servers or remove five, and my users remain unaffected because they don't care about the backend complexity.

The Key Difference: Direction of Abstraction

Here's the crucial insight: with a forward proxy, we're abstracting out the clients from the downstream systems. The box is placed on the client side - we're protecting and controlling the clients.

With a reverse proxy, it's the complete opposite - we're abstracting out the complexities of the downstream systems. We're hiding the server infrastructure from the clients.

Real-World Examples and Tools

Forward Proxy Examples:

Reverse Proxy Examples:

Database Proxies: The Specialized Reverse Proxy

Let's spend some time talking about database proxies because they're incredibly powerful. ProxySQL is a great tool that acts as a database proxy, very similar to a load balancer but slightly more advanced.

Database proxy diagram

What does a database proxy do? It accepts a normal SQL query from the client and abstracts out the complexities of the database topology. Behind the scenes, your database can be sharded or partitioned across multiple servers - the client doesn't care. This logic is configured on the database proxy side.
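As a rough picture of what "the client doesn't care" means, here is a sketch of a proxy choosing a shard from a key. This is a generic hash-based scheme with made-up shard addresses, not how ProxySQL itself is configured.

```python
import zlib

# Hypothetical shard addresses known only to the proxy.
SHARDS = ["db-shard-0:3306", "db-shard-1:3306", "db-shard-2:3306"]


def shard_for(user_id):
    """Pick the shard that owns this user's rows."""
    # crc32 gives the same value on every run, unlike Python's built-in hash() for strings.
    return SHARDS[zlib.crc32(user_id.encode("utf-8")) % len(SHARDS)]


# The client just sends its query to the proxy; the proxy decides where it goes.
target = shard_for("alice")
print(target)
```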

Database proxy benefits:

1. Query Caching - Common SQL queries can be cached on ProxySQL so that the proxy doesn't need to forward requests to the database for subsequent requests. Only the first one goes to the database while others are served from cache. This reduces the load on the database significantly.

2. Connection Pooling - It accepts a lot of connections from clients but opens only a limited number of connections to the database, reusing them across clients. The database spends less effort managing connections and more on running queries.

3. Topology Abstraction - We don't know how the data is distributed, who owns which data, and the entire topology like how many servers there are. That's all abstracted out. If tomorrow we scale from three servers to five servers, our users don't need to know - they have a single point of contact to reach out to get their queries answered.

ProxySQL is one concrete example, built for MySQL. Most popular databases have comparable proxies - PgBouncer, for instance, provides connection pooling for PostgreSQL.
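The connection pooling benefit above can be sketched with a small pool shared by many client threads. FakeConnection and ConnectionPool are illustrative stand-ins, and the numbers (50 clients, 2 backend connections) are arbitrary.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def execute(self, sql):
        return f"conn{self.conn_id}: {sql}"


class ConnectionPool:
    """Keeps a fixed number of backend connections and lends them out."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for conn_id in range(size):
            self._idle.put(FakeConnection(conn_id))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a backend connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next client


pool = ConnectionPool(size=2)
used_ids = set()
results = []
lock = threading.Lock()


def client(n):
    with pool.connection() as conn:
        result = conn.execute(f"SELECT {n}")
        with lock:
            used_ids.add(conn.conn_id)
            results.append(result)


threads = [threading.Thread(target=client, args=(n,)) for n in range(50)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(results), "queries served over at most", len(used_ids), "backend connections")
```

All 50 clients get an answer, yet the database only ever sees the two connections the pool opened.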

Key Takeaways

Understanding proxies is crucial for system design because they solve fundamental problems around security, performance, and architecture abstraction. Whether you're dealing with forward proxies controlling client access or reverse proxies managing server complexity, these patterns show up everywhere in production systems.

The real-world implications are significant - from rate limiting affecting entire organizations to architecting scalable database layers that can handle thousands of concurrent connections. These aren't just theoretical concepts; they're the building blocks of how the internet actually works.

Next time you're designing a system, think about where proxies fit in. Are you trying to control and protect clients? Forward proxy. Are you trying to hide server complexity and enable scaling? Reverse proxy. The direction of abstraction tells you everything you need to know.