Tautik Agrahari

Communication - Polling, Websockets & SSE

Communication

Okay, so I assume you know how an HTTP request works. Of course you do. The usual communication is the request-response paradigm: your client sends a request, the server understands what the client is expecting, and it sends the response. That's the lifecycle of an HTTP request.

Short Polling

Let's discuss the first paradigm - short polling.

Short polling is when you create an EC2 instance and keep refreshing the page to see the current status. That's short polling. Whether you refresh manually or your JavaScript does it for you doesn't matter: you're periodically polling for the status and the server gives you whatever the current status is.

What's the current status? Whatever the current status is in the database, it sends you that. That's short polling.

![Short polling](https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/51pm.webp)

It's like checking your food delivery status. You keep opening the app every few minutes: "Order confirmed", "Being prepared", "Being prepared", "Out for delivery", "Out for delivery", "Delivered!" You are constantly asking for the current status. That's short polling.

Examples: checking EC2 instance provisioning status, tracking a food delivery order.
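A minimal client-side sketch of short polling, assuming a hypothetical GET /api/instance/status endpoint that returns the current status as JSON:

```typescript
// Short polling: ask "what's the current status?" on a fixed interval.
// The /api/instance/status endpoint and its response shape are assumptions for illustration.
async function shortPoll(): Promise<void> {
  const res = await fetch("/api/instance/status");
  const { status } = await res.json();
  console.log("current status:", status);

  if (status !== "active") {
    // Not done yet - ask again in 5 seconds, whether or not anything changed.
    setTimeout(shortPoll, 5000);
  }
}

shortPoll();
```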

Disadvantages of Short Polling

Here we can start seeing the disadvantages. The big one: too many HTTP calls! Of course that is a problem - but not always. Most use cases are fine with polling for data; you don't need to optimize for everything. Optimize for simplicity! Be very pragmatic and practical with your implementation!

Long Polling

Next up: long polling! Opposite of short polling: long polling!

In short polling we sent a request and got the current status in the response, but we polled very frequently. Long polling says: you make a request, and only when there is a change in the data will I send you the response. That's long polling.


For example, imagine you're waiting for a pizza delivery. With short polling, you keep calling the restaurant every 2 minutes: "Is my pizza ready? Is my pizza ready? Is my pizza ready?" They keep telling you "still cooking, still cooking, still cooking, ready for pickup!"

With long polling, you call once and say "call me back when my pizza is ready." You make one call, and the restaurant only calls you back when there's actually something to tell you - "your pizza is ready!" You wait with the phone line open until they have news. That's long polling.
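A sketch of the client side of long polling, assuming a hypothetical endpoint that holds the request open and only responds when the status changes (or replies with 204 when its own timeout expires):

```typescript
// Long polling: one request at a time; the server answers only when something changes,
// and the client immediately reconnects to wait for the next change.
// The endpoint, the ?since= parameter, and the 204 convention are assumptions for illustration.
async function longPoll(lastKnownStatus: string): Promise<void> {
  const res = await fetch(`/api/instance/status?since=${lastKnownStatus}`);

  if (res.status === 204) {
    // Server timed out with nothing new - just ask again.
    return longPoll(lastKnownStatus);
  }

  const { status } = await res.json();
  console.log("status changed:", status);

  if (status !== "active") {
    return longPoll(status); // keep waiting for the next change
  }
}

longPoll("starting");
```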

Short Polling vs Long Polling

Short polling vs long polling: with short polling, the client keeps asking periodically and gets whatever the current status is, even if nothing has changed. With long polling, the client asks once and the server responds only when the data actually changes (or its own timeout expires).

Example: EC2 provisioning. Short polling keeps returning "starting, starting, starting, active"; long polling holds the request open and responds once, when the status flips to active.

How Long Polling Works

Think about the implementation - this is precisely the code you would write. Short polling and long polling are not some magic thing that happens; they are not protocols, they are how you code, how you define your user experience to be.

Imagine you are starting an EC2 server: you register the request, async workers pick it up and start processing. Your UI keeps firing requests to ask: what's the current status? What's the current status? What's the current status? Eventually it becomes active, and that's when the polling stops. That's short polling.

In the case of long polling, you make one request, and one way to implement it is that the API server keeps checking the database: the current status was starting, has it turned active? Is it active? Is it active? Is it active? So your long polling might internally be implemented as a short poll. That's one of the ways to implement it.

Or your long polling might use a reactive store like Redis, or some other database that can tell it the status has changed - implement it however you want. The point is: from an end user's perspective, how many API calls are going out? That's what differentiates a short poll from a long poll.

The idea is: from your client's perspective, are you making periodic calls, or are you making one call that gets a response only when the data has changed?
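A rough server-side sketch of the first approach above, where the long-poll handler internally short-polls the database until the status changes or a 10-second timeout expires. The Express setup and the getStatusFromDb() helper are assumptions for illustration:

```typescript
import express from "express";

// Stand-in for a real database query - an assumption for illustration.
async function getStatusFromDb(): Promise<string> {
  return "starting";
}

const app = express();

// Long-poll endpoint: hold the request open until the status changes.
// Internally this is just a short poll against the database.
app.get("/api/instance/status", async (req, res) => {
  const since = String(req.query.since ?? "");
  const deadline = Date.now() + 10_000; // hold the request for at most 10 seconds

  while (Date.now() < deadline) {
    const status = await getStatusFromDb();
    if (status !== since) {
      return res.json({ status }); // data changed - respond now
    }
    await new Promise((resolve) => setTimeout(resolve, 500)); // check again in 500 ms
  }

  res.sendStatus(204); // nothing changed - the client will reconnect
});

app.listen(3000);
```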

WebSockets

The best way to put it: WebSockets are essentially bidirectional channels. The HTTP request-response paradigm is a client-server paradigm: your client sends a request saying what it needs done, your server understands "okay, this is what the client is expecting" and sends the response. That's the lifecycle of an HTTP request.

But what WebSockets say is this: what if my server could proactively send data to the client without the client asking for it? Again, opposites: either your client always asks for data and only then the server sends it, or the server proactively sends data to the client. That is WebSocket.

Everything here is TCP based, and TCP does not have a concept of a timeout. It's the layers above that introduce timeouts - and actually HTTP does not have a timeout concept either; it's the web server that implements one. So at the TCP level connections can stay open indefinitely; it's typically a load balancer that terminates or closes the connection after a certain idle period.

WebSocket is just a communication paradigm, a protocol implemented on top of TCP. Once the connection is set up, it's kept alive, and while it is open your server can proactively send data to the client. Rather than the client saying "hey, give me the latest message", the server can proactively tell the client "hey, you received this message". That's the whole idea.
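A minimal sketch using the browser's standard WebSocket API, against an assumed wss://example.com/chat endpoint, just to show the bidirectional shape: the client can send whenever it wants, and the server can push without being asked:

```typescript
// Bidirectional channel: once open, either side can send at any time.
const socket = new WebSocket("wss://example.com/chat"); // assumed endpoint

socket.addEventListener("open", () => {
  // Client -> server, whenever we want.
  socket.send(JSON.stringify({ type: "message", text: "hello" }));
});

socket.addEventListener("message", (event) => {
  // Server -> client, proactively - no request needed.
  console.log("server pushed:", event.data);
});

socket.addEventListener("close", () => {
  console.log("connection closed (for example, by a load balancer after an idle period)");
});
```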

WebSocket Applications

Classic examples of WebSocket: chat and messaging apps (think Slack), and live stock market tickers (Zerodha's ticker runs over WebSockets).

Advantages: one persistent connection, the server can push data the moment something happens, and messages can flow in both directions without the overhead of repeated HTTP requests.

Server-Sent Events (SSE)

What's the opposite of bidirectional? Persistent bidirectional communication vs persistent unidirectional communication - that's when you get server-sent events.

It is literally a persistent connection, but here your client establishes a connection with the server and only the server sends data. Your client will not send further data to the server; the client has set up this connection just to receive server-sent events.


For example, again, the stock market ticker - Zerodha uses WebSockets, not SSE, but you can build a stock market ticker with server-sent events.
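On the client side, SSE is just the browser's built-in EventSource. A sketch against an assumed /api/ticker endpoint:

```typescript
// One-way channel: the client opens the connection, then only listens.
const ticker = new EventSource("/api/ticker"); // assumed endpoint

ticker.onmessage = (event) => {
  // Every "data: ..." line the server writes arrives here as a message event.
  console.log("price update:", event.data);
};

ticker.onerror = () => {
  // EventSource reconnects automatically, so there is usually nothing special to do here.
  console.log("connection dropped, the browser will retry");
};
```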

SSE Applications

A good example is deployment log streaming. Imagine you trigger a deployment from a web interface; the deployment is happening on a server, and the deployment log needs to be streamed to your user. So while the deployment is happening, the log is being written to a log file on the server, but you need to see it on your web UI.

This can be implemented with server-sent events: the client connects, and the server tails the log file and keeps sending each new line to the client. So streaming deployment logs from your deployment machine to your frontend can be done with server-sent events.
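A simplified server-side sketch of that, using Node's built-in http and fs modules. The log path and the naive "poll the file for newly appended bytes" approach are assumptions; a real implementation would likely use a proper file-tailing library:

```typescript
import http from "node:http";
import fs from "node:fs";

const LOG_PATH = "/var/log/deploy.log"; // assumed log file location

http.createServer((req, res) => {
  // SSE handshake: keep the connection open and speak text/event-stream.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  let offset = 0;

  // Naive "tail -f": once a second, push any bytes appended since last time.
  const timer = setInterval(() => {
    const size = fs.statSync(LOG_PATH).size;
    if (size > offset) {
      const stream = fs.createReadStream(LOG_PATH, { start: offset, end: size - 1 });
      stream.on("data", (chunk) => {
        for (const line of chunk.toString().split("\n")) {
          if (line) res.write(`data: ${line}\n\n`); // one SSE message per log line
        }
      });
      offset = size;
    }
  }, 1000);

  req.on("close", () => clearInterval(timer)); // client went away, stop tailing
}).listen(3000);
```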

Again, it can also be done with WebSockets. But if everything can be done with WebSockets (I get bidirectional anyway), why do I need server-sent events?

Why SSE over WebSockets?

Because server-sent events are much lighter. The WebSocket protocol is heavy - when I say heavy, I mean that for each message (say you want to send "hello" from one side to the other) there is a lot of extra metadata, because the protocol has to handle bidirectional communication: who sent it, who is receiving, what's the payload, and so on. That's heavy.

Server-sent events, in contrast, are actually about four times more efficient. The idea of a server-sent event is that it literally sends just the data you need to send, with bare minimum metadata overhead. That's why it is much more memory efficient - but what you give up is that your client cannot send anything to the server once you implement this.

Another place where you see server-sent events: LLM APIs that stream their responses.

For example, you invoke a request to the classic chat completions API. It's a plain HTTP connection; you just set stream=true, and as the tokens are generated they keep streaming back to you. There is no further communication from the client: you are just a viewer, you are not doing anything, you are just watching it do its thing. So this is server-sent events.
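A sketch of consuming such a stream from the browser with fetch and a stream reader. The URL, headers, and body shape here are assumptions; the exact request depends on the provider:

```typescript
// Streaming completion: one HTTP request, then the server keeps pushing tokens as SSE chunks.
// The endpoint and body shape are assumptions for illustration - adjust for your provider.
async function streamCompletion(prompt: string): Promise<void> {
  const res = await fetch("https://api.example.com/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }], stream: true }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  // The client only reads; it never sends anything else on this connection.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    console.log(decoder.decode(value)); // raw "data: ..." chunks as they arrive
  }
}

streamCompletion("Explain long polling in one line.");
```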

You can also do it with WebSockets, but everybody has chosen server-sent events because they are much more efficient.

How Real-time Apps are Generally Structured

First, a myth that everybody believes: a lot of people think that once I have a persistent connection, I will get everything over that persistent connection. For example, think of messages. Think of Slack. You assume every message you receive will arrive over the WebSocket. Hey, who said that? No, it's not like that.

**Only real-time updates will come over the persistent connection.**

Think about it. When you load Slack, when you load your messaging app, what happens?

  1. Load the initial interface
  2. Load initial data - that can happen over REST, no problem there
  3. Subscribe for real-time updates - only the real-time updates that should come to you come over the persistent connection

In the case of your deployment log: you load the initial interface, then you get your initial data over a REST API, and then you subscribe for the real-time updates that come over the persistent connection. That's how most systems are built.
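A sketch of that pattern for a messaging app, with assumed endpoints and hypothetical UI helpers: the history comes over REST, and only new messages come over the socket:

```typescript
// Hypothetical UI helpers - stand-ins for your real rendering code.
function render(messages: unknown[]): void {
  console.log("initial messages:", messages);
}
function appendMessage(message: unknown): void {
  console.log("new message:", message);
}

// 1. Initial load over plain REST (assumed endpoint).
const history = await fetch("/api/channels/general/messages").then((r) => r.json());
render(history);

// 2. Subscribe for real-time updates - only new messages arrive over the socket.
const socket = new WebSocket("wss://example.com/updates"); // assumed endpoint
socket.addEventListener("open", () => {
  socket.send(JSON.stringify({ type: "subscribe", channel: "general" }));
});
socket.addEventListener("message", (event) => {
  appendMessage(JSON.parse(event.data)); // incremental update, not the full history
});
```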

You don't have to send everything over WebSockets. Most people assume all messages are sent the same way - a super general, over-generalized statement, like "writes go to Master, reads go to Replica" repeated without any context. Same thing here: not all messages are sent over WebSockets.

You can do your initial load over a REST API and then just subscribe to real-time changes over WebSocket. This is how most systems work.

It doesn't mean that every single thing would come over WebSockets.

How to Choose Between Communication Patterns

Q: How do we choose one over the other - short polling, long polling, WebSockets?

A: It's all about experience! They all differ in product experience - in how users see your product.

Product Experience Matters

Imagine a chat application, you and me chatting. If it does short polling, say every 5 seconds, then after 5 seconds I might get 100 messages, then nothing, and then 2 messages. Poor user experience!

So that's why you should ideally do it over WebSockets: as and when messages arrive, they are pushed to you. Think of the product experience, the user experience - this is the buffet served to you; decide what you want to use when.

Trade-offs to Consider

Q: In the long polling example, the connection timeout was 10 seconds. But how does the server check the status - are you querying the database continuously? Won't it put load on the database?

A: That's the trade-off: users making multiple calls vs you making multiple DB calls. If you are not doing it, users would keep refreshing the page. The number of queries on the database would remain roughly the same - either you do it, the users do it, or the JavaScript code does it. You decide which complications you want to take on.

Again, the core thing is how you want to craft the user experience.

Resource Considerations

Q: In the case of WebSockets, more server resources are utilized, right?

A: Of course, that depends. Everything comes down to hardware. Are you okay with spending that much on persistent connections, given that the number of servers would be larger? You would need more servers. All of these factors need to be considered.

Q: In short polling, is network bandwidth utilized more than server resources, because you are making frequent calls?

A: Yes, but then if it's easier to implement and your users are okay, go for it. There is no one right way to do it.

Decision Framework

Q: So basically it's use case to use case?

A: Yes!

Key factors to consider: the user experience you want to craft, how simple the implementation is, the load on your servers and database, the network bandwidth used, and the cost of keeping persistent connections open.

Remember: there is no one right way to do it. Optimize for simplicity first, then optimize for user experience when you need to.