A Google-Scale Messaging Service from basics to more advanced concepts
What is Google Cloud PubSub?
Google Cloud PubSub is an implementation of the publisher/subscriber pattern that lets us build event-driven architectures. If you work with streaming analytics or with data integration pipelines that ingest and distribute data, you should familiarise yourself with this fully managed service on Google Cloud Platform.
Popular Google products that use this solution: Ads, Search and Gmail.
Publisher Subscriber Pattern
The publisher/subscriber pattern allows messages to be exchanged between publishers (systems that produce events) and subscribers (applications that want to receive every published message).
Let’s learn the core concepts of this architecture, straight from the official Google documentation:
Message: the data that moves through the service.
Topic: a named entity that represents a feed of messages.
Subscription: a named entity that represents an interest in receiving messages on a particular topic.
Publisher (also called a producer): creates messages and sends (publishes) them to the messaging service on a specified topic.
Subscriber (also called a consumer): receives messages on a specified subscription.
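To make these terms concrete, here is a minimal in-memory sketch. It is not the Google Cloud client library: the Broker class, its methods, and the topic and subscription names are all hypothetical and exist only to show how the five concepts relate to each other.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory messaging service (illustration only, not the real API)."""

    def __init__(self):
        self.subscriptions_by_topic = defaultdict(list)  # topic -> subscription names
        self.backlogs = {}  # subscription name -> messages waiting for it

    def create_subscription(self, topic, subscription):
        # A subscription registers interest in one particular topic.
        self.subscriptions_by_topic[topic].append(subscription)
        self.backlogs[subscription] = deque()

    def publish(self, topic, message):
        # The publisher only names the topic; every subscription attached
        # to that topic gets its own copy of the message.
        for subscription in self.subscriptions_by_topic[topic]:
            self.backlogs[subscription].append(message)

    def receive(self, subscription):
        # A subscriber reads from its subscription, never from the topic.
        backlog = self.backlogs[subscription]
        return backlog.popleft() if backlog else None


broker = Broker()
broker.create_subscription("events", "subscription-a")
broker.create_subscription("events", "subscription-b")
broker.publish("events", {"id": 1, "text": "hello"})

message_for_a = broker.receive("subscription-a")
message_for_b = broker.receive("subscription-b")
```

Both subscriptions receive the same message, and the publisher never refers to either of them by name.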
A real-life example that helps to understand the concept is a newsletter subscription. As consumers, we often want to be notified about new promotions at our favorite stores. To stay up to date with everything the store offers, you subscribe to its newsletter. From then on, every time a campaign is created (think of it as a published event), you and all the other subscribers receive an email.
Is PubSub simply a message queue?
Not really.
Subscribers often perform different functions. Think about publishing a message every time a customer places an order on the website.
You could have one PubSub subscriber responsible for sending an order confirmation and another that calculates statistics dynamically, for example AOV (Average Order Value). A third subscriber could push the order information to the warehouse so that the package can be prepared and sent to the customer. We could come up with many different use cases for a single published message.
Note that the publisher is not aware of who the subscribers are, and the subscribers have no clue who the publisher is.
This style of messaging differs from message queues, where the component that sends the message often knows the destination it is sending to.
A message queue typically holds messages until a consumer fetches them, often in batches, whereas PubSub forwards each message with little or no queueing. The aim is to notify subscribers immediately after the message is published.
PubSub Key Concepts
- Globally available: PubSub servers run in all GCP regions around the world, which makes it possible to exchange messages between different regions.
- Fully managed: no need to spin up a cluster or configure any kind of virtual machine. Open the Cloud PubSub dashboard, create your topic and subscription, and Google will do all the hard work for you.
- Scalable: an increase in the number of topics, subscriptions, or messages is handled by increasing the number of running server instances.
- At-least-once delivery for every subscription. Notice the words “at least”: your message might be delivered more than once, and there are exceptional situations in which it may not be delivered at all. I will explain this later in this article.
- Message retention: how long you want PubSub to hold messages that have been published but not yet acknowledged (an acknowledgement is the subscriber’s confirmation that it received the message).
The maximum retention period in the default PubSub implementation is 7 days. Messages that reach this threshold are deleted.
What does that actually mean?
If for any reason your subscriber is unavailable, or simply can’t process the message (because of a bug in the code, for example), PubSub will hold the message in a buffer for the chosen retention period, up to a maximum of 7 days.
Please note that there is a version of PubSub called PubSub Lite that comes with an unlimited retention period, while the default implementation offers a maximum of 7 days.
- Snapshot: lets you save the current state of a subscription. It is super useful if you are deploying new code and you don’t know how it will behave. Your messages are saved and you can reprocess them once the bugs are fixed.
- Seeking: rewind your backlog to any point in time or to a snapshot, which gives you the ability to reprocess messages, or fast forward to discard outdated data.
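Retention and seeking can be pictured with a small in-memory model. The sketch below is a simulation under stated assumptions, not the PubSub API: the Subscription and StoredMessage classes and their methods are hypothetical, snapshots are left out, and the model assumes that acknowledged messages are kept around so that they can be replayed.

```python
from dataclasses import dataclass, field

DAY = 24 * 3600
RETENTION_SECONDS = 7 * DAY  # the 7-day maximum mentioned above


@dataclass
class StoredMessage:
    data: str
    publish_time: float
    acked: bool = False


@dataclass
class Subscription:
    """Toy subscription that keeps acknowledged messages for replay."""

    retention: float = RETENTION_SECONDS
    messages: list = field(default_factory=list)

    def deliver(self, data, now):
        self.messages.append(StoredMessage(data, publish_time=now))

    def ack(self, data):
        for message in self.messages:
            if message.data == data:
                message.acked = True

    def expire(self, now):
        # Anything older than the retention period is deleted for good.
        self.messages = [
            m for m in self.messages if now - m.publish_time < self.retention
        ]

    def backlog(self):
        return [m.data for m in self.messages if not m.acked]

    def seek(self, timestamp):
        # Rewind: messages published at or after the timestamp become unacked.
        # Fast forward: messages published before it are marked as acked.
        for message in self.messages:
            message.acked = message.publish_time < timestamp


sub = Subscription()
sub.deliver("order-1", now=0 * DAY)
sub.deliver("order-2", now=1 * DAY)
sub.ack("order-1")
sub.ack("order-2")
backlog_after_ack = sub.backlog()

sub.seek(timestamp=0)          # rewind to the very beginning
backlog_after_rewind = sub.backlog()

sub.seek(timestamp=2 * DAY)    # fast forward past both messages
backlog_after_fast_forward = sub.backlog()

sub.seek(timestamp=0)
sub.expire(now=7.5 * DAY)      # order-1 is now older than 7 days
backlog_after_expiry = sub.backlog()
```

After the rewind both orders are back in the backlog and can be reprocessed. Once the retention period has passed, the older message is gone and no seek can bring it back.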
Push & Pull Subscriptions
PubSub offers two types of subscriptions: push and pull.
With the push option, PubSub makes requests to your subscriber application to deliver messages. The subscriber could be an endpoint URL ready to receive HTTP POST requests. Your endpoint acknowledges a message by returning an HTTP success status code. Note that the message will be resent if the endpoint does not return a success response.
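The core of a push endpoint can be sketched as a plain function that maps a request body to an HTTP status code. This is a sketch under assumptions: handle_push and the process callback are hypothetical names, the web framework around them is left out, and the body is assumed to be the JSON envelope in which push requests wrap the message, with the payload base64-encoded in message.data.

```python
import base64
import json


def handle_push(body: bytes, process) -> int:
    """Return the HTTP status code to send back for one push request."""
    try:
        envelope = json.loads(body)
        payload = base64.b64decode(envelope["message"]["data"])
        process(payload)
    except Exception:
        # Any non-success status means "not acknowledged", so the
        # message will be delivered again later.
        return 500
    return 204  # success status: the message counts as acknowledged


received = []
body = json.dumps({
    "message": {
        "data": base64.b64encode(b"order-1").decode(),
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/my-subscription",
}).encode()

ok_status = handle_push(body, received.append)


def failing_processor(payload):
    raise RuntimeError("simulated processing bug")


failed_status = handle_push(body, failing_processor)
```

Returning the success code only after process has finished is a deliberate choice: if processing fails, the message is not acknowledged and is delivered again instead of being lost.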
When you pull messages with the gcloud command-line tool, it is also possible to acknowledge them automatically by using the auto-ack flag. Since an acknowledgement means that the message has been received, setting the auto-ack flag will result in lost messages if something goes wrong during processing.
The time allowed for a subscriber to acknowledge a message can be specified in the subscription command using the ackDeadline parameter.
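The effect of the acknowledgement deadline can be simulated with a simple clock. The sketch below is not the real service: PendingMessage is a hypothetical class, time advances in one-second ticks, and the 10-second deadline is only an illustrative value.

```python
class PendingMessage:
    """Toy model of one message waiting for an acknowledgement."""

    def __init__(self, data, ack_deadline):
        self.data = data
        self.ack_deadline = ack_deadline  # seconds the subscriber has to ack
        self.delivered_at = None
        self.acked = False
        self.deliveries = 0

    def due_for_delivery(self, now):
        if self.acked:
            return False
        never_sent = self.delivered_at is None
        # Redeliver once the deadline has passed without an acknowledgement.
        return never_sent or now - self.delivered_at >= self.ack_deadline

    def deliver(self, now):
        self.delivered_at = now
        self.deliveries += 1


message = PendingMessage("order-1", ack_deadline=10)
for now in range(30):  # one tick per second
    if message.due_for_delivery(now):
        message.deliver(now)
    if now == 25:
        message.acked = True  # the subscriber finally acknowledges
```

In this run the message is delivered at seconds 0, 10 and 20, so the subscriber sees it three times before the acknowledgement at second 25 stops the redelivery.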
Messages can stay in a topic for up to seven days.
PubSub is a slow start system
As the name suggests, PubSub will not start delivering messages at the maximum throughput it can provide. Why? Well, better safe than sorry.
Slow start is part of the strategy used by TCP in conjunction with other algorithms to avoid sending more data than the network is capable of processing.
The maximum allowed number of concurrent push requests is controlled by the push window. Every successfully delivered message acts as a speed booster that increases the window, and every failure decreases it. The initial window is small: 3 times N, where N is the number of publish regions. From there the window grows exponentially up to 3,000 times N outstanding messages. Beyond that point it increases linearly, to prevent the push endpoint from receiving too many messages, and it can grow up to 30,000 times N outstanding messages for subscriptions whose subscribers acknowledge more than 99% of messages with an average push request latency of less than 1 second.
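The shape of this growth can be sketched in a few lines. Only the limits (3, 3,000 and 30,000 times N) come from the description above. The doubling on success, the halving on failure, and the linear step of 100 times N are assumptions chosen for illustration, and next_window is a hypothetical function, not something PubSub exposes.

```python
def next_window(window, success, n_regions=1):
    """Toy push-window update; growth and shrink factors are assumptions."""
    minimum = 3 * n_regions
    exponential_limit = 3_000 * n_regions
    maximum = 30_000 * n_regions
    if not success:
        # Assumed: a failure halves the window, never below the initial size.
        return max(minimum, window // 2)
    if window < exponential_limit:
        # Assumed: exponential phase doubles the window on each success.
        return min(exponential_limit, window * 2)
    # Assumed: linear phase adds a fixed step on each success.
    return min(maximum, window + 100 * n_regions)


window = 3  # initial window for N = 1
history = [window]
for _ in range(11):
    window = next_window(window, success=True)
    history.append(window)

after_failure = next_window(window, success=False)
```

With these assumed factors, ten consecutive successes take the window from 3 to the 3,000 limit, after which it only creeps up by 100 per success, and a single failure cuts it back sharply.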
Pull subscriptions are used when you need to control when messages are retrieved from a topic. A pull subscription gives you a way to tell PubSub: “please, give me the next N messages”.
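The "give me the next N messages" interaction can be sketched with an in-memory stand-in. PullSubscription and its methods are hypothetical and only mimic the flow: the subscriber asks for a batch, processes it, and then acknowledges what it has handled.

```python
from collections import deque


class PullSubscription:
    """Toy pull subscription (illustration only, not the real client)."""

    def __init__(self):
        self._backlog = deque()
        self._outstanding = {}  # ack_id -> message delivered but not yet acked
        self._next_ack_id = 0

    def enqueue(self, message):
        self._backlog.append(message)

    def pull(self, max_messages):
        # Hand out at most max_messages; the subscriber decides when to ask.
        batch = []
        while self._backlog and len(batch) < max_messages:
            message = self._backlog.popleft()
            ack_id = str(self._next_ack_id)
            self._next_ack_id += 1
            self._outstanding[ack_id] = message
            batch.append((ack_id, message))
        return batch

    def acknowledge(self, ack_ids):
        for ack_id in ack_ids:
            self._outstanding.pop(ack_id, None)

    def outstanding_count(self):
        return len(self._outstanding)


subscription = PullSubscription()
for i in range(5):
    subscription.enqueue(f"message-{i}")

first_batch = subscription.pull(max_messages=2)
subscription.acknowledge(ack_id for ack_id, _ in first_batch)

second_batch = subscription.pull(max_messages=10)
```

The first call returns exactly two messages. The second asks for ten but gets only the three that are left, and those stay outstanding until they are acknowledged.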
Idempotency
It is important that your subscriber’s processing logic is idempotent. What does that mean? Your logic should always produce the same result, no matter how many times it is executed. A trivial example is adding 0. You could keep adding 0 to 5 for many hours, but your number won’t change.
Imagine your code throws an error and you are not able to acknowledge the message. PubSub will keep trying to deliver this message. In this scenario you will be forced to execute your logic multiple times.
Let’s get back to the previous example where we had a topic receiving customer order information. Now you have another subscriber that assigns a $20 voucher for orders over $100. If the logic that assigns vouchers cannot tell whether a given customer has already received a voucher, your customers could end up with too many vouchers. It could make your customers happy though…
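One common way to make such a subscriber idempotent is to remember which orders have already been handled. The sketch below is hypothetical: VoucherService, the order fields, and the in-memory dictionary are assumptions, and a real system would keep this record in durable storage. It also assumes one voucher per order, keyed by the order ID.

```python
class VoucherService:
    """Toy voucher subscriber that tolerates duplicate deliveries."""

    VOUCHER_AMOUNT = 20
    MINIMUM_ORDER_TOTAL = 100

    def __init__(self):
        self.vouchers = {}  # order_id -> voucher amount already granted

    def handle_order(self, order):
        order_id = order["order_id"]
        if order_id in self.vouchers:
            return  # duplicate delivery: the voucher was already granted
        if order["total"] > self.MINIMUM_ORDER_TOTAL:
            self.vouchers[order_id] = self.VOUCHER_AMOUNT


service = VoucherService()
order = {"order_id": "A-1", "total": 120}

# The same message is delivered three times, as can happen with
# at-least-once delivery.
for _ in range(3):
    service.handle_order(order)

total_granted = sum(service.vouchers.values())
```

No matter how many times the same order arrives, the customer ends up with a single $20 voucher.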
Summary
PubSub is a globally scalable, low-latency messaging service commonly used with streaming analytics or data integration pipelines to ingest and distribute data.
It offers two subscription modes, push and pull, and, edge-case exceptions aside, guarantees at-least-once message delivery.
In the default implementation messages can be stored for up to 7 days, while PubSub Lite offers even unlimited message retention.
The logic behind a subscriber’s action should always be idempotent, since messages may be delivered more than once.