Understanding Serialization in Software Engineering
We learn a lot about object-oriented programming: how to represent an object and how to store it. But have you ever wondered how these objects are transmitted over a network?
How does a database know what a class represents? And when you send data in a request to another endpoint, how is that data represented in transit?
Let us understand this with a real-world example.
Imagine you are working on a project that needs to precompute an expensive, high-latency operation. For example, you want to know when a customer adds a particular item to their cart so that you can use that information to recommend other products to them.
So you come up with a design in which you listen for an asynchronous event: when the customer adds something to the cart, you receive that information and use the itemId in your computations.
However, there’s a twist: you don’t own the service emitting these events. Another team in your company manages it, and they send messages via SNS (Simple Notification Service) in the form of JSON strings. You need to convert these JSON strings into Java objects on your end. But first, let’s understand how data is transmitted over the network.
Let us say the data looks like a complex Java object, something like this.
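public class Notification {
    private String notificationId;
    private NotificationData notificationData;
    private NotificationMetaData notificationMetaData;
    // ...
    // Remaining members omitted. By default, JSON libraries such as Jackson
    // need getters and setters (or annotations) to see private fields like these.
}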
To transmit this object over the network, it must first be turned into a sequence of bytes. This is where a concept called serialization comes into the picture.
What is serialization?
Serialization is the process of converting an in-memory data object into a byte stream in an agreed format. Because the result is plain bytes rather than a language-specific structure, it can be stored or sent over a network and read back by any program that knows the format, irrespective of the language that program is written in.
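To make the definition concrete, here is a minimal sketch that uses only the Java standard library. The CartEvent class and its two fields are hypothetical stand-ins for the real payload, and the byte layout (a length-prefixed string followed by an 8-byte number) is just one possible choice, not the format SNS uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SerializeSketch {

    // Hypothetical, simplified event; not the real notification class.
    static final class CartEvent {
        final String itemId;
        final long timestampMillis;

        CartEvent(String itemId, long timestampMillis) {
            this.itemId = itemId;
            this.timestampMillis = timestampMillis;
        }
    }

    // Turns the in-memory object into a flat sequence of bytes.
    static byte[] serialize(CartEvent event) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(event.itemId);           // 2-byte length, then the encoded characters
            out.writeLong(event.timestampMillis); // 8 bytes, big-endian
        } catch (IOException e) {
            // Not expected for an in-memory buffer, but the API declares it.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Renders bytes as hex so we can look at what would go on the wire.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // 1722515696000 ms since the epoch is 2024-08-01T12:34:56Z.
        byte[] bytes = serialize(new CartEvent("98765", 1722515696000L));
        System.out.println(bytes.length + " bytes: " + toHex(bytes));
    }
}
```

The output is nothing but bytes. Any program that knows the layout can read them back, whatever language it is written in.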
Applying Serialization in Our Use Case
In our scenario, the service generating the SNS messages will serialize the message, converting complex objects like NotificationData and NotificationMetaData into a serialized format before sending the Notification.
Example SNS Notification
Here’s an example of an SNS message that might be sent when a customer adds an item to the cart:
{
  "notificationId": "12345",
  "notificationData": {
    "itemId": "98765",
    "timestamp": "2024-08-01T12:34:56Z"
  },
  "notificationMetaData": {
    "customerId": "56789",
    "sessionId": "abcd-efgh-ijkl"
  }
}
Serializing the Notification
The service generating the SNS messages will serialize the Notification object into a JSON string. Below is a simple example of how this could be implemented in Java:
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NotificationSerializer {
    // ObjectMapper is thread-safe once configured, so one shared instance is enough.
    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    // Converts a Notification object into its JSON string form.
    public String serialize(Notification notification) throws JsonProcessingException {
        return OBJECT_MAPPER.writeValueAsString(notification);
    }

    // Rebuilds a Notification object from a JSON string.
    public Notification deserialize(String notificationJson) throws JsonProcessingException {
        return OBJECT_MAPPER.readValue(notificationJson, Notification.class);
    }
}
Deserializing the Notification
On the subscriber’s end, they would receive the notification as a JSON string and deserialize it back into a Notification object:
import com.fasterxml.jackson.core.JsonProcessingException;

public class NotificationProcessor {
    private final NotificationSerializer serializer = new NotificationSerializer();

    public void processNotification(String notificationJson) {
        try {
            Notification notification = serializer.deserialize(notificationJson);
            // Perform operations with the notification object
        } catch (JsonProcessingException e) {
            // Malformed JSON, or a payload that does not match the Notification class
            e.printStackTrace();
            // Handle the error
        }
    }
}
When the subscriber receives the notification, it arrives as a serialized string. The subscriber must then convert this string back into a Notification object. This process is known as deserialization.
What is deserialization?
Deserialization is the process of converting a byte stream back into an object in a specific programming language. The same bytes therefore produce different in-memory objects depending on who reads them: a Python object in Python, a Java object in Java. This is why every major programming language has its own deserialization libraries.
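As a rough illustration of the reading side, the sketch below rebuilds a Java object from raw bytes using only the standard library. The MetaData class is a hypothetical, simplified stand-in for NotificationMetaData, and the layout (two length-prefixed strings in a fixed order) is an assumption made for this example, not the format SNS uses.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DeserializeSketch {

    // Hypothetical, simplified metadata class; not the real NotificationMetaData.
    static final class MetaData {
        final String customerId;
        final String sessionId;

        MetaData(String customerId, String sessionId) {
            this.customerId = customerId;
            this.sessionId = sessionId;
        }
    }

    // Rebuilds a Java object from bytes. The reader must know the layout the
    // writer used: here, two length-prefixed strings in a fixed order.
    static MetaData deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String customerId = in.readUTF();
            String sessionId = in.readUTF();
            return new MetaData(customerId, sessionId);
        } catch (IOException e) {
            // Truncated or malformed input ends up here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes as they might arrive from the network: each string is a
        // 2-byte length followed by that many character bytes.
        byte[] received = {
            0, 5, '5', '6', '7', '8', '9',
            0, 4, 'a', 'b', 'c', 'd'
        };
        MetaData metaData = deserialize(received);
        System.out.println(metaData.customerId + " / " + metaData.sessionId);
    }
}
```

If the reader assumes a different layout than the writer used, the call either fails or produces an object with the wrong contents, which is why both sides have to agree on the format.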
A common error during this process involves objects whose structure is defined by the team that sent the notification. For example, NotificationMetaData might not deserialize correctly if your class definition or deserializer does not match theirs, resulting in null fields.
Serialization Data Formats
JSON is one of the most commonly used formats for serialization, but it is not the only option.
JSON has a lot of overhead, since it is text-based and repeats every field name in every message, but the human readability makes it ideal for me. You can also use Protobufs (Protocol Buffers), YAML, or XML. Those are just some of the data formats you can use.
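To get a feel for the overhead, here is a small sketch that encodes the same two values once as JSON text and once in a compact binary layout. The binary layout is hand-rolled for this example (it is not Protobuf), and the JSON is built by string concatenation only to keep the sketch free of dependencies; real code should use a JSON library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FormatSizeSketch {

    // JSON carries field names and punctuation in every message.
    // No escaping is done here, which is only safe for simple ids like "98765".
    static byte[] asJson(String itemId, long timestampMillis) {
        String json = "{\"itemId\":\"" + itemId + "\",\"timestamp\":" + timestampMillis + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical binary layout: values only, with the field order agreed in advance.
    static byte[] asBinary(String itemId, long timestampMillis) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(itemId);           // 2-byte length plus the characters
            out.writeLong(timestampMillis); // always 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String itemId = "98765";
        long timestampMillis = 1722515696000L; // 2024-08-01T12:34:56Z

        System.out.println("JSON size:   " + asJson(itemId, timestampMillis).length + " bytes");
        System.out.println("Binary size: " + asBinary(itemId, timestampMillis).length + " bytes");
    }
}
```

For these sample values the JSON form takes 44 bytes and the binary form takes 15. The price of the smaller form is that you can no longer read the message by eye, and both sides must share the layout ahead of time.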
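What’s the connection between JSON and serialization?
JSON is a text format, and text cannot travel over a network as-is; it has to be encoded as bytes. JSON is almost always encoded in UTF-8, a variable-width encoding that uses one byte for each ASCII character and two to four bytes for other characters. So while we see human-readable strings, behind the scenes those strings are sent as UTF-8 bytes.
In other words, JSON is the human-readable layer and UTF-8 is the byte-level layer underneath it. Serializing an object as JSON therefore means two steps: turning the object into JSON text, and then encoding that text as bytes for transfer over the network.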
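The text-to-bytes step can be sketched with the standard library alone. The JSON strings below are made-up samples, and the encode and decode helper names are just for illustration.

```java
import java.nio.charset.StandardCharsets;

public class JsonBytesSketch {

    // Text to bytes: what happens just before the JSON goes on the wire.
    static byte[] encode(String jsonText) {
        return jsonText.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes to text: what the receiver does before parsing the JSON.
    static String decode(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "{\"itemId\":\"98765\"}";
        String accented = "{\"name\":\"caf\u00e9\"}"; // \u00e9 is the letter e with an acute accent

        // ASCII-only text: one byte per character.
        System.out.println(ascii.length() + " chars, " + encode(ascii).length + " bytes");

        // The accented letter needs two bytes in UTF-8, so bytes exceed characters by one.
        System.out.println(accented.length() + " chars, " + encode(accented).length + " bytes");

        // Decoding with the same charset gives back the original text.
        System.out.println(decode(encode(accented)).equals(accented));
    }
}
```

Both sides need to use the same character encoding. Decoding UTF-8 bytes with a different charset is a common source of garbled characters.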
Conclusion
Serialization becomes essential when you’re putting together your communication pipeline. Knowing how it works gives you the background to approach whatever tool you are using with confidence.