
Data Serialization Methods

Published Oct 20, 2019 | Last updated Apr 16, 2020

When data is transferred from one entity to another over some medium, it has to be arranged in a format that the receiver can understand. Choosing the right serialization protocol and method for that transfer is always a challenge.

First, we need to understand what serialization and deserialization are.

Serialization: packing the data into a particular format for storage or transmission.

Deserialization: unpacking (parsing) the received data back into a usable form.

Data can come in different forms, such as strings, integers, lists of integers, structures containing several data types, lists of structures, databases, etc.

There are multiple ways to achieve serialization. Four of them are covered here.

  • Direct struct cast to an unsigned char buffer (Binary message method)
  • Protobuf
  • JSON
  • ASN.1

Different methods are explained here:

Direct struct cast (Binary message method)

With this method the sender fills in a structure and sends the struct's raw bytes as a binary message. The receiver reads the bytes into a character buffer, casts that buffer to the same structure type, and uses the fields directly.

For example:


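#include <string.h>      /* memset, strcpy */
#include <sys/socket.h>  /* send */

struct interface {
    int x;
    int y;
    char string[100];
};

/* Receive side: bytes read into this buffer are cast back to struct interface */
char buffer[1048];

/* Send side: fd is assumed to be an already connected socket */
void send_iface(int fd)
{
    struct interface iface;

    memset(&iface, 0, sizeof(iface));   /* no uninitialized bytes on the wire */
    iface.x = 10;
    iface.y = 20;
    strcpy(iface.string, "TestString");
    send(fd, &iface, sizeof(iface), 0); /* the struct's raw bytes are the message */
}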

Pros:

  • Easy to implement
  • Fast, with very little processing delay, because no encoding or decoding takes place

Cons:

  • Not very flexible
  • Both sides must be recompiled if the structure changes
  • Doesn’t handle machine endianness or differing architectures (for example 32-bit vs 64-bit). If the sender and receiver differ in endianness, this method will not work unless the fields are converted to an agreed byte order before sending and back to host byte order after receiving
  • Language limitation: not all languages support this
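The endianness problem can be reproduced without two machines. The following is a minimal sketch in Python that uses the standard `struct` module to imitate the C layout above (two 32-bit integers and a 100-byte string, with no padding). The helper names `pack_iface` and `unpack_iface` are made up for this illustration.

```python
import struct

# Layout mirroring the C struct: two 32-bit ints and a 100-byte char array.
# "<" forces little-endian, ">" forces big-endian; both disable padding.
LITTLE = struct.Struct("<ii100s")
BIG = struct.Struct(">ii100s")

def pack_iface(layout, x, y, text):
    """Pack the fields into the fixed-size binary message."""
    return layout.pack(x, y, text.encode("ascii"))

def unpack_iface(layout, payload):
    """Unpack the binary message and strip the NUL padding from the string."""
    x, y, raw = layout.unpack(payload)
    return x, y, raw.rstrip(b"\x00").decode("ascii")

# A little-endian sender builds the message.
message = pack_iface(LITTLE, 10, 20, "TestString")

# Receiver with the same byte order: the values survive.
same_order = unpack_iface(LITTLE, message)

# Receiver assuming the opposite byte order: the integers are misread.
wrong_order = unpack_iface(BIG, message)

# Fix: both sides agree on one wire order (network byte order is big-endian).
network_message = pack_iface(BIG, 10, 20, "TestString")
fixed = unpack_iface(BIG, network_message)

print(len(message))  # 108
print(same_order)    # (10, 20, 'TestString')
print(wrong_order)   # (167772160, 335544320, 'TestString')
print(fixed)         # (10, 20, 'TestString')
```

The string survives in every case because it is a sequence of single bytes; only the multi-byte integers depend on byte order. In C the same conversion is usually done with `htonl` before sending and `ntohl` after receiving.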

Protobuf

Protobuf (Protocol Buffers) is Google's format for serialization and deserialization. The user defines a schema file that acts as the interface between sender and receiver. Source files are generated from the schema, and the APIs in those generated files must be used to serialize or deserialize the data.

Protobuf provides the protoc compiler, which is used to generate the source files from the schema file.

Method:

Create a schema file, interface.proto, as follows for the same data used in the first method:

syntax = "proto3";

message interface {
  int32 x = 1;
  int32 y = 2;
  string string = 3;
}

Now compile this file to generate the source code that encodes / decodes (serializes / deserializes) the message. protoc has options to generate source in several languages, including:

  • C++
  • Java
  • Python
  • C#
  • Go

Command to compile the schema file (generating Python here, for example):

protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/interface.proto

After generating the sources, build them together with your program (or import them, in an interpreted language such as Python). Call the encode / decode APIs in the generated source to encode / decode the message.

Pros:

  • Takes care of endianness: the wire format has a fixed byte layout, so the protoc-generated encode and decode APIs produce and read the same bytes on any machine
  • Supports multiple languages: C++, Java, Python, C#, Go

Cons:

  • The sources must be regenerated and the program rebuilt whenever the schema file / message format changes
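To see why endianness stops being a concern, it helps to look at the bytes protobuf puts on the wire. The sketch below hand-encodes the `interface` message following the documented wire format (a varint tag per field, varint integers, length-prefixed strings). It is not the protoc-generated API, and it is deliberately limited: it handles only non-negative integers and does not skip default-valued fields the way a real proto3 encoder does. The function names are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_tag(field_number, wire_type):
    """A field tag is the field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)

def encode_interface(x, y, text):
    """Fields 1 and 2 are varints (wire type 0); field 3 is
    length-delimited (wire type 2)."""
    data = text.encode("utf-8")
    return (
        encode_tag(1, 0) + encode_varint(x)
        + encode_tag(2, 0) + encode_varint(y)
        + encode_tag(3, 2) + encode_varint(len(data)) + data
    )

payload = encode_interface(10, 20, "TestString")
print(payload.hex())  # 080a10141a0a54657374537472696e67
print(len(payload))   # 16
```

The byte sequence is defined by the format itself, not by the host CPU, so a sender and receiver with different endianness see the same 16 bytes. Compare this with the 108 bytes of the fixed-size struct in the first method.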

JSON

JSON is short for JavaScript Object Notation. It is a lightweight, human-readable data interchange format that represents data as text. Values can be objects, arrays, or individual values (strings, numbers, booleans, null), and an object is written as a set of key:value pairs.

Example:

To send the same interface data as before, it would be represented in JSON as follows:

{"x": 10, "y": 20, "string": "TestString"}

Pros:

  • No compilation step is needed. The user just adds values as key:value pairs
  • Because JSON represents keys and values as text, differing endianness does not affect the interface or the message exchange
  • Human-readable format

Cons:

  • The text representation can be larger than the binary one. For example, the double value 2.3333556677 takes 12 characters in JSON, while in the two methods above it takes 8 bytes as a double (or 4 bytes if stored as a float)
  • No security or confidentiality by itself; all data is visible as plain text
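A minimal sketch of the JSON round trip and of the size comparison from the list above, using only Python's standard `json` and `struct` modules. The variable names are chosen for this illustration.

```python
import json
import struct

iface = {"x": 10, "y": 20, "string": "TestString"}

# Serialize: dict -> JSON text -> bytes for the wire.
text = json.dumps(iface)
wire = text.encode("utf-8")

# Deserialize on the receiving side: bytes -> JSON text -> dict.
decoded = json.loads(wire.decode("utf-8"))

# Size comparison for the double value discussed in the cons.
value = 2.3333556677
json_size = len(json.dumps(value))            # characters in the text form
binary_size = len(struct.pack("<d", value))   # bytes as an IEEE 754 double

print(text)                    # {"x": 10, "y": 20, "string": "TestString"}
print(decoded == iface)        # True
print(json_size, binary_size)  # 12 8
```

No schema or generated code is involved: the receiver only needs a JSON parser and knowledge of the key names.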

ASN.1

ASN.1 (Abstract Syntax Notation One) is a standard interface description language. It is an international standard widely used in communication protocols. The concept is the same as protobuf: a schema is defined for the interface, source code is generated from the schema file, and the generated APIs are used to encode and decode the message.

Example:

Create a sample.asn1 file with the following content:

InterfaceProtocol DEFINITIONS ::= BEGIN

Interface ::= SEQUENCE {
    x INTEGER,
    y INTEGER,
    string UTF8String (SIZE(0..100))
}

END

Compile the .asn1 file with the asn1c compiler. asn1c can be installed with apt-get:

apt-get install asn1c  

Or it can be built from source.

Compiling the .asn1 file generates C source (.c) and header (.h) files. Use the encode / decode APIs from the generated source to encode / decode the message.

Pros:

  • Globally accepted standard
  • Several encoding rules are available, some of which pack the data very compactly
    • BER – Basic Encoding Rules
    • PER – Packed Encoding Rules
    • UPER – Unaligned PER
    • OER – Octet Encoding Rules

Cons:

  • Recompilation is needed even for a minor schema change
  • Encoding / decoding may add processing delay
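To give a feel for what an ASN.1 encoding looks like, the sketch below hand-encodes the Interface SEQUENCE using DER, the canonical subset of BER, in which every value is a tag-length-value triple. This is not what asn1c generates; it is a simplified illustration that supports only integers from 0 to 127 and contents shorter than 128 bytes. All function names are hypothetical.

```python
def der_tlv(tag, content):
    """Build one tag-length-value triple (short-form length only)."""
    if len(content) > 127:
        raise ValueError("this sketch supports only short-form lengths")
    return bytes([tag, len(content)]) + content

def der_integer(value):
    """Encode an INTEGER (tag 0x02) in the range 0..127 as one content byte."""
    if not 0 <= value <= 127:
        raise ValueError("this sketch supports only integers 0..127")
    return der_tlv(0x02, bytes([value]))

def der_utf8string(text):
    """Encode a UTF8String (tag 0x0C)."""
    return der_tlv(0x0C, text.encode("utf-8"))

def der_encode_interface(x, y, text):
    """Encode the Interface SEQUENCE (tag 0x30) from its three fields."""
    body = der_integer(x) + der_integer(y) + der_utf8string(text)
    return der_tlv(0x30, body)

payload = der_encode_interface(10, 20, "TestString")
print(payload.hex())  # 301202010a0201140c0a54657374537472696e67
print(len(payload))   # 20
```

Each field costs two extra bytes for its tag and length under BER/DER. PER and UPER drop most of that overhead by relying on the schema, which is why they are preferred when message size matters.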