Retroid Equator Server is the new generation of Retroid data storage, designed for long-term retention of vast amounts of data with both storage space and query execution time in mind.

The primary focus of Equator Server is to address the challenges of telecommunication providers, who need to retain subscribers’ Call Detail Records (CDRs) and IP Detail Records (IPDRs) as dictated by legislation in most countries.

While a CDR and IPDR retention solution can be implemented using either traditional relational databases or more specialized data retention products such as Greenwich Server, the need to store index information along with the data significantly increases the storage requirements. In real-life scenarios the size of the indexes may exceed the size of the data, more than doubling the number of hard drives required. Equator Server does not use indexes. Instead, it organizes the records in storage and RAM in a special way that lets the user query the retained data efficiently.

Retroid introduces two storage engines intended to balance search speed against the disk space needed for data retention:

  • sharded - records are organized in so-called shards, which allows a single copy of the data to be queried efficiently against two keys
  • sstable - records are organized in a structure called SSTable, which allows virtually instant queries but requires a separate copy of the data to be retained for each key
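
The trade-off behind the sstable engine can be illustrated with a minimal Python sketch. It assumes only the general idea of a sorted table (records kept sorted by one key and located by binary search). The function names, the sample records and the in-memory lists are illustrative and do not describe Equator's actual on-disk format.

```python
import bisect

def build_sorted_table(records, key):
    """Return (keys, rows): one full copy of the records, sorted by `key`."""
    rows = sorted(records, key=lambda r: r[key])
    keys = [r[key] for r in rows]
    return keys, rows

def lookup(table, value):
    """Binary-search the sorted keys and return every row whose key equals `value`."""
    keys, rows = table
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return rows[lo:hi]

# Hypothetical call records, for illustration only.
records = [
    {"caller": "555-0101", "callee": "555-0199", "duration": 42},
    {"caller": "555-0150", "callee": "555-0101", "duration": 7},
    {"caller": "555-0101", "callee": "555-0150", "duration": 130},
]

# One sorted copy per searchable key: this is the space cost of the approach.
by_caller = build_sorted_table(records, "caller")
by_callee = build_sorted_table(records, "callee")

calls_from = lookup(by_caller, "555-0101")  # calls made by the subscriber
calls_to = lookup(by_callee, "555-0101")    # calls received by the subscriber
```

Each lookup touches only a logarithmic number of keys, but supporting two search keys means holding the data twice, which matches the space cost described for the sstable engine.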

While the sharded engine makes the warehouse denser, sstable-based storage can satisfy the most stringent query performance requirements. Several other approaches to data warehousing are used in the product as well:

  • Partitioning. Data is divided into partitions, and a query skips any partition whose min/max values show that it cannot contain matching records.
  • Efficient data representation. The internal type system is designed to store many typical real-life objects, such as timestamps, IP addresses and network ports, compactly in binary form.
  • Intelligent compression. All data in Equator is compressed to reduce its footprint. One of two algorithms can be used: snappy or gzip. Smart record sorting is used to achieve an even better compression ratio.
  • Bloom filters. These probabilistic structures, cached in memory, are used to reduce the number of disk operations.
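
How partition skipping and Bloom filters cooperate can be shown with a small Python sketch. It is an illustration under stated assumptions, not Equator's implementation: the `Partition` and `BloomFilter` classes, the `id` and `ts` field names, and the filter size and hash count are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

class Partition:
    """A group of records plus the small metadata kept in memory for it."""

    def __init__(self, records):
        self.records = records  # stands in for data stored on disk
        self.min_ts = min(r["ts"] for r in records)
        self.max_ts = max(r["ts"] for r in records)
        self.ids = BloomFilter()
        for r in records:
            self.ids.add(r["id"])

def search(partitions, subscriber_id, ts_from, ts_to):
    """Return (matching records, number of partitions actually scanned)."""
    hits, scanned = [], 0
    for p in partitions:
        if p.max_ts < ts_from or p.min_ts > ts_to:
            continue  # min/max pruning: the time range cannot overlap
        if not p.ids.might_contain(subscriber_id):
            continue  # Bloom filter: the id is definitely not in this partition
        scanned += 1
        hits.extend(r for r in p.records
                    if r["id"] == subscriber_id and ts_from <= r["ts"] <= ts_to)
    return hits, scanned

partitions = [
    Partition([{"id": "a", "ts": 100}, {"id": "b", "ts": 180}]),
    Partition([{"id": "c", "ts": 200}, {"id": "d", "ts": 290}]),
    Partition([{"id": "a", "ts": 310}, {"id": "a", "ts": 390}]),
]
hits, scanned = search(partitions, "a", 150, 350)
```

Both checks use only in-memory metadata, so a partition's records are read only when neither check can rule the partition out. A Bloom filter can occasionally let an irrelevant partition through, which costs an extra scan but never loses a record.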

Three metrics matter most for large data warehouses: storage density, query performance and data loading speed. These metrics are compared below for large real-life storage installations retaining call detail records and IP detail records.

1. Typical record size in bytes (including indexes in Greenwich and data copies in Equator). Sizes in plain text and binary representation are also given for reference. Equator shows much better density for IPDR and comparable density for CDR. The latter matters less, because far fewer call detail records are usually retained.

                            Text form   Binary form   Greenwich   Equator
Call (CDR)                        340           180          90       120
Transport session (IPDR)          100            45          90        15
HTTP query (IPDR)                 220           180         180        60
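
Why a binary representation is so much smaller than text can be seen in a short Python sketch. The record layout below (a 32-bit timestamp, two IPv4 addresses and two ports) is a hypothetical simplification chosen for illustration. It is not Equator's type system and is not meant to reproduce the sizes in the table.

```python
import ipaddress
import struct

# Hypothetical layout, network byte order, no padding:
# uint32 timestamp, 4-byte source IP, 4-byte destination IP, two uint16 ports.
SESSION = struct.Struct("!I4s4sHH")

def pack_session(ts, src_ip, src_port, dst_ip, dst_port):
    """Encode one transport session as a fixed-size binary record."""
    return SESSION.pack(
        ts,
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
    )

def unpack_session(blob):
    """Decode a binary record back into readable values."""
    ts, src, dst, src_port, dst_port = SESSION.unpack(blob)
    return (ts, str(ipaddress.IPv4Address(src)), src_port,
            str(ipaddress.IPv4Address(dst)), dst_port)

# The same session as a text line and as a packed record.
text_line = "1400000000,192.168.10.25,51234,203.0.113.7,443\n"
blob = pack_session(1400000000, "192.168.10.25", 51234, "203.0.113.7", 443)

print(len(text_line), "bytes as text,", len(blob), "bytes packed")
```

An IPv4 address takes up to 15 characters as text but always 4 bytes when packed, and a port takes up to 5 characters but always 2 bytes, which is where most of the saving comes from in this sketch.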

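2. Query execution time in seconds. Equator is dramatically faster for ID queries (90 to 120 times in the figures below) because it uses the sstable storage engine. The 24-hour IP-address/resource search takes the same time in both products.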

                                                  Greenwich   Equator
ID search for 3 years (CDR)                             900        10
ID search for 3 years (IPDR)                           3600        30
IP-address/resource search for 24 hours (IPDR)           60        60

3. Data loading speed in millions of records per hour. Equator loads data 8 to 10 times faster than Greenwich, which makes it possible to load tens of billions of records per server per day.

        Greenwich   Equator
CDR           100      1000
IPDR          250      2000
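
The "tens of billions of records per day" figure follows directly from the hourly rates. The short Python calculation below only restates the numbers from the comparison above and assumes the hourly rate is sustained for a full 24 hours. The variable names are illustrative.

```python
MILLION = 10**6
HOURS_PER_DAY = 24

# Loading speed in millions of records per hour, copied from the comparison above.
LOAD_RATE = {
    "CDR": {"Greenwich": 100, "Equator": 1000},
    "IPDR": {"Greenwich": 250, "Equator": 2000},
}

def records_per_day(millions_per_hour):
    """Records loaded in 24 hours at a sustained hourly rate."""
    return millions_per_hour * MILLION * HOURS_PER_DAY

# Daily record counts per record type and product.
daily = {
    kind: {product: records_per_day(rate) for product, rate in rates.items()}
    for kind, rates in LOAD_RATE.items()
}

# How many times faster Equator loads each record type.
speedup = {
    kind: rates["Equator"] / rates["Greenwich"]
    for kind, rates in LOAD_RATE.items()
}

for kind in LOAD_RATE:
    print(kind, daily[kind], f"speedup x{speedup[kind]:g}")
```

At these rates Equator loads 24 billion CDRs or 48 billion IPDRs per day, against 2.4 billion and 6 billion for Greenwich.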

Although Equator has better performance and storage density, Greenwich has several important advantages. Greenwich retains data unchanged, which eliminates the need to keep a separate copy of the source files, something legislation often requires. Moreover, Greenwich supports ANSI SQL-92 and a number of analytical features such as joins and aggregate functions. Equator takes a NoSQL approach and allows only simple access by key.

Along with Equator Server, Retroid introduces Data Acquisition Bus, a solution that validates, transforms and delivers source data to separate servers in Equator or Greenwich clusters. Acquisition Bus is a modern replacement for Retroid CSync.

The Bus can be configured to parse a number of file formats and apply a variety of rules to check that all the data fields are present and correct. Errors and various statistics can be logged to a file or reported through a JMX interface to a number of monitoring services.

The solution reliably delivers files to several servers, applying different transformation rules to each (e.g. inverting or truncating some fields, or making line endings consistent). Several copies of each record can be created, or records can be distributed based on the value of some field. These flexible capabilities make it possible to set up proper data validation, pre-processing and distribution for virtually any environment.
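
The validate, transform and distribute flow described above can be sketched in Python. This is a hypothetical model only: the field names, the target definitions, the server names and every function below are assumptions made for illustration and do not reflect the actual Acquisition Bus configuration or API.

```python
import zlib

REQUIRED_FIELDS = ("caller", "callee", "start_time")  # hypothetical schema

def validate(record):
    """Return the names of required fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]

def normalize_line_endings(text):
    """Make line endings consistent before a source file is parsed."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def truncate_field(record, field, length):
    """Return a copy of the record with one field cut to `length` characters."""
    changed = dict(record)
    changed[field] = changed[field][:length]
    return changed

def pick_server(record, field, servers):
    """Distribute by field value: the same value always maps to the same server."""
    digest = zlib.crc32(record[field].encode("utf-8"))
    return servers[digest % len(servers)]

def deliver(records, targets):
    """Validate each record, then send one transformed copy to every target."""
    outbox, errors = {}, []
    for record in records:
        missing = validate(record)
        if missing:
            errors.append((record, missing))  # would be logged or reported
            continue
        for target in targets:
            server = pick_server(record, target["key"], target["servers"])
            outbox.setdefault(server, []).append(target["transform"](record))
    return outbox, errors

targets = [
    {"servers": ["equator-1", "equator-2"], "key": "caller",
     "transform": lambda r: truncate_field(r, "callee", 7)},
    {"servers": ["greenwich-1"], "key": "caller",
     "transform": lambda r: r},
]
records = [
    {"caller": "5550101", "callee": "5550199123", "start_time": "2014-05-01T10:00:00"},
    {"caller": "5550101", "callee": "", "start_time": "2014-05-01T10:05:00"},
]
outbox, errors = deliver(records, targets)
```

In this sketch the valid record is copied to both targets, truncated for one and unchanged for the other, while the record with an empty field is set aside together with the reason it failed validation.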