Retroid Equator Server is the new generation of Retroid data storage, designed for long-term retention of vast amounts of data with both storage space and query execution time in mind.
The primary focus of Equator Server is to address the challenges of telecommunication providers, who need to retain subscribers’ Call Detail Records (CDRs) and IP Detail Records (IPDRs) as dictated by legislation in most countries.
While a CDR and IPDR retention solution can be implemented using either traditional relational databases or more specialized data retention products such as Greenwich Server, the need to store index information along with the data significantly increases storage requirements. In real-life scenarios the indexes may be larger than the data itself, more than doubling the number of hard drives required. Equator Server does not use indexes; instead, it organizes records in storage and RAM in a way that lets the user query the retained data efficiently.
Retroid introduces two storage engines intended to find the balance between search speed and disk space needed for data retention:
- sharded - records are organized in so-called shards, allowing a single copy of the data to be queried efficiently against two keys
- sstable - records are organized in a structure called SSTable, allowing virtually instant queries, but requiring a separate copy of data to be retained for each key
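The trade-off behind the sstable engine can be illustrated with a minimal Python sketch. It assumes only the general idea of a sorted table: records are kept sorted by one key, so a lookup is a binary search, and every additional searchable key needs its own sorted copy. The `SortedTable` class, the field names and the sample records are hypothetical and do not reflect Equator's actual on-disk format or API.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for an SSTable: records sorted by one key."""

    def __init__(self, records, key_field):
        self.key_field = key_field
        # Sort once at load time; lookups then need only a binary search.
        self.records = sorted(records, key=lambda r: r[key_field])
        self.keys = [r[key_field] for r in self.records]

    def lookup(self, key):
        # Find the contiguous run of records whose key equals `key`.
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.records[lo:hi]


records = [
    {"caller": "555-0101", "callee": "555-0199", "duration": 42},
    {"caller": "555-0150", "callee": "555-0101", "duration": 7},
    {"caller": "555-0101", "callee": "555-0150", "duration": 120},
]

# One sorted copy per searchable key: this is the extra storage that the
# sstable approach trades for fast lookups.
by_caller = SortedTable(records, "caller")
by_callee = SortedTable(records, "callee")

calls_from = by_caller.lookup("555-0101")
calls_to = by_callee.lookup("555-0101")
```

Here the same three records are stored twice, once per key, which mirrors the "separate copy of data for each key" cost described above.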
While the sharded engine makes the warehouse denser, sstable-based storage can satisfy the most stringent query performance requirements. Several other data warehousing techniques are used in the product as well:
- Partitioning. Data is divided into partitions, some of which are skipped during query execution based on their min/max values.
- Efficient data representation. The internal type system is designed to store many typical real-life objects, such as timestamps, IP addresses and network ports, efficiently in binary form.
- Intelligent compression. All data in Equator is compressed to reduce its footprint. One of two algorithms can be used: snappy or gzip. Smart record sorting is used to achieve an even better compression ratio.
- Bloom filters. These probabilistic structures are cached in memory and used to reduce disk operations.
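How min/max partition skipping and Bloom filters can work together is sketched below in Python. This is an illustration of the general techniques under stated assumptions, not Equator's implementation: the `BloomFilter` and `Partition` classes, the `query` function, the `id` and `ts` field names, and the filter parameters are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class Partition:
    """Hypothetical partition: rows, min/max timestamp, and a Bloom filter on the ID."""

    def __init__(self, rows):
        self.rows = rows
        self.min_ts = min(r["ts"] for r in rows)
        self.max_ts = max(r["ts"] for r in rows)
        self.bloom = BloomFilter()
        for r in rows:
            self.bloom.add(r["id"])


def query(partitions, subscriber_id, ts_from, ts_to):
    """Return matching rows and the number of partitions actually scanned."""
    hits, scanned = [], 0
    for p in partitions:
        # Min/max pruning: skip partitions outside the requested time range.
        if p.max_ts < ts_from or p.min_ts > ts_to:
            continue
        # Bloom filter: skip partitions that definitely do not hold this ID.
        if not p.bloom.might_contain(subscriber_id):
            continue
        scanned += 1
        hits += [r for r in p.rows
                 if r["id"] == subscriber_id and ts_from <= r["ts"] <= ts_to]
    return hits, scanned


partitions = [
    Partition([{"id": "A", "ts": 10}, {"id": "B", "ts": 20}]),
    Partition([{"id": "A", "ts": 110}, {"id": "C", "ts": 120}]),
]
hits, scanned = query(partitions, "A", 100, 200)
```

In this example the first partition is skipped on its min/max values alone, so only one partition is read. The Bloom filter check is cheap because it touches only memory, and a false positive costs one unnecessary partition scan rather than a wrong result.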
Three metrics matter most for large data warehouses: storage density, query performance and data loading speed. These metrics are compared below for large real-life installations retaining call detail records and IP detail records.
1. Typical record size in bytes (including indexes in Greenwich and data copies in Equator). Sizes in plain text and binary representation are also given for reference. Equator shows much better density for IPDR and comparable density for CDR. The latter matters less, because far fewer call detail records are usually retained.
|Record type||Text form||Binary form||Greenwich||Equator|
|Transport session (IPDR)||100||45||90||15|
|HTTP query (IPDR)||220||180||180||60|
2. Query execution time in seconds. Equator is dramatically faster for ID queries, because the sstable storage engine is used.
|Query||Greenwich||Equator|
|ID search for 3 years (CDR)||900||10|
|ID search for 3 years (IPDR)||3600||30|
|IP address/resource search for 24 hours (IPDR)||60||60|
3. Data loading speed in millions of records per hour. Equator significantly outperforms Greenwich, making it possible to load tens of billions of records per server per day.
Although Equator offers better performance and storage density, Greenwich has several important advantages. Greenwich retains data unchanged, which eliminates the need to keep a separate copy of the source files, something legislation often requires. Moreover, Greenwich supports ANSI SQL-92 and a range of analytical functionality such as joins and aggregate functions. Equator takes a NoSQL approach and allows only simple access by key.
Data Acquisition Bus
Along with Equator Server, Retroid introduces Data Acquisition Bus, a solution that validates, transforms and delivers source data to separate servers in Equator or Greenwich clusters. Acquisition Bus is a modern replacement for Retroid CSync.
The Bus can be configured to parse a number of file formats and apply a variety of rules to check that all the data fields are present and correct. Errors and various statistics can be logged to a file or reported through a JMX interface to a number of monitoring services.
The solution reliably delivers files to several servers, applying different transformation rules along the way (e.g. inverting or truncating some fields, or making line endings consistent). Several copies of each record can be created, or records can be distributed based on the value of a field. These flexible capabilities make it possible to set up proper data validation, pre-processing and distribution for virtually any environment.
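The validate, transform and distribute flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Acquisition Bus configuration or API: the record schema, the `validate`, `truncate_field`, `route` and `deliver` functions, and the server names are all hypothetical.

```python
import zlib

REQUIRED_FIELDS = ("subscriber_id", "timestamp", "url")  # hypothetical schema


def validate(record):
    """Return a list of problems; an empty list means the record is valid."""
    errors = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    if record.get("timestamp") and not str(record["timestamp"]).isdigit():
        errors.append("timestamp is not numeric")
    return errors


def truncate_field(field, length):
    """Build a transformation rule that keeps the first `length` characters."""
    def rule(record):
        out = dict(record)
        out[field] = str(out[field])[:length]
        return out
    return rule


def route(record, field, servers):
    """Pick a server from a stable hash of one field value."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return servers[zlib.crc32(str(record[field]).encode()) % len(servers)]


def deliver(records, servers, rules, route_field):
    """Validate, transform and distribute records; returns (batches, rejected)."""
    batches = {s: [] for s in servers}
    rejected = []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
            continue
        for rule in rules:
            record = rule(record)
        batches[route(record, route_field, servers)].append(record)
    return batches, rejected


servers = ["equator-1", "equator-2"]  # hypothetical server names
records = [
    {"subscriber_id": "1001", "timestamp": "1700000000",
     "url": "http://example.com/very/long/path"},
    {"subscriber_id": "1002", "timestamp": "",
     "url": "http://example.com/"},
]
batches, rejected = deliver(
    records, servers, [truncate_field("url", 18)], "subscriber_id")
delivered = [r for batch in batches.values() for r in batch]
```

Routing on a stable hash of one field keeps all records for the same subscriber on the same server, which is one way to realize distribution "based on the value of a field". Rejected records are returned together with their error lists so they can be logged or reported to monitoring.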