• Engineering
  • Product
  • For Brands
  • What’s New
  • Music
  • Life at Anghami
No Result
View All Result
  • Engineering
  • Product
  • For Brands
  • What’s New
  • Music
  • Life at Anghami
No Result
View All Result
Anghami
No Result
View All Result

Gong: Anghami’s In-house Processing Framework

Ajinkya Bhat by Ajinkya Bhat
December 20, 2023
in Engineering
Share on FacebookShare on Twitter

In the dynamic field of data engineering, we at Anghami, have undergone a significant evolution in our approach to data processing and storage. This transformation, marked by the transition from Redshift to a Spark-powered framework and the integration of Amazon S3, has opened up a multitude of opportunities. At the heart of this transformative endeavor lies Gong, a versatile ETL pipeline framework. This article dives into the intricate technical aspects of Gong, uncovering its operational intricacies and its pivotal role in Anghami’s Data Engineering and Platforms ecosystem.

A little background first. Anghami’s data engineering journey began with Amazon Redshift, a robust solution that efficiently handled a multitude of ETL and ad-hoc tasks daily. However, as Anghami’s data needs grew, the constraints imposed by Redshift’s quotas became apparent, impeding our ability to surpass predefined limits.

  1. Storage Constraints: Redshift’s storage quotas proved inadequate to accommodate rapidly growing data volume. The inability to seamlessly scale storage capacity hindered the organization’s ability to retain and analyze historical data, a critical aspect of data-driven decision-making.
  2. Computational Bottlenecks: The computational capabilities of Redshift struggled to match the accelerating demands. Limited processing power resulted in extended query execution times, obstructing the generation of real-time data insights and undermining the agility of timely decision-making processes.
  3. Cost-Effectiveness Calculations: The pricing model became increasingly cost-prohibitive. The escalating costs could impact the organization’s financial resources, potentially impeding the pursuit of further data-driven initiatives.

This quest led to a transformative shift: leveraging the power of Spark and the flexibility of Amazon S3. This strategic move marked a pivotal moment in Anghami’s data infrastructure evolution. By embracing Spark’s computational prowess (For ETL), Trino (ad-hoc analytics), and S3’s adaptability, Anghami not only resolved Redshift’s constraints but also positioned itself at the forefront of data processing and storage methodologies. This laid the foundation for enhanced efficiency, scalability, and innovation within Anghami’s evolving data ecosystem.

Proposed solution:

From a technical perspective, Gong functions as a Spark application deployed within the AWS EMR infrastructure, accepting various arguments tailored to specific business logic requirements. The process kicks off by retrieving data in the form of a JDBC (Java Database Connectivity) dataframe. This data retrieval is made possible through a JDBC connection established with any of the supported MySQL, PostgreSQL, or SQL Server relational database management systems (RDBMS). Gong efficiently transfers this data to the designated S3 storage environment. Additionally, it creates a Hive table on top of the S3-stored data. Currently, we have more than 2000+ pipelines running daily reading from 15+ different databases.

Gong’s Role in Anghami’s Ecosystem

Approaching Gong from a product standpoint reveals its power as a tool for business users. It empowers users to seamlessly extract data from various databases, monitor processes via platforms like Slack, email, and workplace alerts, track metrics, and execute business logic within a distributed computing environment. Gong plays a pivotal role in the ingestion layer, acting as a source for the subsequent Transformation layer, ultimately contributing to the Visualization layer.

Detailed Flow and Task Lifecycle

Gong’s Capabilities & Benefits:

1. Zero Code Onboarding

Gong streamlines the onboarding process coupled with Airflow, requiring zero lines of code. Its generic code structure simplifies the incorporation of new jobs by editing environment variables, eliminating the need for code refactoring.

2. Database Compatibility

It seamlessly supports MySQL, Postgres, and SQL Server RDS, ensuring compatibility across various database systems.

3. Load Strategies

Equipped to handle diverse load strategies, Gong accommodates Full Loads, Incremental Loads, and Delta Loads within specified timeframes. It also supports both partitioned and unpartitioned hive tables. 

4. Checkpointing

The checkpointing feature efficiently manages loads, ensuring data integrity and consistency. This feature makes pipelines idempotent, contributing to enhanced reliability.

5. Column Exclusion/Selection

Offers selective data fetching, enabling users to exclude or choose specific columns, rather than retrieving all available columns. This selectivity safeguards sensitive information, eliminates unnecessary data transfer, and reduces RDS I/O usage, optimizing costs.”

6. Race Condition Handling

Gong’s approach to data ingestion involves writing to a new S3 target path format with an incorporated EPOCHVALUE at runtime. This ensures uninterrupted user queries during job execution and enables multiple jobs to run concurrently without the risk of data inconsistencies, enhancing overall data reliability.

7. Tracking Metrics

Every job execution captures a comprehensive set of metrics, including database and table names, record count, transferred data size, load type, and elapsed time, among various other key performance indicators (KPIs) in a MySQL table. This robust metric collection serves multiple purposes, such as ensuring data quality, maintaining data integrity, monitoring Service Level Agreements (SLA), and tracking the efficiency of both sluggish and optimized job executions.

8. Airflow custom dependency operator

We are using the MySQL table mentioned in point #7 as the source to track if a particular table is up to date or not. All the airflow jobs that have the gong tables as the dependency make use of this operator for validation before going ahead to prevent processing inaccurate information

Orchestration from Airflow:

All the Gong tasks are defined as environment variables in Airflow and are dynamically triggered from Airflow workflows.

Captured job metrics:

Each task execution produces a comprehensive array of metrics that includes database and table names, record count, data transfer volume, load type, elapsed time, and other essential performance indicators (KPIs). These metrics are stored in a MySQL table as can be seen in the screenshot. This extensive collection of metrics serves various purposes. 

We also have a dedicated “s3 pruner” job that reads this table and deletes files located in the s3 path present in the “old_s3” column. This way we can also store some definite versions of a table if needed.

Wrapping up, Gong’s capabilities and seamless integration into Anghami’s data ecosystem have revolutionized data ingestion and data management within the organization. As the company continues to grow and its data needs evolve, Gong will play an even more pivotal role. We are constantly exploring new ways to enhance Gong’s capabilities, empowering Anghami to make data-driven decisions that will drive its success in the years to come.

Tags: AWSDataEngineeringProduct
Ajinkya Bhat

Ajinkya Bhat

Passionate about Data Engineering, Scaling Infrastructure. I navigate the intricate realms of analytics with a keen eye for detail. As a self-proclaimed data nerd, I thrive on transforming raw information into actionable insights. Beyond the world of tech, I'm a jetsetter exploring the globe, all while embracing a vegetarian lifestyle.

Related Posts

+OSN تتعاون مع شركة castLabs لتعزيز حماية المحتوى على منصتها الرقمية
Engineering

+OSN تتعاون مع شركة castLabs لتعزيز حماية المحتوى على منصتها الرقمية

أعلنت castLabs، الشركة الرائدة في تكنولوجيا الفيديو الرقمي، عن تعاونها مع +OSN لتقديم تقنية "دي آر إم توداي" لحماية...

by Nour Sawli
September 11, 2024
OSN+ Partners with castLabs to Enhance Content Protection with Cutting-edge Multi-DRM Technology, DRMtoday
Engineering

OSN+ Partners with castLabs to Enhance Content Protection with Cutting-edge Multi-DRM Technology, DRMtoday

OSN+ has partnered with castLabs to implement DRMtoday, a cloud-based digital rights management (DRM) solution aiming to safeguard it's...

by Nour Sawli
September 11, 2024
Anghami Selects Bitmovin’s VOD Encoder to Power New Multimedia Streaming Platform
Engineering

Anghami Selects Bitmovin’s VOD Encoder to Power New Multimedia Streaming Platform

Following its merger with OSN+, Anghami has chosen Bitmovin’s VOD Encoding to encode over 40,000 video files, bringing the...

by Nour Sawli
July 16, 2024
أنغامي تتعاون مع بيتموفين لتعزيز منصة بث الوسائط المتعددة الجديدة
Engineering

أنغامي تتعاون مع بيتموفين لتعزيز منصة بث الوسائط المتعددة الجديدة

بعد اندماجها مع+OSN ، اختارت أنغامي مشفر الفيديو حسب الطلب (VOD) من بيتموفين لترميز أكثر من 40,000 ملف فيديو...

by Nour Sawli
July 16, 2024
Next Post
ألبوم عمرو دياب المرتقب حصرياً على أنغامي، ممهداً الطريق لسلسلة جديدة من الفعاليات الموسيقية

ألبوم عمرو دياب المرتقب حصرياً على أنغامي، ممهداً الطريق لسلسلة جديدة من الفعاليات الموسيقية

  • Anghami Files 2023 Annual Report and Announces 2024 Q1 Results, Highlighting 18% Growth in Subscribers and Significant Margin Improvement

    Anghami Files 2023 Annual Report and Announces 2024 Q1 Results, Highlighting 18% Growth in Subscribers and Significant Margin Improvement

    0 shares
    Share 0 Tweet 0
  • EA SPORTS™ AND ANGHAMI ANNOUNCE FIFA 23 GLOBAL IN GAME VANITY DROP

    0 shares
    Share 0 Tweet 0
  • Anghami and OSN+ Successfully Complete Milestone Transaction, Creating an Entertainment Powerhouse

    0 shares
    Share 0 Tweet 0
  • Hidden Anghami Features

    0 shares
    Share 0 Tweet 0
  • Anghami and Groover team up to support up-and-coming artists worldwide.

    0 shares
    Share 0 Tweet 0

About Anghami . Join Our Team . Go To app

© 2021 Anghami

No Result
View All Result
  • Homepage
  • Engineering
  • Product
  • What’s New
  • For Brands
  • Music
  • Life at Anghami

© 2020 Anghami blog