
Outlier Detection In Music Playlists

by Elias El Khoury
May 18, 2018
in Engineering

Statistical analysis of playlist’s mood

Playlists usually group songs under a single category, such as an artist’s “Best of”, a genre or era, an activity (e.g. workout or cooking playlists), or a mood, such as melancholic or joyful playlists. We will analyse the coherence of mood-focused playlists by evaluating each song’s sentiment score and using the median absolute deviation (MAD) to robustly measure the variability of the playlist’s mood.

At Anghami, we focus on delivering the best user experience and quality content for music lovers. On our path towards continuous improvement and as a data-driven company, we have turned to statistics to further enhance our customers’ listening sessions.

Converting words into numbers

Every song in a playlist has been assigned adjectives such as exciting, sensual, depressive or nostalgic. To analyse them, we created a one-dimensional projection with values ranging from -1 (most extreme negative) to +1 (most extreme positive).

For this task, we’ve chosen VADER Sentiment Analysis [1], an open-source lexicon and rule-based sentiment analysis tool. Words that aren’t in VADER’s dictionary were replaced with their closest synonyms with the help of www.thesaurus.com.
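The mapping from adjectives to scores can be sketched roughly as follows. This is a minimal illustration, not the pipeline used at Anghami: `TOY_LEXICON` and `TOY_SYNONYMS` are hypothetical stand-ins for VADER’s lexicon and the thesaurus lookup, and their valence numbers are placeholders. Only the normalization formula, which squashes a raw valence into the -1 to +1 range, follows what VADER does for its compound score.

```python
import math

# Hypothetical mini-lexicon with VADER-style valences on its -4..+4 scale.
# The numbers are illustrative placeholders, not values from VADER's lexicon.
TOY_LEXICON = {"exciting": 2.2, "sad": -2.1, "happy": 2.7}

# Hypothetical synonym table standing in for the manual thesaurus lookup.
TOY_SYNONYMS = {"depressive": "sad", "joyful": "happy"}


def normalize(valence, alpha=15.0):
    """Squash a raw valence into (-1, 1), as VADER does for its compound score."""
    return valence / math.sqrt(valence * valence + alpha)


def mood_score(adjective):
    """Score an adjective, falling back to a synonym when it is not in the lexicon."""
    word = adjective.lower()
    if word not in TOY_LEXICON:
        word = TOY_SYNONYMS.get(word, word)
    # Assumption: words that are still unknown are scored as neutral (0.0).
    return normalize(TOY_LEXICON.get(word, 0.0))


for adjective in ["exciting", "depressive", "nostalgic"]:
    print(adjective, round(mood_score(adjective), 3))
```

In this sketch “depressive” is scored through its synonym “sad”, while “nostalgic”, which has no entry in either toy table, falls back to a neutral 0.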

Standard deviation & why we didn’t use it

The standard deviation (SD) measures how much the members of a group differ from the group’s mean value. A small SD indicates that the values are tightly clustered around the mean, while a large SD means the values are spread over a wide range. The following plot illustrates how the standard deviation is affected by the proximity of the values to the mean. Both populations have the same mean, but their values are spread differently.

Example of samples from two populations with the same mean but different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.

A simple method of determining outliers in a set is to find all entries that are more than two standard deviations away from the mean. Let’s take a look at this set of numbers: [1 1 2 2.2 3 3.5 4.1 9]

The mean is 3.2250 and the sample SD is 2.5828. The last entry (9) is greater than mean+2*SD (8.3905), so it is flagged. We have successfully detected the outlier here, but the method does not always hold up.

Let’s do the same for this set of numbers: [1 1 2 2.2 3 3.5 4.1 19 62]

Here the mean is 10.867 and the SD is 19.972.

The cut-off mean+2*SD is 50.811 (the red line in the plot), so 62 is flagged but we fail to identify 19, which logically should also be considered an outlier. The reason is that the standard deviation, which is based on squared distances from the mean, is greatly influenced by the large deviations of extreme outliers, in our case 62.
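The two examples above can be reproduced with a short Python sketch. The function name `sd_outliers` is chosen here for illustration, and the sample standard deviation (dividing by n - 1) is assumed because it matches the figures quoted in the text.

```python
import statistics


def sd_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), matching the figures in the text
    return [v for v in values if abs(v - mean) > k * sd]


first = [1, 1, 2, 2.2, 3, 3.5, 4.1, 9]
second = [1, 1, 2, 2.2, 3, 3.5, 4.1, 19, 62]

print(sd_outliers(first))   # [9]
print(sd_outliers(second))  # [62]; 19 stays inside the cut-off
```

In the second set, the single extreme value 62 inflates both the mean and the SD enough to hide 19.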

Median absolute deviation

The median absolute deviation (MAD) is a robust measure of statistical dispersion and is more resilient to a few extreme outliers. It is defined as the median of the absolute deviations from the data’s median: MAD = median( abs( Xᵢ – median(X) ) ).
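As a worked example, applying this definition by hand to the second set of numbers above gives the following steps:

```latex
\begin{aligned}
X &= (1,\ 1,\ 2,\ 2.2,\ 3,\ 3.5,\ 4.1,\ 19,\ 62), \qquad \operatorname{median}(X) = 3 \\
|X_i - \operatorname{median}(X)| &= (2,\ 2,\ 1,\ 0.8,\ 0,\ 0.5,\ 1.1,\ 16,\ 59) \\
\text{sorted deviations} &= (0,\ 0.5,\ 0.8,\ 1,\ 1.1,\ 2,\ 2,\ 16,\ 59) \\
\operatorname{MAD} &= 1.1
\end{aligned}
```

The two extreme entries only occupy the last two positions of the sorted deviations, so they do not move the middle value.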

Let’s compute the robust zScore of each of the previous data points using the MAD:

octave:1> a = [1 1 2 2.2 3 3.5 4.1 19 62];
octave:2> abs(a - median(a)) / mad(a,1)
ans =1.81818    1.81818    0.90909    0.72727    0.00000    0.45455    1.00000   14.54545   53.63636

Using the same cut-off factor of 2, we find that the outliers here are entries 19 and 62.

octave:3> a(find((abs(a - median(a)) / mad(a,1)) > 2))
ans =19 62
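For readers without Octave, the same computation can be sketched in Python using only the standard library. The function names are chosen here for illustration. Like `mad(a,1)` in the session above, the sketch uses the plain median absolute deviation, without the scaling constant (about 1.4826) that is sometimes applied to make it comparable to the SD of normally distributed data.

```python
import statistics


def robust_z_scores(values):
    """Absolute deviation of each value from the median, in units of the unscaled MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption: refuse to score when more than half the values are identical.
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [abs(v - med) / mad for v in values]


def mad_outliers(values, cutoff=2.0):
    """Return the values whose robust z-score exceeds the cut-off."""
    scores = robust_z_scores(values)
    return [v for v, z in zip(values, scores) if z > cutoff]


a = [1, 1, 2, 2.2, 3, 3.5, 4.1, 19, 62]
print(mad_outliers(a))  # [19, 62]
```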

It’s not a perfect solution, but it is far less sensitive to extreme values than the standard deviation.


Wrapping up

For every playlist, we translate each song’s mood into a normalized score using VADER and compute its robust zScore in terms of the median absolute deviation. The playlists are then ordered by the distance between their maximum and minimum sentiments and by the sum of their songs’ zScores. Thus, if a playlist contains a negative song among mostly positive ones, it is surfaced first and its outlier songs are highlighted for review.
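The ranking step could look roughly like the sketch below. It rests on several assumptions: the post does not say how the two ordering criteria are combined, so the sketch sorts by sentiment range first and by the sum of zScores second. The playlist names and scores are hypothetical, not Anghami data, and the handling of a zero MAD is a guess.

```python
import statistics


def robust_z_scores(scores):
    """Absolute deviation of each score from the median, in units of the MAD."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0:
        # Assumed guard: with a zero MAD, any score off the median is treated as extreme.
        return [0.0 if s == med else float("inf") for s in scores]
    return [abs(s - med) / mad for s in scores]


def rank_playlists(playlists, cutoff=2.0):
    """Order playlists so that the least coherent ones come first."""
    report = []
    for name, scores in playlists.items():
        z = robust_z_scores(scores)
        report.append({
            "playlist": name,
            "spread": max(scores) - min(scores),  # max minus min sentiment
            "z_sum": sum(z),
            "outliers": [i for i, zi in enumerate(z) if zi > cutoff],  # song indices
        })
    # Assumed ordering: widest spread first, ties broken by the larger zScore sum.
    report.sort(key=lambda r: (r["spread"], r["z_sum"]), reverse=True)
    return report


# Hypothetical sentiment scores in [-1, 1].
playlists = {
    "chill": [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5],
    "feel_good": [0.6, 0.7, 0.65, 0.75, 0.55, -0.5],
}
ranking = rank_playlists(playlists)
for row in ranking:
    print(row["playlist"], row["outliers"])
```

Here the “feel_good” playlist is listed first, with its single negative song (index 5) flagged, while “chill” has no flagged songs.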

What’s next?

Now that we’re able to analyse playlists over a single attribute (sentiment), we have begun looking into methods that deal efficiently with multiple dimensions. Hopefully this will be the topic of another story.


[1] Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

Tags: Data Analysis, Research and Development, Sound, Statistics
Elias El Khoury

Ingestion & R&D Lead, joined Anghami in 2016
