Zalo AI Challenge: Problems and Solutions

About

The Zalo AI Challenge asks participants to show their abilities by building algorithms that detect the gender of a speaker from a voice recording, the genre of a song, and the location shown in a photo.

Challenge 1: Music Genre Classification

Description

Music genre classification is a difficult and interesting challenge. A good classifier is very helpful for smart music storage, music recommendation, and music search. Despite their usefulness, there are not many good music classifiers yet, especially for Vietnamese songs.

In this challenge, you are to build a classifier to detect the correct genre of a Vietnamese song. The 10 selected genres are: Cải Lương, Nhạc Cách Mạng, Nhạc Dân Ca – Quê Hương, Nhạc Dance, Nhạc Không Lời, Nhạc Thiếu Nhi, Nhạc Trịnh, Nhạc Trữ Tình, Rap Việt, Rock Việt.

A labeled training set is provided. An unlabeled test set is also provided so that trained classifiers can be tested against unseen data.

For each song, the classifier must output the single best-matching genre. Teams are scored and ranked by classification accuracy on the test set.

Evaluation Metric

Accuracy (the percentage of correct predictions across all genres) is used to rank competing submissions.
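As a minimal sketch of the metric above, accuracy can be computed by comparing predicted and true genre ids position by position. The function name and the sample ids are illustrative assumptions; the organizers' actual scoring script is not shown in this document.

```python
def accuracy(predicted, actual):
    """Fraction of tracks whose predicted genre_id equals the true genre_id."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical genre_id values for four tracks; three of the four match.
print(accuracy([8, 3, 1, 10], [8, 3, 2, 10]))  # 0.75
```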

Data

The data consists of two archives of audio files (MP3 format) and CSV files with metadata.

  • train.zip and test.zip contain the audio files of the train set and the test set (about 5000 tracks for train and 2000 tracks for test). Genre information is given for the train set but not for the test set. Each track is assigned exactly one genre. The objective of the challenge is to predict the genre of each audio file in the test set.
  • train.csv gives the genre (genre_id) of each track in the train set.
  • test.csv lists the track_id of every track whose genre has to be predicted.
  • genres.csv lists the possible genres and their corresponding ids.
  • sample_submission.csv is an example of a submission file that can be scored by classification accuracy.

For example, 1001684131607489553.mp3 is an audio file from the train set. Its genre is given in train.csv, where genre_id 8 is associated with track_id 1001684131607489553. According to genres.csv, genre_id 8 corresponds to “Nhạc Trữ Tình”.
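The lookup in this example (track_id to genre_id to genre name) can be sketched as follows. The header names `track_id`, `genre_id`, and `genre_name`, and the presence of a header row at all, are assumptions made for illustration; check the real files before relying on them. The sketch uses small in-memory strings in place of train.csv and genres.csv so that it runs on its own.

```python
import csv
import io

# Tiny in-memory stand-ins for train.csv and genres.csv.
# ASSUMPTION: the header names and column order here are hypothetical.
TRAIN_CSV = "track_id,genre_id\n1001684131607489553,8\n"
GENRES_CSV = "genre_id,genre_name\n8,Nhạc Trữ Tình\n"


def load_mapping(text, key_col, value_col):
    """Read CSV text and return a dict mapping key_col values to value_col values."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_col]: row[value_col] for row in reader}


# Ids are kept as strings so that long track ids are never rounded.
track_to_genre_id = load_mapping(TRAIN_CSV, "track_id", "genre_id")
genre_id_to_name = load_mapping(GENRES_CSV, "genre_id", "genre_name")


def genre_name_for(track_id):
    """Resolve a track_id to its genre name via the two mappings."""
    return genre_id_to_name[track_to_genre_id[track_id]]


print(genre_name_for("1001684131607489553"))  # Nhạc Trữ Tình
```

To use the real files, the same `csv.DictReader` call can be given an open file handle instead of `io.StringIO`.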

Please download the data sets from the following links:

Metadata files:

Audio Files:

Private test: https://dl.challenge.zalo.ai/music/private.zip

Sample submission: https://dl.challenge.zalo.ai/music/sample_submission.csv

Solution

  1. https://github.com/dungnb1333/music_genre_classification

Challenge 2: Landmark Identification

Description

The goal of this challenge is to identify the famous Vietnamese landmark depicted in a photograph. The data for this task comes from the Zalo Places dataset, which contains 1M+ images of 500+ unique famous places in Vietnam.

Specifically, the challenge data is divided into ~88K images for training and validation and ~30K images for testing, drawn from 103 places (103 categories). Note that the number of training images per category is non-uniform, ranging from 100 to 1,000, to mimic a more natural frequency of occurrence of the scenes.

For each image, algorithms will produce a list of at most 3 scene categories in descending order of confidence. The quality of a labeling is evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to propose multiple scene categories for an image, given that many scenes could plausibly carry more than one label (e.g. a rice terrace could be in Ha Giang or in Mu Cang Chai).

Evaluation Metric

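We follow a metric similar to that of the ILSVRC classification tasks. For each image $i$, an algorithm will produce 3 labels $l_{i,j}$, $j = 1, \dots, 3$. For this competition each image has one ground truth label $g_i$, and the error for that image is:

$$e_i = \min_{j} d(l_{i,j}, g_i), \quad \text{where } d(x, y) = 0 \text{ if } x = y \text{ and } 1 \text{ otherwise.}$$

The overall error score for an algorithm is the average error over all $N$ test images:

$$\text{score} = \frac{1}{N} \sum_{i=1}^{N} e_i$$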
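A minimal sketch of this metric, assuming the usual ILSVRC-style top-k rule: an image has error 0 if any of its (at most 3) predicted labels equals the ground truth label, and error 1 otherwise, and the overall score is the mean of these per-image errors. The function names and the category ids below are hypothetical, and this is not the organizers' scoring code.

```python
def image_error(predicted_labels, ground_truth):
    """Return 0 if any predicted label matches the ground truth, else 1."""
    if len(predicted_labels) > 3:
        raise ValueError("at most 3 labels are allowed per image")
    return 0 if ground_truth in predicted_labels else 1


def overall_error(all_predictions, all_ground_truths):
    """Average the per-image errors over all test images."""
    if len(all_predictions) != len(all_ground_truths):
        raise ValueError("need exactly one prediction list per ground truth")
    if not all_ground_truths:
        raise ValueError("cannot score an empty set")
    errors = [image_error(p, g) for p, g in zip(all_predictions, all_ground_truths)]
    return sum(errors) / len(errors)


# Hypothetical category ids for three images.
predictions = [[12, 5, 40], [3, 88, 19], [61]]
ground_truths = [5, 20, 61]

# The first and third images are hits, the second is a miss.
print(overall_error(predictions, ground_truths))
```

Lower is better: a score of 0 means every image had its true label among the predicted ones.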

Data

Please download the data sets from the following links:

Metadata files:

Private test: https://dl.challenge.zalo.ai/landmark/finalPrivateTest.zip  

Sample submission: https://dl.challenge.zalo.ai/landmark/sample_submission.csv

Solution

  1. https://github.com/hainamnguyen/ZaloAiLandMark
  2. https://gitlab.com/ngxbac/ZL
  3. https://github.com/tiepvupsu/zalo_landmark
  4. https://github.com/tqtg/zalo-landmark

Challenge 3: Voice Gender/Accent Classification

Description

Identifying gender and regional accent from speech is essential for intelligent systems such as conversational chatbots, recommendation systems, smart homes, and speech recognition. In this speech challenge, you will build a system to predict the gender and regional accent of Vietnamese speakers using a diverse speech dataset. The dataset consists of ~30K short speech signals recorded in an uncontrolled environment.

Evaluation Metric

    • Accuracy (the percentage of correct predictions) is used to rank competing submissions.
    • For each speech sample, your model predicts two main labels: gender and accent.

Gender: Female: 0, Male: 1
Accent: North: 0, Central: 1, South: 2
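The label encoding above can be sketched as two lookup tables, with accuracy computed per label. This document does not say how the gender and accent results are combined into a single ranking score, so the sketch reports them separately. The lowercase string keys and the function names are assumptions made for illustration.

```python
# Integer codes as listed above; the string keys are hypothetical.
GENDER_IDS = {"female": 0, "male": 1}
ACCENT_IDS = {"north": 0, "central": 1, "south": 2}


def encode(gender, accent):
    """Turn human-readable labels into the (gender_id, accent_id) pair."""
    return GENDER_IDS[gender], ACCENT_IDS[accent]


def per_label_accuracy(predictions, truths):
    """Return (gender_accuracy, accent_accuracy) for lists of id pairs."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("cannot score an empty set")
    n = len(truths)
    gender_hits = sum(p[0] == t[0] for p, t in zip(predictions, truths))
    accent_hits = sum(p[1] == t[1] for p, t in zip(predictions, truths))
    return gender_hits / n, accent_hits / n


# Hypothetical predictions and ground truth for four speech samples.
predictions = [
    encode("female", "north"),
    encode("male", "south"),
    encode("male", "central"),
    encode("female", "south"),
]
truths = [(0, 0), (1, 1), (0, 1), (0, 0)]

print(per_label_accuracy(predictions, truths))  # (0.75, 0.5)
```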

Data

Please download the data sets from the following links:

Metadata files:

 

Private test: https://dl.challenge.zalo.ai/voice/private_test_data.zip

Sample submission: https://dl.challenge.zalo.ai/voice/private_test_example.csv

Solution

  1. https://github.com/tiepvupsu/zalo_voice
  2. https://github.com/pbcquoc/voice_zaloai