WSDM Cup on Cross-Market Recommendation Competition


E-commerce companies often operate across markets; for instance, Amazon has expanded its operations and sales to 18 markets (i.e. countries) around the globe. Cross-market recommendation concerns the problem of recommending relevant products to users in a target market (e.g., a resource-scarce market) by leveraging data from similar high-resource markets, e.g. using data from the U.S. market to improve recommendations in a target market. The key challenge, however, is that data such as user interactions with products (clicks, purchases, reviews) carry biases specific to the individual markets. Therefore, algorithms trained on a source market are not necessarily effective in a different target market. Despite its significance, little progress has been made in cross-market recommendation, mainly due to a lack of experimental data for researchers. In this WSDM Cup challenge, we provide user purchase and rating data on various markets with a considerable number of shared item subsets. The goal is to improve individual recommendation systems in the target markets by leveraging data from similar auxiliary markets.

To participate, register on Codalab and join our task.

Note: All team members should enter their details in the Team Registration Form.

Leader Board

Starter kit repository


Problem Definition

Given a global set of items I, we define a market M as the collection of its users U(M) together with their interactions (i.e. ratings) with items from I. In general, a user can interact with several markets, but for simplicity we assume that the user sets of different markets are mutually disjoint. In this competition, we have three source markets, s1, s2, and s3, and two target markets, t1 and t2.

The goal is to have the best possible recommender system in terms of nDCG@10 on the target markets t1 and t2. For that, you can use the data on these markets and also get help from the data available from the source markets s1, s2, and s3.

Train, Validation, and Test Split

The source markets are used for training, and therefore they are provided in a single Train split. For the target markets, we leave one interaction of each user out for the Test split, and one other interaction for the Validation split. All the remaining interactions are given as the Train split. As usual, you are given only the Train and Validation splits.
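The leave-one-out split described above can be sketched as follows. This is only an illustration: the function name `leave_one_out_split` is hypothetical, the held-out interactions are chosen at random here (the organizers' actual selection rule is not stated), and users with fewer than three interactions are kept entirely in Train as an assumption.

```python
import random
from collections import defaultdict


def leave_one_out_split(interactions, seed=0):
    """Split (user, item, rating) tuples into train, valid and test lists.

    Per user: one interaction goes to test, one other to validation,
    and all remaining interactions go to train.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, valid, test = [], [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)  # assumption: held-out interactions are picked at random
        if len(rows) < 3:
            # Assumption: too few interactions to hold two out, keep all in train.
            train.extend(rows)
            continue
        test.append(rows[0])
        valid.append(rows[1])
        train.extend(rows[2:])
    return train, valid, test
```

For a target market, `train` and `valid` would correspond to the splits participants receive, while `test` stays hidden.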


Your submission should be a zip file containing two folders, t1 and t2. Each of these two target folders should contain two tab-separated files, test_pred.tsv (the test scores, i.e. reranked items from test_run.tsv) and valid_pred.tsv (the validation scores, i.e. reranked items from valid_run.tsv), with each line containing three columns as follows:

  userId	itemId	score

where the first column userId is the user unique id, the second column itemId is the item unique id, and the third column score is the score your model assigns to this (user, item) pair. For example, each of your files should look like this:

  VA	E2	1.0 
  VA	FQ	0.9 
  VA	WS	1.1 

For each user (i.e. each unique value of userId), the items are sorted by score in descending order (ties are broken randomly) and the top 10 items in the ranked list are used for evaluation.
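To see how scored lines turn into a per-user top-10 list, here is a minimal sketch. The helper `top_k_per_user` is hypothetical and is not the official evaluation code; it assumes the input has no header row, and it keeps file order for tied scores, whereas the evaluation described above breaks ties randomly.

```python
from collections import defaultdict


def top_k_per_user(lines, k=10):
    """Parse 'userId<TAB>itemId<TAB>score' lines into {user: [items, best first]}."""
    scores = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item, score = line.split("\t")
        scores[user].append((item, float(score)))

    ranked = {}
    for user, pairs in scores.items():
        # Stable sort: items with equal scores keep their input order here.
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        ranked[user] = [item for item, _ in pairs[:k]]
    return ranked


example = ["VA\tE2\t1.0", "VA\tFQ\t0.9", "VA\tWS\t1.1"]
print(top_k_per_user(example))  # {'VA': ['WS', 'E2', 'FQ']}
```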

This is how your zip file should look:
   ├── t1
       ├── test_pred.tsv     /* scores of test items */
       └── valid_pred.tsv    /* scores of validation items */
   └── t2
       ├── test_pred.tsv     /* scores of test items */
       └── valid_pred.tsv    /* scores of validation items */
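A zip file with this layout could be produced along the following lines. This is a sketch under assumptions: `write_submission` and the shape of its `preds` argument are hypothetical, and the files are written without a header row; run the organizers' validation script on the result before submitting.

```python
import os
import tempfile
import zipfile


def write_submission(preds, zip_path):
    """Write preds[market][split] = [(user, item, score), ...] into a submission zip."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for market in ("t1", "t2"):
            for split in ("test", "valid"):
                rows = preds[market][split]
                # One tab-separated line per (user, item, score) triple.
                body = "".join(f"{u}\t{i}\t{s}\n" for u, i, s in rows)
                zf.writestr(f"{market}/{split}_pred.tsv", body)
    return zip_path


# Toy usage with a single scored pair per file.
demo = {m: {s: [("VA", "E2", 1.0)] for s in ("test", "valid")} for m in ("t1", "t2")}
out = write_submission(demo, os.path.join(tempfile.mkdtemp(), "submission.zip"))
with zipfile.ZipFile(out) as zf:
    names = sorted(zf.namelist())
print(names)
```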

We provide a validation script that can be used to check your zip file before submission. Simply run path/to/

on your final zip file to make sure the structure of your submission is OK. If so, you will get the following message:

File structure validation successfully passed


We evaluate the submissions based on their average nDCG@10. As discussed in the submission guidelines, the items are sorted by score for each user and the top 10 items are considered for evaluation. For the overall evaluation, we concatenate the users of the two target markets (t1/scores.tsv and t2/scores.tsv) and compute nDCG@10 on the resulting list. The teams are ranked based on this metric.
For information purposes, we also report separate nDCG@10 and HR@10 for each target market.
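As an illustration of these metrics, the sketch below computes nDCG@10 and HR@10 under the assumption that each user has exactly one held-out positive item with binary relevance, so the ideal DCG is 1. The function `ndcg_hr_at_k` is hypothetical and is not the organizers' evaluation code. Users are keyed by (market, userId) so that the two target markets can be concatenated into one dictionary.

```python
import math


def ndcg_hr_at_k(ranked, positives, k=10):
    """ranked: {user: [items, best first]}; positives: {user: held-out item}.

    Returns (mean nDCG@k, mean HR@k) over all users in `positives`.
    """
    ndcg_sum, hits = 0.0, 0
    for user, pos_item in positives.items():
        top = ranked.get(user, [])[:k]
        if pos_item in top:
            rank = top.index(pos_item) + 1  # 1-based position in the list
            ndcg_sum += 1.0 / math.log2(rank + 1)  # IDCG is 1 for a single positive
            hits += 1
    n = len(positives)
    return ndcg_sum / n, hits / n


ranked = {("t1", "VA"): ["WS", "E2", "FQ"], ("t2", "XB"): ["A1", "B2"]}
positives = {("t1", "VA"): "E2", ("t2", "XB"): "Z9"}
ndcg, hr = ndcg_hr_at_k(ranked, positives)
print(round(ndcg, 4), hr)
```

Here the first user's positive sits at rank 2 and the second user's positive is missing from the list, so only the first user contributes to either metric.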


The training and validation data, as well as the test run files, are provided in the starter kit repository. The data is structured as follows:

   ├── s1
       ├── [1.8M]  train_5core.tsv        /* train data */
       ├── [19.2M]  train.tsv        /* full train data */
       ├── [139K]  valid_qrel.tsv        /* validation positive samples */
       └── [5.6M]  valid_run.tsv        /* list of validation items to be reranked */
   ├── s2
       ├── [1.1M]  train_5core.tsv        /* train data */
       ├── [2.6M]  train.tsv        /* full train data */
       ├── [153K]  valid_qrel.tsv        /* validation positive samples */
       └── [6.2M]  valid_run.tsv        /* list of validation items to be reranked */
   ├── s3
       ├── [548K]  train_5core.tsv        /* train data */
       ├── [1.2M]  train.tsv        /* full train data */
       ├── [71K]  valid_qrel.tsv        /* validation positive samples */
       └── [2.9M]  valid_run.tsv        /* list of validation items to be reranked */
   ├── t1
       ├── [2.3M]  test_run.tsv     /* list of test items to be reranked */
       ├── [1.4M]  train.tsv        /* full train data */
       ├── [457K]  train_5core.tsv        /* train data */
       ├── [58K]  valid_qrel.tsv   /* validation positive samples */
       └── [2.3M]  valid_run.tsv    /* list of validation items to be reranked */
   └── t2
       ├── [4.8M]  test_run.tsv     /* list of test items to be reranked */
       ├── [2.8M]  train.tsv        /* full train data */
       ├── [966K]  train_5core.tsv        /* train data */
       ├── [118K]  valid_qrel.tsv   /* validation positive samples */
       └── [4.8M]  valid_run.tsv    /* list of validation items to be reranked */

Getting started

This repository provides sample code for training a simple Generalized Matrix Factorization (GMF) model over several markets. The code can load data from zero or more source markets to augment the target market data, which can help recommendation performance in the target market.

We highly recommend following the structure of our sample code in your own model design, as we ask every team to submit their code along with their submission and to share the implementation with the organizers. If we are unable to reproduce your results, your submission will be removed from our leaderboard. Please reach out to us if you encounter any problems using this code or have any other questions or feedback.


We use conda for our experiments. You can use environment.yml to create the environment (run conda env create -f environment.yml) or install the requirements listed below in your own environment.

  python 3.7

Train the baseline GMF++ model

The training script trains our simple GMF++ model. It takes one target market and zero or more source markets (separated by a dash "-") whose data augment the target market data. We implemented our dataloader such that it loads all the data and samples equally from each market in the training phase. You can use PyTorch's ConcatDataset to concatenate your torch datasets.
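The idea of sampling equally from each market can be illustrated in plain Python. This is not the starter kit's dataloader: `balanced_batches` and its parameters are hypothetical, and sampling with replacement is an assumption made so that small markets can still fill their share of every batch.

```python
import random


def balanced_batches(market_data, per_market, num_batches, seed=0):
    """Yield batches that draw `per_market` interactions from every market."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for market, rows in market_data.items():
            # Sample with replacement so each market contributes equally.
            batch.extend((market, row) for row in rng.choices(rows, k=per_market))
        rng.shuffle(batch)  # mix markets within the batch
        yield batch


# Toy data: a small target market and a larger source market.
market_data = {
    "t1": [("u1", "i1", 5.0), ("u2", "i2", 3.0)],
    "s1": [("v%d" % n, "i%d" % n, 4.0) for n in range(100)],
}
batches = list(balanced_batches(market_data, per_market=4, num_batches=3))
print(len(batches), len(batches[0]))  # 3 8
```

Each batch contains the same number of examples from t1 and s1, regardless of how much larger the source market is.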

Here is a sample train script using zero source market (only train on the target data):

python --tgt_market t1 --src_markets none --tgt_market_valid DATA/t1/valid_run.tsv --tgt_market_test DATA/t1/test_run.tsv --exp_name toytest --num_epoch 5 --cuda

Here is a sample train script using two source markets:

python --tgt_market t1 --src_markets s1-s2 --tgt_market_valid DATA/t1/valid_run.tsv --tgt_market_test DATA/t1/test_run.tsv --exp_name toytest --num_epoch 5 --cuda

After training your model, the script prints the paths of the model and index checkpoints, as well as the run files for the validation and test data, as shown below. You can load the model for other uses and evaluate the validation run file. See the notebook tutorial.ipynb for sample code.

For example, this is the output of the above train script:

Model is trained! and saved at:
--model: checkpoints/t1_s1-s2_toytest.model
--id_bank: checkpoints/t1_s1-s2_toytest.pickle
Run output files:
--validation: valid_t1_s1-s2_toytest.tsv
--test: test_t1_s1-s2_toytest.tsv

You can test your validation performance using the following code:

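from utils import read_qrel_file, get_evaluations_final

# mymodel and tgt_valid_dataloader are assumed to be already defined (e.g. as in tutorial.ipynb)
valid_run_mf = mymodel.predict(tgt_valid_dataloader)
# load the validation positive samples for target market t1
tgt_valid_qrel = read_qrel_file('DATA/t1/valid_qrel.tsv')
# compare the predicted run against the validation positives
task_ov, task_ind = get_evaluations_final(valid_run_mf, tgt_valid_qrel)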

You will need to upload the test run output file (.tsv file format) for both target markets to Codalab for our evaluation and leaderboard entry. This output file contains ranked items for each user with their score. Our final evaluation metric is based on nDCG@10 on both target markets.


Silver sponsors:



The XMRec WSDM Cup is looking for a wide range of sponsorship, from funding awards for the top teams to providing computational resources. Please check the Call for Sponsorship for more information.

Terms and Conditions

The XMRec dataset is free to download for research purposes. Each team is allowed to submit two runs per day and a maximum of 100 runs in total.

Note that we ask the top teams to provide their model code so that we can run it locally after the end of the competition.

At the end of the challenge, each team is encouraged to open source the source code that was used to generate their final challenge solution under the MIT license. To be eligible for the leaderboard or prizes, winning teams are also required to submit papers describing their method to the WSDM Cup Workshop, and present their work at the workshop. Refer to the "Call for Papers" section on the WSDM Cup 2022 webpage for more details.





For questions and announcements, follow us on Twitter and join our Google group.

You can also contact us via