Introduction

Dataset Characteristics
  • Period: Transactions from 2016 to 2018
  • Total size: 210,000 transactions
  • Overall fraud rate: ~0.15%
  • Additional challenge: The clients who committed fraud in the training set are different from those in the evaluation set, so a model cannot rely on having seen a client before
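
To make the class imbalance concrete, the short calculation below uses the two figures from the list above. It is only a rough sketch: the fraud rate is approximate, and it is an assumption here that the 210,000 figure and the rate apply to the same set of transactions.

```python
# Figures taken from the dataset description; the rate is approximate.
total_transactions = 210_000
approx_fraud_rate = 0.0015  # ~0.15%

# Rough number of fraudulent transactions implied by those figures.
expected_fraud = round(total_transactions * approx_fraud_rate)

# Accuracy of a trivial model that always predicts "non-fraud".
baseline_accuracy = 1 - approx_fraud_rate

print(expected_fraud)
print(f"{baseline_accuracy:.2%}")
```

With only a few hundred fraudulent transactions, a model that never predicts fraud would still look highly accurate, so plain accuracy is likely a poor guide when comparing models on this data.
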
Files for Participants
  • transactions_train.csv: Transactions for training.
  • train_fraud_labels.json: Labels for the training transactions.
  • cards_data.csv: Payment card information.
  • users_data.csv: User information.
  • mcc_codes.json: MCC codes and descriptions.
  • evaluation_features.csv: Transactions for evaluation, without labels. Note: Use this file only for model evaluation, never for training.
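
The column names and the JSON layouts are not listed here, so a reasonable first step is to inspect each file before writing any modeling code. The helpers below are a minimal sketch using only the Python standard library; `peek_csv` and `load_json` are hypothetical names introduced for illustration, and nothing is assumed about the file contents beyond the CSV files having a header row.

```python
import csv
import json
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return header, rows


def load_json(path):
    """Load a JSON file and return the parsed object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Example usage, assuming the files sit in the working directory:
# header, rows = peek_csv("transactions_train.csv")
# labels = load_json("train_fraud_labels.json")
# print(header, type(labels))
```

Checking the header of evaluation_features.csv in the same way helps confirm that it has the same feature columns as the training file before a model is applied to it.
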
Submission Format

Participants must submit a CSV file containing the following columns:

  • transaction_id: Transaction identifier (as in evaluation_features.csv)
  • fraud_prediction: Binary prediction (1 for fraud, 0 for non-fraud)
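
A submission file with these two columns could be written as in the sketch below. The function name `write_submission` and the default file name `submission.csv` are assumptions made for illustration; only the two column names come from the format described above.

```python
import csv


def write_submission(transaction_ids, predictions, path="submission.csv"):
    """Write transaction_id / fraud_prediction pairs to a CSV file."""
    transaction_ids = list(transaction_ids)
    predictions = list(predictions)

    if len(transaction_ids) != len(predictions):
        raise ValueError("transaction_ids and predictions differ in length")
    # Predictions must be binary, so reject probabilities and other values.
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("fraud_prediction must be 0 or 1")

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["transaction_id", "fraud_prediction"])
        for tid, pred in zip(transaction_ids, predictions):
            writer.writerow([tid, int(pred)])
```

If a model outputs probabilities, they need to be converted to 0 or 1 with a chosen threshold before calling this function; no threshold is prescribed here.
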
Important Note

This dataset presents a realistic challenge in bank fraud detection where models must generalize to new clients. Participants will need to develop algorithms capable of detecting fraud for clients never seen during training.
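
One way to approximate this setting locally is to hold out whole clients for validation, so that no client appears on both sides of the split. The sketch below is one possible approach, not a prescribed method. The field name `client_id` is hypothetical, since the actual column that identifies a client is not stated here, and rows are assumed to be dictionaries such as those produced by `csv.DictReader`.

```python
import random


def split_by_client(rows, client_key="client_id", holdout_fraction=0.2, seed=0):
    """Split rows into (train, validation) with no client shared between them."""
    # Sort first so the shuffle is reproducible for a given seed.
    clients = sorted({row[client_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(clients)

    # Hold out at least one client, rounding the fraction down otherwise.
    n_holdout = max(1, int(len(clients) * holdout_fraction))
    holdout_clients = set(clients[:n_holdout])

    train = [r for r in rows if r[client_key] not in holdout_clients]
    validation = [r for r in rows if r[client_key] in holdout_clients]
    return train, validation
```

Scores measured on the held-out clients should reflect performance on unseen clients better than a random split over transactions would, because a random split lets the same client appear in both training and validation data.
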
