Introduction
Dataset Characteristics
- Year: Transactions from 2016 to 2018
- Total size: 210,000 transactions
- Overall fraud rate: ~0.15%
- Additional challenge: The clients who committed fraud in the training set are different from those who committed fraud in the evaluation set
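To make the class imbalance concrete, the figures above can be turned into rough counts. This is only a back-of-the-envelope sketch: it assumes the ~0.15% rate applies uniformly to all 210,000 transactions, which the description does not state explicitly, and the variable names are illustrative.

```python
# Figures quoted in the dataset characteristics.
total_transactions = 210_000
fraud_rate = 0.0015  # ~0.15%

# Rough number of fraudulent rows, ASSUMING the rate holds across all rows.
expected_fraud = round(total_transactions * fraud_rate)

# A trivial model that always predicts 0 (non-fraud) is correct on every
# legitimate row, so its accuracy is simply the share of legitimate rows.
always_zero_accuracy = 1 - fraud_rate

print(f"expected fraud transactions: ~{expected_fraud}")
print(f"accuracy of always predicting 0: {always_zero_accuracy:.2%}")
```

Because a model that never flags fraud already scores very high on accuracy, metrics that focus on the positive class (for example precision and recall) are likely to be more informative when validating a model locally.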
The following files are provided:
- transactions_train.csv: Transactions for training.
- train_fraud_labels.json: Labels for the training transactions.
- cards_data.csv: Payment card information.
- users_data.csv: User information.
- mcc_codes.json: MCC codes and descriptions.
- evaluation_features.csv: Transactions for evaluation (without labels). Note: Use this file only for model evaluation, not for training.
Participants must submit a CSV file containing the following columns:
- transaction_id: Transaction identifier (as in evaluation_features.csv)
- fraud_prediction: Binary prediction (1 for fraud, 0 for non-fraud)
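The submission format above could be produced along the following lines. This is a minimal sketch using only the Python standard library; the function name `write_submission`, the default file name `submission.csv`, and the input shape (an iterable of `(transaction_id, label)` pairs) are assumptions for illustration, not part of the official rules.

```python
import csv


def write_submission(predictions, path="submission.csv"):
    """Write (transaction_id, fraud_prediction) pairs to a CSV file.

    `predictions` is any iterable of (transaction_id, label) pairs,
    where label must be 0 (non-fraud) or 1 (fraud).
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header row with the two required column names.
        writer.writerow(["transaction_id", "fraud_prediction"])
        for transaction_id, label in predictions:
            if label not in (0, 1):
                raise ValueError(f"prediction must be 0 or 1, got {label!r}")
            writer.writerow([transaction_id, int(label)])
```

For example, `write_submission({101: 0, 102: 1}.items())` writes a header row followed by one row per transaction. Rejecting non-binary values guards against accidentally submitting probabilities instead of binary predictions.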
This dataset presents a realistic challenge in bank fraud detection where models must generalize to new clients. Participants will need to develop algorithms capable of detecting fraud for clients never seen during training.
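One way to mimic the unseen-client setting during local validation is to split the training data by client rather than by transaction, so that no client appears on both sides. The sketch below uses only the standard library; the column name `client_id` and the helper `split_by_client` are hypothetical, since the description does not specify the schema of the transaction files.

```python
import random


def split_by_client(rows, client_key="client_id", holdout_fraction=0.2, seed=0):
    """Split rows so that each client lands entirely in train or in validation.

    `rows` is a list of dicts; `client_key` is an ASSUMED column name.
    """
    # Collect the distinct clients and shuffle them reproducibly.
    clients = sorted({row[client_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(clients)

    # Hold out a fraction of clients (at least one) for validation.
    n_holdout = max(1, int(len(clients) * holdout_fraction))
    holdout_clients = set(clients[:n_holdout])

    train = [r for r in rows if r[client_key] not in holdout_clients]
    valid = [r for r in rows if r[client_key] in holdout_clients]
    return train, valid
```

A validation score measured on held-out clients is likely to be a closer proxy for evaluation performance than one from a random row-level split. If scikit-learn is available, `GroupShuffleSplit` or `GroupKFold` provide the same kind of group-aware splitting.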