Anomaly Detection in Bitcoin Transactions

05 Jul 2021

Abstract

Bitcoin has become a preferred means of transferring money for illicit economic activity such as ransomware payments and money laundering. Robust detection of these illicit transactions on the Bitcoin blockchain is crucial for anti-money laundering operations. Major challenges for illicit transaction detection include the lack of transaction labels, the massive size of the Bitcoin network, and the fact that bad actors work to mask their activity and avoid detection. Previous researchers have proposed anomaly detection (unsupervised methods), supervised learning, and active learning to classify illicit transactions. Unsupervised methods are useful when labels are scarce, but supervised learning is necessary here because illicit transactions do not correlate with anomalies in the dataset. Active learning is a useful modification of supervised learning for this task, as real-world Bitcoin datasets have very few licit/illicit labels. We compare the performance of these methods using the Elliptic dataset.

Data Source

We use the Elliptic Data Set, which contains 200K Bitcoin transactions, a subset of which are tagged as either licit or illicit. The dataset is a time series graph in which Bitcoin transactions are nodes and directed payment flows are edges. In addition to the licit or illicit tag, each node carries 166 features. Of these, 94 are local features describing the transaction itself, such as the time step, transaction fee, and output volume. The remaining 72 are aggregated features, computed from the nodes one hop backward and one hop forward. The data is divided evenly into 49 time steps. Within each time step, the transactions form a single connected component. An important event occurs at time step 43: the sudden closure of a dark market. The number of illicit transactions decreases significantly after the shutdown.
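The feature layout and the per-time-step view described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the notebooks: the column names (`time_step`, `local_*`, `agg_*`), the `"unknown"` label for untagged transactions, and the toy rows are all assumptions made up for the example.

```python
from collections import Counter

N_LOCAL = 94       # local features, including the time step
N_AGGREGATED = 72  # features aggregated from one-hop neighbours


def feature_columns():
    """Return hypothetical column names for the 166 node features."""
    local = ["time_step"] + [f"local_{i}" for i in range(1, N_LOCAL)]
    aggregated = [f"agg_{i}" for i in range(1, N_AGGREGATED + 1)]
    return local, aggregated


def illicit_counts_by_time_step(rows):
    """Count illicit transactions per time step.

    rows: iterable of (time_step, label) pairs, where label is
    "licit", "illicit", or "unknown" (assumed label strings).
    """
    counts = Counter()
    for time_step, label in rows:
        if label == "illicit":
            counts[time_step] += 1
    return counts


local, aggregated = feature_columns()

# Toy rows for illustration only; these are not taken from the dataset.
toy_rows = [
    (42, "illicit"), (42, "illicit"), (42, "licit"),
    (43, "illicit"), (44, "unknown"), (44, "licit"),
]
counts = illicit_counts_by_time_step(toy_rows)
```

Running the same count over the real labels is one way to see the drop in illicit transactions after time step 43.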

We use this dataset to evaluate how effectively anomaly detection methods classify illicit transactions, to reproduce the supervised learning classification methods employed in previous studies, including active learning (AL), and to evaluate a modification to the AL method that uses the network graph to aid in selecting data points.
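To make the AL selection step concrete, the sketch below shows uncertainty sampling, a common query strategy in which the model asks for labels on the transactions it is least sure about. It is an assumption that this resembles the strategy used here; the graph-based modification is not shown, and the function name `select_queries`, the transaction ids, and the probabilities are hypothetical.

```python
def select_queries(probabilities, labeled_ids, batch_size):
    """Pick unlabeled transactions whose predicted illicit probability
    is closest to 0.5, i.e. where the classifier is least certain.

    probabilities: dict mapping transaction id -> predicted P(illicit)
    labeled_ids:   set of transaction ids that already have a label
    batch_size:    number of transactions to send for labeling
    """
    candidates = [
        (abs(p - 0.5), tx_id)
        for tx_id, p in probabilities.items()
        if tx_id not in labeled_ids
    ]
    candidates.sort()  # smallest distance from 0.5 first
    return [tx_id for _, tx_id in candidates[:batch_size]]


# Toy predictions for illustration only.
toy_probabilities = {
    "tx_a": 0.95, "tx_b": 0.52, "tx_c": 0.10, "tx_d": 0.45, "tx_e": 0.49,
}
queries = select_queries(toy_probabilities, labeled_ids={"tx_e"}, batch_size=2)
```

In a full AL loop, the selected transactions would be labeled, added to the training set, and the classifier retrained before the next round of selection.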

Repository structure

├── README.md
├── LICENSE
├── writeup.pdf
├── data
├── images
└── notebooks

How to run the notebook

Resources used

This analysis was prepared using Python 3.8 running in a Jupyter Notebook environment.
Documentation for Python can be found here: https://docs.python.org/3.8/
Documentation for Jupyter Notebook can be found here: http://jupyter-notebook.readthedocs.io/en/latest/

The following Python packages were used and their documentation can be found at the accompanying links:

Detailed report

Report Link

Link to source code on GitHub

Source code