When I was building crypto compliance software, a critical question was how to determine which wallets belong to a centralized exchange (CEX). Some on-chain data vendors, such as Nansen, have successfully labeled CEX wallets. But how do they do it? I’ll explain how to identify which wallets belong to a centralized exchange.
CEXs account for the majority of trading volume, so tracking their collective wallet movements gives insight into general market movements and trends. That is why many people want to solve the puzzle of finding CEX wallets. Consider Coinbase as an example. As one of the most compliant exchanges, with a comprehensive risk management program, Coinbase maintains different categories of wallets for different purposes: cold wallets, hot wallets, and depositing wallets. Each category follows a fixed pattern. After weeks of research, building algorithms to cluster wallets and trace the flows of funds through them, these are the patterns I identified:
- Almost every user gets a distinct depositing wallet when they deposit funds to Coinbase. The number of Coinbase depositing wallets is in the millions.
- Coinbase transfers balances from these depositing wallets to a hot wallet at a fixed frequency. The regular schedule makes it easy to spot anomalies and potential fraud.
- Coinbase transfers balances from hot wallets to cold wallets at a much lower frequency.
- Hot wallets are characterized by having a large number of small transfers in and a small number of large transfers out.
- Cold wallets hold a large amount of funds and rarely send transactions.
The flow of funds through different categories of wallets
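The traits listed above can be turned into simple rules. The following is only a minimal sketch, not the algorithm I actually ran: the function names, the tuple layout `(sender, receiver, amount)`, and both thresholds are hypothetical and would need tuning against real chain data.

```python
from statistics import median

def wallet_profile(address, transfers):
    """Summarize one address from (sender, receiver, amount) tuples."""
    incoming = [amt for s, r, amt in transfers if r == address]
    outgoing = [amt for s, r, amt in transfers if s == address]
    return {
        "n_in": len(incoming),
        "n_out": len(outgoing),
        "median_in": median(incoming) if incoming else 0.0,
        "median_out": median(outgoing) if outgoing else 0.0,
        # Net balance implied by the transfers we can see.
        "balance": sum(incoming) - sum(outgoing),
    }

def guess_category(profile, large_balance=1000.0, rare_out=3):
    """Apply the hot/cold traits; thresholds are illustrative assumptions."""
    # Cold: holds a large amount of funds and rarely sends transactions.
    if profile["balance"] >= large_balance and profile["n_out"] <= rare_out:
        return "cold"
    # Hot: many small transfers in, few large transfers out.
    if (
        profile["n_out"] > 0
        and profile["n_in"] > profile["n_out"]
        and profile["median_out"] > profile["median_in"]
    ):
        return "hot"
    return "unknown"
```

A rule-based pass like this only narrows the candidate set. It does not prove ownership, so each match still has to be confirmed by tracing funds.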
Based on the traits of the different Coinbase wallets, I could pattern-match and identify cold and hot wallets in two directions. The first direction is bottom-up, starting with depositing wallets. I recruited 20 people worldwide to sign up for Coinbase accounts and deposit funds into their depositing wallets. I traced the flow of those funds to identify Coinbase’s hot wallets. I then analyzed the historical transactions of those hot wallets to isolate the large transactions that led to cold wallets. Finally, I used the characteristic features of each wallet category to calculate a confidence score for whether a given wallet is a Coinbase hot or cold wallet.
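The bottom-up step can be sketched as follows. This is an illustration under assumptions, not my production scoring: the function name, the tuple layout `(sender, receiver, amount)`, and the idea of using "share of seed wallets that send to a destination" as the confidence score are all hypothetical simplifications.

```python
from collections import defaultdict

def hot_wallet_candidates(seed_deposits, transfers):
    """Rank destinations by the share of known depositing wallets that pay them.

    seed_deposits: depositing-wallet addresses we control or have verified.
    transfers: iterable of (sender, receiver, amount) tuples.
    """
    seeds = set(seed_deposits)
    senders_by_dest = defaultdict(set)
    for sender, receiver, _amount in transfers:
        if sender in seeds:
            senders_by_dest[receiver].add(sender)
    # Confidence here is simply the fraction of seeds sweeping to the address.
    scores = {
        dest: len(senders) / len(seeds)
        for dest, senders in senders_by_dest.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

An address that collects funds from most of the seed wallets is a strong hot-wallet candidate. One that appears only once is more likely a one-off counterparty.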
The second direction is top-down: finding all the hot wallets by tracing transfers from cold wallets. One caveat is that there are different types of hot wallets as well: ones that receive user funds, ones that handle payments, ones that send user funds, and so on. In my research, I focused solely on hot wallets that control transfers of user funds. In the end, I gathered over a million depositing wallets.
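A rough sketch of the top-down expansion is shown below. It assumes, for illustration only, that any counterparty of a known cold wallet is a hot-wallet candidate and that an address sending exclusively to those candidates is a depositing-wallet candidate. The function name and the tuple layout `(sender, receiver, amount)` are hypothetical, and the sketch does not distinguish between the different types of hot wallets.

```python
def expand_top_down(cold_wallets, transfers):
    """Return (hot_candidates, deposit_candidates) from known cold wallets.

    transfers: list of (sender, receiver, amount) tuples.
    """
    cold = set(cold_wallets)

    # Step 1: any address that trades directly with a cold wallet.
    hot = set()
    for sender, receiver, _amount in transfers:
        if receiver in cold and sender not in cold:
            hot.add(sender)
        elif sender in cold and receiver not in cold:
            hot.add(receiver)

    # Step 2: collect every destination each address has ever sent to.
    destinations = {}
    for sender, receiver, _amount in transfers:
        destinations.setdefault(sender, set()).add(receiver)

    # Step 3: addresses whose only destinations are hot candidates.
    known = hot | cold
    deposits = {
        addr
        for addr, dests in destinations.items()
        if addr not in known and dests <= hot
    }
    return hot, deposits
```

In practice, the candidates from this pass would be cross-checked against the bottom-up results before being labeled.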
Of course, there are scenarios this approach does not fully account for. If specific hot wallets are used more for operational purposes than for customer deposits, the analysis could be inaccurate. Ultimately, only Coinbase knows every wallet it owns with 100% accuracy.
Centralized exchange wallet address map by IntoTheBlock