Inca Digital’s Investigation Team is often tasked with collecting hidden data on crypto market participants. Although the blockchain space supplies troves of open data to sift through, trading venue activity often remains a mystery due to unreliable trade data and a lack of transparency from trading venue owners. To fill these data gaps, we leverage a variety of Natural Language Processing (NLP) techniques that can produce reliable datasets based on the digital footprint of crypto users. In the example below, we show how particular exchange users can be identified and geotagged.
Identifying Traders
To underline the importance of such datasets, we examine traders active on major derivatives venues and aim to show that their geographic locations are far more diverse than exchange operators claim and than local securities regulations allow.
For this report, our sample covers several popular derivatives platforms: Bybit, Bitfinex, FTX, Binance Futures, BitMEX, OKEx, and Huobi Futures. For most of them, derivatives trading is the main service, offered alongside spot markets.
To start, we needed to identify platform users who actively trade derivatives, as opposed to those who trade spot or are merely curious about the exchange’s website or activities. To do this, we trained BERT models on a small set of tweets from known traders and analyzed the distinctive embeddings that trigger a positive classification. This approach revealed tweet patterns characteristic of Twitter users who trade on derivatives platforms.
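Reproducing the BERT setup requires a pretrained model and a labeled dataset, neither of which is included here. As a minimal, dependency-free sketch of the same supervised flow (train on a small labeled set of traders’ tweets, then assign a binary "derivatives trader" label to new tweets), the block below swaps BERT for a multinomial Naive Bayes classifier with Laplace smoothing. It does not reproduce the embedding analysis. The training tweets, the `NaiveBayesTweetClassifier` class, and the `tokenize` helper are all hypothetical illustrations.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep word-like tokens, including numbers such as "25x" or "340%"
    return re.findall(r"[a-z0-9$%]+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with Laplace smoothing (a stand-in for BERT)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        self.vocab: set[str] = set()

    def fit(self, tweets, labels):
        for text, label in zip(tweets, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text: str, label: int) -> float:
        # Assumes both classes appear in the training data (otherwise log(0))
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # ignore tokens never seen in training
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> int:
        # 1 = likely derivatives trader, 0 = not
        return int(self.log_posterior(text, 1) > self.log_posterior(text, 0))


# Hypothetical labeled examples (1 = known derivatives trader tweet)
TRAIN = [
    ("closed my BTC long at 25x, PNL +340% on Bybit", 1),
    ("opened a short on ETH perp, liquidation price 2100", 1),
    ("my UID 1234567 futures position stuck, support please", 1),
    ("just bought some BTC spot to hold long term", 0),
    ("what a nice day for a walk", 0),
    ("reading about blockchain history tonight", 0),
]

clf = NaiveBayesTweetClassifier().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(clf.predict("ETH short 50x pnl screenshot"))  # → 1
```

In the approach described above, token counts would be replaced by a fine-tuned BERT model; only the overall train-then-classify flow carries over.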
These patterns include PNL (profit and loss) proofs, screenshots displaying the execution of a derivatives trade, posted referral links, and tweets that mention a UID (the trader’s unique identifier) alongside a support request.
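The four patterns can also be approximated with hand-written rules, which may help when seeding or sanity-checking training labels. The sketch below is a hypothetical rule set, not the BERT model described above: the regular expressions and the `detect_patterns` function are assumptions. Screenshot detection is reduced to checking for an attached image plus trade vocabulary, since reading the image itself would require OCR or a vision model.

```python
import re

# Hypothetical rules for the four tweet patterns.
PNL_RE = re.compile(r"\b(pnl|p&l|roe)\b|[+-]\d+(\.\d+)?\s?%", re.IGNORECASE)
TRADE_TERMS_RE = re.compile(r"\b(long|short|leverage|\d+x|liquidat\w*|entry|perp)\b", re.IGNORECASE)
REFERRAL_RE = re.compile(r"https?://\S*(ref|invite|affiliate)\S*", re.IGNORECASE)
UID_RE = re.compile(r"\buid\s*[:#]?\s*\d{5,}\b", re.IGNORECASE)
SUPPORT_RE = re.compile(r"\b(support|help|stuck|frozen|withdraw\w*)\b", re.IGNORECASE)


def detect_patterns(text: str, has_image: bool = False) -> set[str]:
    """Return the names of the trader patterns a tweet appears to match."""
    found = set()
    if PNL_RE.search(text):
        found.add("pnl_proof")
    # Without OCR, treat "image + trading vocabulary" as a likely trade screenshot
    if has_image and TRADE_TERMS_RE.search(text):
        found.add("trade_screenshot")
    if REFERRAL_RE.search(text):
        found.add("referral_link")
    # A UID counts only when paired with a support request
    if UID_RE.search(text) and SUPPORT_RE.search(text):
        found.add("uid_support_request")
    return found


print(detect_patterns("Closed BTC long, PNL +212%", has_image=True))
```

Rules like these are brittle (a "+5%" in a price-alert tweet would count as a PNL proof), which is one reason a learned classifier is preferable at scale.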


PNL proofs and the associated screenshots are typically posted to brag about a trader’s successful derivatives trades. Using the BERT model’s positive classifications, we collected a sample of 2,939 unique Twitter users engaged in derivatives trading on Bybit, FTX, Binance Futures, BitMEX, OKEx, Bitfinex, and Huobi Futures.
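Going from tweet-level classifications to a count of unique users is a deduplication step. A minimal sketch, assuming each classified tweet is available as a `(user_id, label, text)` tuple; the `VENUE_KEYWORDS` map and the `unique_traders_by_venue` helper are hypothetical, and real venue attribution would need more than substring matching.

```python
from collections import defaultdict

# Hypothetical keyword map used to attribute a tweet to a venue.
VENUE_KEYWORDS = {
    "Bybit": ["bybit"],
    "FTX": ["ftx"],
    "Binance Futures": ["binance futures", "binancefutures"],
    "BitMEX": ["bitmex"],
    "OKEx": ["okex"],
    "Bitfinex": ["bitfinex"],
    "Huobi Futures": ["huobi futures", "huobifutures"],
}


def unique_traders_by_venue(predictions):
    """predictions: iterable of (user_id, label, text); label 1 = derivatives trader."""
    by_venue = defaultdict(set)
    all_users = set()
    for user_id, label, text in predictions:
        if label != 1:
            continue  # skip tweets the classifier rejected
        all_users.add(user_id)  # a user counts once no matter how many tweets match
        lowered = text.lower()
        for venue, keywords in VENUE_KEYWORDS.items():
            if any(k in lowered for k in keywords):
                by_venue[venue].add(user_id)
    return all_users, dict(by_venue)


preds = [
    ("u1", 1, "PNL +80% on Bybit"),
    ("u1", 1, "another Bybit long"),
    ("u2", 1, "BitMEX short closed"),
    ("u2", 1, "FTX perp entry"),
    ("u3", 0, "thinking about FTX"),
]
users, venues = unique_traders_by_venue(preds)
print(len(users), venues)
```

Using sets means a user who posts many PNL proofs is still counted once, both in the overall sample and per venue.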
Geotagging
To geolocate social network users, we typically combine three complementary components of Inca’s NLP module: metadata, language identification, and named entity recognition (NER).
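The three signals can be combined in many ways. The sketch below shows one simple weighted-voting scheme under stated assumptions; it is not Inca’s module. The small `GAZETTEER` stands in for a trained NER model (it only matches known place names), `LANGUAGE_HINTS` stands in for a language identifier’s output, and the weights in `geotag_user` are arbitrary illustrative values.

```python
from collections import Counter

# Hypothetical gazetteer: lowercase place name -> ISO country code.
# A real pipeline would run an NER model and resolve location entities instead.
GAZETTEER = {
    "new york": "US", "nyc": "US", "texas": "US", "london": "GB",
    "seoul": "KR", "istanbul": "TR", "lagos": "NG", "moscow": "RU",
}

# Hypothetical mapping from a detected language code to weakly implied countries.
LANGUAGE_HINTS = {"ko": ["KR"], "tr": ["TR"], "ru": ["RU"]}


def places_in(text: str) -> list[str]:
    lowered = text.lower()
    return [code for name, code in GAZETTEER.items() if name in lowered]


def geotag_user(profile_location: str, tweet_texts: list[str], language: str):
    """Return the best-supported country code, or None if there is no signal."""
    votes = Counter()
    # 1) Metadata: the free-text profile location field, weighted highest
    for code in places_in(profile_location):
        votes[code] += 3
    # 2) Named entities: place mentions in the user's tweets
    for text in tweet_texts:
        for code in places_in(text):
            votes[code] += 1
    # 3) Language identification: a weak tiebreaker only
    for code in LANGUAGE_HINTS.get(language, []):
        votes[code] += 0.5
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(geotag_user("Austin, Texas", ["long BTC from NYC today"], "en"))  # → US
```

Language is weighted lowest because many languages (English above all) map to many countries; returning `None` when nothing fires keeps unlocated users out of the geotagged count.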
From our sample of 2,939 unique Twitter users engaged in derivatives trading on these seven venues, we identified the locations of 2,164 traders worldwide, 372 of whom are in the United States.
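The reported counts imply the following coverage rates. This is plain arithmetic on the figures above, not additional data.

```python
sample_size = 2939  # unique derivatives traders in the sample
geotagged = 2164    # traders with a resolved location
us_traders = 372    # located traders in the United States

geotag_rate = geotagged / sample_size
us_share_of_geotagged = us_traders / geotagged
us_share_of_sample = us_traders / sample_size

print(f"Not located: {sample_size - geotagged}")                 # → 775
print(f"Geotagged: {geotag_rate:.1%}")                           # → 73.6%
print(f"US share of geotagged: {us_share_of_geotagged:.1%}")     # → 17.2%
print(f"US share of full sample: {us_share_of_sample:.1%}")      # → 12.7%
```

The US share of the full sample is a lower bound on US presence, since some of the 775 unlocated traders may also be in the United States.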
