Introduction
Stablecoins are the talk of the town. Every day there is some breaking news. Last week, Stripe announced they are acquiring Privy, a wallet-as-a-service company, and PayPal announced they are natively minting PYUSD on Stellar. It’s impossible to keep up with all of it. With more companies coming into the space, the need to track and access stablecoin data is growing. But on customer calls, people keep coming back to the same four questions:
What are stablecoins being used for?
Who is using stablecoins?
What opportunities exist?
Where in the world are stablecoins being used?
At Artemis, I spend every day collecting, organizing, and aggregating stablecoin data so I can answer these questions. Today, it’s time to debunk a few myths and get to the bottom of how hard these questions REALLY are to answer.
Myth 1: The data is accessible, open and transparent to everyone.
Accessing blockchain data independently remains prohibitively expensive and technically demanding. Though raw blockchain data has become more accessible over the past five years, significant barriers persist. Leading data providers like Dune, Flipside, Allium, and Goldsky each offer distinct advantages, yet none provides comprehensive coverage across every relevant blockchain.
Reality: Everyone and their mother is launching a blockchain with their own quirks, making data analytics extremely complex.
To fully understand your stablecoin's usage patterns and identify emerging opportunities, you need comprehensive visibility across all relevant chains - not just your current deployments. As your multi-chain strategy evolves and your analytical requirements deepen, the data infrastructure demands scale accordingly.
Take PYUSD as an example: after you integrate with LayerZero's OFT protocol, getting the full picture means you need to understand Ethereum, Solana, LayerZero's bridging logic, plus newer chains like Berachain and Flow. And this challenge only gets harder as your users bridge the token to even more places.
The challenge isn’t just access to the chains you’ve launched on — it’s keeping up with a constantly expanding, uncapped universe of chains. Which brings us to problem number two: architecture fragmentation.
Each blockchain has its own data format and architecture
Think back to the early 2000s, when sending a file to someone didn't guarantee they could open it. A PowerPoint might not open on a Mac. A video file might need a special codec. Everyone used different software, and nothing worked together seamlessly. People wasted time converting files, fixing formatting, or chasing down the right version. Even as a lower school student, I dealt with these problems constantly.
We’re in that era for blockchains — where every major chain speaks a different language. The most-used chains today — Solana, Tron, Ethereum, TON, Stellar, Aptos — all have fundamentally different architectures and represent data in radically different ways.
To understand a simple transfer on Solana, you need to grasp token accounts and owner accounts. On Ethereum, you're dealing with smart contracts, externally owned accounts, and ERC-20 tokens. Aptos and Sui throw you into object-oriented models where assets exist as programmable objects with their own logic. Then there's Stellar and TON — niche architectures with surprisingly high stablecoin usage — that don't fit any of these patterns at all.
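To make the contrast concrete, here is a minimal Python sketch of what "reading a transfer" even means on two of these chains. The ERC-20 event signature is the real standard; the log shape and the Solana account fields are simplified illustrations, not any provider's actual schema:

```python
# A minimal sketch contrasting how two chains represent the "same" transfer.
# The ERC-20 event topic below is standard; the log layout and the Solana
# account fields are invented for illustration.

# keccak256("Transfer(address,address,uint256)") -- the standard ERC-20 event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_erc20_transfer(log: dict) -> dict | None:
    """Decode a raw Ethereum event log into (from, to, value), or None if not a transfer."""
    if log["topics"][0] != TRANSFER_TOPIC:
        return None
    return {
        "from": "0x" + log["topics"][1][-40:],  # indexed sender, zero-padded to 32 bytes
        "to": "0x" + log["topics"][2][-40:],    # indexed recipient
        "value": int(log["data"], 16),          # amount in the token's smallest unit
    }

# Solana has no equivalent event log to decode: a transfer mutates two token
# accounts, and each token account is controlled by a separate owner wallet.
# Attributing activity to users means joining token accounts back to owners --
# a lookup the Ethereum model never requires.
solana_token_account = {
    "address": "TokenAcct111...",  # the account holding the balance (illustrative)
    "owner": "Wallet111...",       # the wallet that actually controls it
    "mint": "Mint111...",          # which token this account holds
}
```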
Understanding activity across chains means untangling a growing web of unique technical foundations.
To make this concrete, let’s return to the PYUSD example. Until recently, understanding PYUSD meant understanding Ethereum, Solana, and the LayerZero protocol. But now, with its launch on Stellar, you also need to understand Stellar’s architecture — including its new smart contract platform, Soroban. It’s an entirely different model, with its own virtual machine and a fundamentally different way of handling transfers and balances.
You need to be a domain expert just to access and parse the data — long before you can extract any meaningful insight.
Myth 2: Once you have access to the blockchain data, the job is done and insights follow
Here's where things get interesting. Say you've solved the access problem - you have all the blockchain data and you've built datasets that capture balances and transfers across the entire ecosystem. What do you actually have?
A whole lot of noise.
Users are just strings of letters and numbers, and wallet balances are often inaccurate or misleading. Raw blockchain data doesn't give you insights - it gives you a mess that needs serious cleaning before it becomes useful.
Reality: Context and off-chain data are MUST-HAVES for understanding what's happening onchain
Even after doing all the hard work to collect onchain data, you're still flying blind on the key questions: Who is using your stablecoin, and where is it being used? The only thing you can confidently say is, "My stablecoin is being used." But that's not actionable. That doesn't help you understand user behavior, market penetration, or growth opportunities. To get there, you need off-chain context. The real question becomes: what off-chain data do you actually need — and how do you get it?
Application and protocol labels: There's no single source of truth for tagging onchain activity. Flipside, Dune, the Open Label Initiative, block explorers, Arkham — they all offer pieces of the puzzle, each with their own schema and limited coverage. To answer basic questions like "What application is this address using?" or "What kind of usage are we seeing?", you need to unify these fragmented label sources and manually tag important wallet addresses yourself. Without this work, you're stuck with raw transaction data that tells you nothing about actual usage patterns.
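To make that concrete, here is a rough sketch of the unification step, assuming each provider exports a simple {address: label} map. The precedence order, schema, and sample labels are illustrative choices, not anyone's production rules:

```python
# A hedged sketch of label unification: normalize each provider's partial
# {address: label} map and pick a winner per address by source precedence.
# The providers are the ones named above; everything else here is illustrative.

# Lower number = more trusted when sources disagree (your call, not a standard).
SOURCE_PRECEDENCE = {"manual": 0, "open_labels": 1, "dune": 2, "explorer": 3}

def unify_labels(*sources: tuple[str, dict[str, str]]) -> dict[str, dict]:
    """Merge {address: label} maps from many providers into one labeled set."""
    unified: dict[str, dict] = {}
    for source_name, labels in sources:
        rank = SOURCE_PRECEDENCE[source_name]
        for address, label in labels.items():
            addr = address.lower()  # EVM addresses are case-insensitive
            existing = unified.get(addr)
            if existing is None or rank < existing["rank"]:
                unified[addr] = {"label": label, "source": source_name, "rank": rank}
    return unified

# Illustrative usage: your own manual tagging outranks a third-party label.
labels = unify_labels(
    ("dune", {"0xAbC123...": "uniswap: router"}),
    ("manual", {"0xabc123...": "Uniswap V3 Router"}),
)
```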
Geolocation: This is the holy grail — and probably the question I get asked most: Where are my users? We approach this using timezone heuristics and advanced techniques to infer geographic distribution. More importantly, we work with data partners to acquire proprietary off-chain geo data that helps triangulate which country a wallet most likely operates from.
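As a minimal sketch of the timezone heuristic (a simplified illustration, not the production approach, and the "quiet hours" prior is an assumption), you can score candidate UTC offsets by how well they explain a wallet's inactive hours:

```python
# Sketch: bucket a wallet's transaction timestamps by local hour-of-day for each
# candidate UTC offset, and score against a "people sleep at night" prior.
from datetime import datetime, timezone, timedelta

SLEEP_HOURS = set(range(1, 7))  # assumption: 1am-7am local time is low-activity

def infer_utc_offset(tx_timestamps: list[int]) -> int:
    """Return the UTC offset (hours) whose local clock best explains the quiet hours."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(-12, 15):
        local_hours = [
            datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=offset))).hour
            for ts in tx_timestamps
        ]
        # Score = activity outside sleep hours minus activity during them.
        score = sum(1 if h not in SLEEP_HOURS else -1 for h in local_hours)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```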
The reality is that solving this labeling problem requires significant resources and industry relationships. You need partnerships with major L1s and protocols to build comprehensive labeled datasets. Most teams don't have the bandwidth or connections to tackle this manually - which is why so many analytics efforts hit a wall after getting the raw blockchain data. The context layer is where the real work begins.
Myth 3: Blockchain data is straightforward and consistent
Blockchains are more complicated than they appear. While the industry has begun standardizing around particular design patterns for token transfers over the last few years, this wasn't always the case. When bridging first became popular, there were no community standards for tracking cross-chain activity. This created a mess when trying to accurately track balances and transfers - especially for tokens that have been around long enough to predate these standards. You need to understand the specific history and quirks of each chain to get accurate data.
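To see why pre-standard history bites, consider the naive balance-reconstruction logic most pipelines start from. It assumes the modern zero-address mint/burn convention, which is exactly what older tokens and early bridges violated:

```python
# Naive balance reconstruction from ERC-20-style Transfer events. The
# zero-address mint/burn convention below is the common pattern today, but
# pre-standard tokens and early bridges did not always follow it -- which is
# exactly the historical quirk at issue.
from collections import defaultdict

ZERO = "0x" + "0" * 40  # conventional mint/burn counterparty

def balances_from_transfers(transfers: list[dict]) -> dict[str, int]:
    """Replay (from, to, value) transfer events into per-address balances."""
    balances: dict[str, int] = defaultdict(int)
    for t in transfers:
        if t["from"] != ZERO:
            balances[t["from"]] -= t["value"]  # sender pays out
        if t["to"] != ZERO:
            balances[t["to"]] += t["value"]    # recipient receives
    # Any negative balance means the token broke the convention somewhere:
    # time to go read that chain's (or bridge's) history.
    return dict(balances)
```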
Reality: Blockchain “database schemas” change all the time — you need to be a blockchain historian to have accurate data
It's easy to forget that each of these ecosystems is in flux and constantly changing. Take Solana, which has had major upgrades to both its architecture (how the blockchain works) and token program (how tokens are created and transferred).
Architecture Upgrades: When Solana first launched, the chain didn't store timestamps in long-term storage. This creates a major problem when trying to calculate historical balances over time. Solana fixed this in 2020, but the damage was done: how do you rebuild accurate historical balances without timestamps? (Read more here)
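One possible workaround, sketched below, is to interpolate timestamps between slots whose wall-clock times you can establish from other sources. The anchor values are placeholders, not real mainnet data, and the approach itself is an assumption rather than a documented fix:

```python
# Hedged sketch: approximate a missing early-Solana timestamp by linear
# interpolation between (slot, timestamp) anchors you trust -- e.g., sourced
# from archived node logs or the first slots after the 2020 fix.

# Illustrative placeholder anchors; you would need to source real ones yourself.
ANCHORS = [(0, 1_584_300_000), (50_000_000, 1_605_000_000)]

def estimate_timestamp(slot: int) -> float:
    """Linearly interpolate a slot's unix timestamp between the nearest anchors."""
    for (s0, t0), (s1, t1) in zip(ANCHORS, ANCHORS[1:]):
        if s0 <= slot <= s1:
            return t0 + (slot - s0) * (t1 - t0) / (s1 - s0)
    # Outside the anchored range, extrapolate from the last known slot rate.
    (s0, t0), (s1, t1) = ANCHORS[-2], ANCHORS[-1]
    return t1 + (slot - s1) * (t1 - t0) / (s1 - s0)
```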
Token Program Upgrades: Last year Solana launched Token-2022, a new token program designed to solve fragmentation issues with the original design (read more here), but this means you need to understand the nuances of both the old and new token programs to track fungible tokens accurately.
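In practice, that means every Solana pipeline now has to match two token programs instead of one. A minimal sketch, using the widely published mainnet program IDs (verify them against official documentation before relying on them):

```python
# Filtering Solana instructions for fungible-token transfers now means matching
# two token programs. The instruction schema here is illustrative; the program
# IDs are the widely published mainnet addresses -- verify before use.
TOKEN_PROGRAM = "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"       # original SPL Token
TOKEN_2022_PROGRAM = "TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb"  # Token-2022

def is_token_transfer(instruction: dict) -> bool:
    """True if an instruction was executed by either token program."""
    return instruction["program_id"] in (TOKEN_PROGRAM, TOKEN_2022_PROGRAM)
```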
Building on this point, people constantly hear that blockchains are immutable, public, append-only databases. While that's generally true now, it wasn't always the case early on. Optimism is a perfect example - they didn't just have one genesis event and launch. They actually relaunched entirely a few months later.
The outcome? There's no complete dataset anywhere of all token transfers on the original Optimism chain.
Why does this matter? This missing data is crucial for understanding both current and historical activity of major stablecoins on OP Mainnet, including USDC, USDT, and DAI. Without it, you don't have a complete dataset, and accurate wallet balances become impossible to calculate.
I've written more about these historical data challenges here.
Building accurate datasets requires becoming a blockchain historian. It takes years of work to understand the nuanced evolution of each chain and account for all these historical inconsistencies.
Conclusion
Blockchain data presents unique challenges that don’t exist outside of crypto. Even though crypto data is “open,” extracting anything actionable, surprisingly, requires offchain data, integrations with over a dozen data providers, context that’s spread across “crypto Twitter” and documentation, and a team of 10+ engineers. Otherwise, you're just chasing shadows in a space that moves at the speed of light.