NFT data mining on Rarible

Recently, I’ve been exploring the Ethereum ecosystem and investigating different projects. Every project website is full of nice visualizations, buzzwords, and up-only charts, but you have to see through all that to find the real value. For the rest of this post, I will assume that you know about Ethereum, NFTs, and IPFS, and have a little technical knowledge.

The nice thing about blockchain projects is that almost everything is public, so you have a lot of data to work with, and thanks to the Ethereum APIs there is no need for manual web scraping. The hot trend these days is NFTs, so I chose an NFT marketplace and started digging into its contracts to see what kind of data I could use. The largest marketplace right now is OpenSea, but its contracts are not easy to work with: it mints NFTs lazily to be more efficient (a token is minted on-chain only when there is a purchase), and it mostly acts as a gateway that aggregates NFTs from other places. Therefore, I chose Rarible, another big marketplace where anyone can join and mint without verification. Its contract is easy to follow and there is a reasonable amount of data to work with. I also joined the Rarible Discord to see what’s happening. There are a lot of posts about account verification, NFT promotion, scammers, etc.

Part 1: Duplicate NFTs

One thing that a lot of people were complaining about in Rarible’s Discord was random accounts stealing their artwork and apparently making money with it. The content of a Rarible NFT is uploaded to IPFS. The good thing about IPFS addresses is that they are derived from the content: if you upload the same file twice, it gets the same address. So, I checked whether there were NFTs with the exact same content address but different creators. It turned out that this is possible on Rarible. In some cases, it seems that the creator simply has multiple accounts, like this: Copy1, Copy2
In other cases, the NFTs are minted by two separate creators. Most of the time the second creator is an impostor hoping to sell the copy, like this one: Copy1, Copy2. In total, I found around 50 duplicates of this kind. All of these duplicates were minted on the Rarible contract itself, so I expect there would be many more if I included NFTs from other sources.
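The check described above boils down to grouping tokens by content address and keeping the addresses that appear under more than one creator. Here is a minimal sketch of that idea, not the exact script I ran. The record layout (`token_id`, `creator`, `content_cid`) and the sample values are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_content(mints):
    """Return {content address: token ids} for content minted by 2+ creators.

    `mints` is a list of dicts with hypothetical keys:
    token_id, creator (account address), content_cid (IPFS content address).
    """
    creators_by_cid = defaultdict(set)
    tokens_by_cid = defaultdict(list)
    for mint in mints:
        cid = mint["content_cid"]
        # Addresses are hex strings, so compare them case-insensitively.
        creators_by_cid[cid].add(mint["creator"].lower())
        tokens_by_cid[cid].append(mint["token_id"])
    return {
        cid: sorted(tokens_by_cid[cid])
        for cid, creators in creators_by_cid.items()
        if len(creators) > 1
    }


# Made-up sample data: tokens 1 and 2 share content but not the creator.
mints = [
    {"token_id": 1, "creator": "0xAAA", "content_cid": "QmExampleOne"},
    {"token_id": 2, "creator": "0xBBB", "content_cid": "QmExampleOne"},
    {"token_id": 3, "creator": "0xAAA", "content_cid": "QmExampleTwo"},
    {"token_id": 4, "creator": "0xaaa", "content_cid": "QmExampleTwo"},
]
duplicates = find_duplicate_content(mints)
```

Note that this only tells you that two accounts minted the same file. Whether they belong to the same person or one of them is an impostor still needs a manual look.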

In general, prohibiting users from uploading the exact same content is not a good strategy, because they can circumvent it by changing a single pixel somewhere in the image. But I think it is reasonable to notify buyers. Rarible could show a warning that the NFT is identical or very similar to another one and ask the buyer to verify it before purchasing. Of course, matching exact content addresses is the most naive approach. One could add machine learning methods for more sophisticated duplicate detection.
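To make the single-pixel problem concrete, here is a small sketch of a much simpler technique than machine learning: an average hash, which is a basic perceptual hash. This is not something I ran on the Rarible data. It assumes the image has already been decoded and downscaled to a small grayscale grid, and the pixel values below are made up.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image mean, else 0.

    `pixels` is a 2D list of grayscale values (assumed already downscaled).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Made-up 4x4 "images": a checkerboard, the same image with one pixel
# nudged, and an unrelated image with a bright top half.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
tweaked = [row[:] for row in original]
tweaked[0][0] = 12  # a tiny change that would alter the IPFS address
different = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

near_copy_distance = hamming_distance(average_hash(original), average_hash(tweaked))
unrelated_distance = hamming_distance(average_hash(original), average_hash(different))
```

The tweaked image keeps the same hash even though its content address would change, while the unrelated image differs in half of the bits. A small distance could trigger the warning, with the threshold left as a tuning choice.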

Part 2: Wash Trading

Rarible distributes RARI tokens to people who buy and sell on the platform. This incentivizes users to create fake purchases in order to earn tokens. For example, I create accounts A and B. A mints an NFT and sells it to B, B sells it back to A, and so on. This increases my score in the system without creating any actual value, and it is called wash trading. I saw people complaining about wash trading in the Rarible Discord channels as well. So, I decided to see how hard it is to detect wash trading automatically, or at least semi-automatically. First, I collected all the token Transfer events from the contract and built a transfer graph between accounts, where each edge represents a single NFT transfer from one account to another. Next, I thought about what a simple wash trading scheme would look like. Basically, you need a few users with a lot of transfers between them, and that’s what I looked for in the graph: weakly connected components with a high edge density. I easily found a lot of wash trading incidents. Here is a simple one between two users:

This was pretty easy to detect and also not very common. But there were more complex wash trading schemes involving multiple users as well. Here is an example where 9 users transfer 3 tokens among themselves. Transfer edges are colored by token:

And here are the 9 accounts, none of which has any profile information, which is also very suspicious: 1, 2, 3, 4, 5, 6, 7, 8, 9
Given that all of these accounts were still active at the time of writing, I assume that they have not been detected yet, which is not a very good sign for Rarible. I did all of this exploration in 2 days, so I don’t think it’s too hard to come up with simple systems to improve the platform. One could run such an algorithm once a day and manually check the most suspicious cases. In general, Rarible should invest more in fraud detection in order to maintain its quality and credibility.
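The component search from this part can be sketched in a few lines of plain Python. This is an illustration rather than the networkx and Neo4j code I actually used. "Density" here means transfers per account, which is one possible definition, and the thresholds are arbitrary. I also assume that mint transfers from the zero address are dropped, since otherwise they would connect every account into one big component.

```python
from collections import defaultdict

ZERO_ADDRESS = "0x" + "0" * 40  # sender of mint transfers in ERC-721


def suspicious_components(transfers, min_density=2.0, max_accounts=10):
    """Find small groups of accounts with many transfers among themselves.

    `transfers` is a list of (sender, receiver, token_id) tuples.
    """
    transfers = [t for t in transfers if t[0] != ZERO_ADDRESS]
    parent = {}

    def find(account):
        # Union-find lookup with path halving.
        parent.setdefault(account, account)
        while parent[account] != account:
            parent[account] = parent[parent[account]]
            account = parent[account]
        return account

    # Ignoring edge direction gives the weakly connected components.
    for sender, receiver, _ in transfers:
        root_a, root_b = find(sender), find(receiver)
        if root_a != root_b:
            parent[root_a] = root_b

    accounts = defaultdict(set)
    edge_count = defaultdict(int)
    for sender, receiver, _ in transfers:
        root = find(sender)
        accounts[root].update((sender, receiver))
        edge_count[root] += 1

    found = []
    for root, members in accounts.items():
        density = edge_count[root] / len(members)
        if len(members) <= max_accounts and density >= min_density:
            found.append({
                "accounts": sorted(members),
                "transfers": edge_count[root],
                "density": density,
            })
    return sorted(found, key=lambda c: c["density"], reverse=True)


# Made-up example: A and B pass token 1 back and forth, C sells once to D.
transfers = [
    (ZERO_ADDRESS, "A", 1),
    ("A", "B", 1), ("B", "A", 1), ("A", "B", 1), ("B", "A", 1),
    ("C", "D", 2),
]
flagged = suspicious_components(transfers)
```

In this toy input only the A and B pair is flagged. On real data the output would be a ranked list for manual review, not a verdict.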


It is hard to maintain an open NFT platform with no verification, especially one that is growing very fast. However, ignoring such problems will hurt the platform in the long run.
From the technical side, it was straightforward to mine blockchain data and explore users’ activities on the platform. The data was small enough to handle on my MacBook Air and I had a lot of fun finding various patterns!

The Data

I got all the event data using the Infura API and the web3.py Python library. You can download all of it from here. Most of it is self-explanatory. I also used networkx and Neo4j for some graph exploration and visualization.
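If you want to work with raw logs yourself, here is a small sketch of how a Transfer event can be decoded. It assumes the standard ERC-721 `Transfer(address,address,uint256)` event, where sender, receiver, and token id are all indexed, and logs shaped like the JSON-RPC `eth_getLogs` response with hex strings. The web3.py library returns a slightly different representation, so treat the field handling as an assumption. The sample log is made up.

```python
# keccak256 of the event signature "Transfer(address,address,uint256)"
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def decode_transfer(log):
    """Decode one ERC-721 Transfer log, or return None if it is not one.

    `log` is a dict with hex-string fields, as in an eth_getLogs response.
    """
    topics = log["topics"]
    # ERC-721 indexes all three arguments, so there are exactly 4 topics.
    if len(topics) != 4 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        # Addresses are the last 20 bytes (40 hex characters) of a topic.
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        "token_id": int(topics[3], 16),
        "block": int(log["blockNumber"], 16),
    }


# Made-up log for illustration only.
sample_log = {
    "blockNumber": "0xb71b00",
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "0" * 24 + "a" * 40,
        "0x" + "0" * 24 + "b" * 40,
        "0x" + "0" * 62 + "2a",
    ],
}
transfer = decode_transfer(sample_log)
```

Each decoded record gives one (sender, receiver, token) edge for a transfer graph.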

What next?

If you enjoyed reading this article, you can clap! If you want to support this sort of review of blockchain products, you can donate to this address:

Master’s student in computer science at ETH