As a final project for the Advanced Embedded Systems course, my team designed and deployed a bitcoin miner on an FPGA.

The Bitcoin Blockchain

To understand how the miner works, it helps to know the basics of bitcoin itself. Bitcoin is built on a data structure called the blockchain, which records every transaction that has ever occurred using bitcoin. Each "block" on the blockchain contains a header and a set of transactions.

The goal of the blockchain is to create a ledger of transactions that cannot be changed and is not controlled by any single individual. Thousands of copies of the blockchain are stored around the world, so if someone maliciously changes a transaction in their copy, the tampering can be detected. To make this possible, the transactions in a block are hashed together into a Merkle root, which is then stored in the block's header, as shown below for 3 transactions A, B, and C.
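A minimal Python sketch of how a Merkle root is built for three transactions. It assumes bitcoin's double SHA-256 (`sha256d`, an illustrative name) and bitcoin's rule that a level with an odd number of hashes pairs the last hash with itself. The transaction bytes are placeholders, and real implementations also deal with byte-order details this sketch ignores.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin's hash function: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: list[bytes]) -> bytes:
    """Combine transaction hashes pairwise until one hash remains.

    When a level has an odd number of entries, the last one is paired
    with a copy of itself (so C is hashed together with C).
    """
    if not tx_hashes:
        raise ValueError("a block needs at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the odd one out
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Placeholder bytes standing in for real serialized transactions A, B, C.
h_a, h_b, h_c = (sha256d(tx) for tx in (b"tx A", b"tx B", b"tx C"))
root = merkle_root([h_a, h_b, h_c])
print(root.hex())
```

Changing any byte of any transaction changes its leaf hash, and that change flows up every level to the root.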

Each block header also contains the hash of the previous block's header. Changing even a single value in a header changes its hash completely. As a result, if someone modified a single transaction, the change would propagate to the Merkle root in that block's header, which would change that header's hash, which would then no longer match the previous-block hash stored in the next block. Hiding the change would require rewriting every block that follows, and the exact block that was altered can be identified immediately.
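A toy Python model of this chaining, offered as a sketch rather than real bitcoin code: each block stores the hash of its predecessor, and `payload` (a hypothetical stand-in) represents the rest of the header, including the Merkle root. Tampering with one block breaks exactly one link, which pinpoints the altered block.

```python
import hashlib
from typing import Optional


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def make_chain(payloads: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Build a toy chain of (prev_hash, payload) pairs."""
    chain = []
    prev = bytes(32)  # the first block has no predecessor
    for payload in payloads:
        chain.append((prev, payload))
        prev = sha256d(prev + payload)
    return chain


def first_broken_link(chain: list[tuple[bytes, bytes]]) -> Optional[int]:
    """Return the index of the first block whose contents no longer hash
    to the previous-block hash stored in its successor, or None."""
    for i in range(len(chain) - 1):
        prev, payload = chain[i]
        if sha256d(prev + payload) != chain[i + 1][0]:
            return i
    return None


chain = make_chain([b"block 0", b"block 1", b"block 2", b"block 3"])
assert first_broken_link(chain) is None
chain[1] = (chain[1][0], b"block 1 (tampered)")
print(first_broken_link(chain))  # → 1
```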

Why Mining is Essential for Bitcoin

Mining is the process that confirms bitcoin transactions. Every new bitcoin transaction is submitted to a pool of pending transactions (the mempool). Each miner takes a set of pending transactions and assembles them into a block. If the miner successfully mines this block, the block is permanently added to the blockchain. Without miners assembling blocks and validating them, there would be no ledger of transactions, and thus no bitcoin. To keep this system running, miners are rewarded for each block they mine, both with the fees attached to the transactions in that block and with newly minted bitcoin. The minted reward is cut in half every 210,000 blocks (roughly every 4 years). The total supply is capped at 21 million bitcoin; over 99% of it will have been issued by the mid-2030s, although ever-smaller amounts continue to be minted until around 2140.
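The halving schedule can be checked with a few lines of Python. This sketch mirrors bitcoin's rule of halving the block subsidy every 210,000 blocks with an integer right shift (amounts in satoshis, 100,000,000 per bitcoin) and ignores transaction fees; the function names are illustrative.

```python
INITIAL_SUBSIDY_SATS = 50 * 100_000_000  # 50 BTC per block at launch, in satoshis
HALVING_INTERVAL = 210_000               # blocks between halvings (~4 years)


def block_subsidy(height: int) -> int:
    """Newly minted satoshis for the block at `height`."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return INITIAL_SUBSIDY_SATS >> halvings


def total_supply() -> int:
    """Sum the subsidy over every halving era until it reaches zero."""
    total, era = 0, 0
    while (subsidy := block_subsidy(era * HALVING_INTERVAL)) > 0:
        total += subsidy * HALVING_INTERVAL
        era += 1
    return total


print(total_supply())  # 2099999997690000 satoshis, just under 21 million BTC
```

Because of integer rounding the subsidy reaches zero after 33 eras, which at ten minutes per block lands around 2140, while the first seven eras alone account for more than 99% of the supply.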

Because mining blocks is so lucrative, there needs to be a way to decide who gets the reward, since many miners compete to mine each block. Bitcoin targets an average of one successfully mined block every 10 minutes, and it enforces this rate with a value called the difficulty target.

Size	Field
4 bytes	Version
32 bytes	Previous Block Hash
32 bytes	Merkle Root
4 bytes	Timestamp
4 bytes	Difficulty Target
4 bytes	Nonce

Contents of a bitcoin block header (80 bytes total)
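Because every field has a fixed size, the header can be packed with Python's `struct` module. A sketch, assuming the two hashes are already in bitcoin's internal byte order and that integers are stored little-endian; `serialize_header` is an illustrative name and the example values are placeholders, not a real block.

```python
import struct


def serialize_header(version: int, prev_hash: bytes, merkle_root: bytes,
                     timestamp: int, bits: int, nonce: int) -> bytes:
    """Pack the six header fields into the 80-byte layout from the table.

    "<" means little-endian with no padding; "I" is a 4-byte unsigned
    integer and "32s" copies a 32-byte string unchanged.
    """
    if len(prev_hash) != 32 or len(merkle_root) != 32:
        raise ValueError("hashes must be 32 bytes")
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


# Placeholder field values; 0x1d00ffff is the difficulty-1 target in compact form.
header = serialize_header(1, bytes(32), bytes(32), 1231006505, 0x1D00FFFF, 0)
print(len(header))  # → 80
```

The nonce occupies the last 4 bytes, which is convenient for a miner: the first 76 bytes stay fixed while only the tail changes.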

Miners assemble the bitcoin header from the previous block's hash, the Merkle root of their chosen transactions, and the current difficulty target (set by the network). The goal is to vary the Nonce value and compute a hash of the header for each value (bitcoin applies SHA-256 twice). If the resulting hash is at or below the target value encoded by the difficulty target field, the miner has successfully mined the block.
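The nonce search can be sketched in Python as a software reference model. It assumes bitcoin's compact encoding of the difficulty target (a 1-byte exponent and a 3-byte mantissa) and the double SHA-256 hash; `bits_to_target` and `mine` are illustrative names, and the toy target and all-zero header prefix are placeholders chosen so the loop finishes quickly.

```python
import hashlib
import struct
from typing import Optional


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits: int) -> int:
    """Decode the compact 4-byte difficulty field into a 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def mine(header_prefix: bytes, target: int,
         max_nonce: int = 2**32) -> Optional[tuple[int, bytes]]:
    """Try nonces in order and return (nonce, hash) for the first success.

    `header_prefix` is the first 76 bytes of the header (everything except
    the nonce). The hash is read as a little-endian integer, which is how
    bitcoin compares it with the target.
    """
    for nonce in range(max_nonce):
        digest = sha256d(header_prefix + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None  # nonce space exhausted: another header field must change


toy_target = 1 << 252   # toy target: roughly 1 in 16 hashes succeeds
prefix = bytes(76)      # placeholder header fields
result = mine(prefix, toy_target)
print(result)
```

A real target is astronomically smaller, so the same loop would almost always run out of nonces; hardware cores exist to run this inner loop as fast as possible.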

Block	Hash Target	Difficulty
Genesis	0x00000000FFFF0000000000000000000000000000000000000000000000000000	1
Current	0x0000000000000000000331DB0000000000000000000000000000000000000000	~88 trillion

Bitcoin difficulty for the first-ever (genesis) block and at the time of writing
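The difficulty column follows directly from the two hash targets in the table: difficulty is the genesis target divided by the current target. A quick Python check, with the table's targets written as shifted integers, confirms the ~88 trillion figure.

```python
GENESIS_TARGET = 0xFFFF << 208   # 0x00000000FFFF000...0 from the table
CURRENT_TARGET = 0x331DB << 160  # 0x000...0331DB000...0 from the table


def difficulty(target: int) -> float:
    """How many times harder a target is than the genesis target."""
    return GENESIS_TARGET / target


print(f"{difficulty(CURRENT_TARGET):.3e}")  # about 8.810e+13, i.e. ~88 trillion
```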

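It is about 88 trillion times harder to mine a block now than it was when bitcoin was first created. The network recalculates the difficulty target every 2016 blocks (roughly every two weeks) so that it tracks the collective hash rate of all miners and keeps the average block time near 10 minutes.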
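A simplified Python sketch of that retargeting rule: the target is scaled by the ratio of the time the last 2016 blocks actually took to the expected two weeks, with the adjustment clamped to a factor of 4 in either direction. It omits details of the real implementation, such as re-encoding the result into the compact header field; `retarget` is an illustrative name.

```python
EXPECTED_TIMESPAN = 2016 * 600  # 2016 blocks at 10 minutes each, in seconds
MAX_TARGET = 0xFFFF << 208      # genesis target: the easiest target allowed


def retarget(old_target: int, actual_timespan: int) -> int:
    """Return the new target after a 2016-block window.

    Blocks that arrived too fast shrink the target (raising difficulty);
    blocks that arrived too slowly grow it. The change is clamped to 4x.
    """
    clamped = min(max(actual_timespan, EXPECTED_TIMESPAN // 4),
                  EXPECTED_TIMESPAN * 4)
    return min(old_target * clamped // EXPECTED_TIMESPAN, MAX_TARGET)
```

For example, if the hash rate doubled and the window took only one week, the target halves and the difficulty doubles.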

As a result, a miner's expected reward is proportional to its share of the network's total hashing power: the more hashes you can compute, the more blocks you will successfully mine on average.
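Because every hash is an independent trial, the expected work per block follows directly from the target. The sketch below assumes hash outputs behave like uniformly random 256-bit numbers; a miner's expected block rate is then its hashrate divided by this expected work, which is why rewards track hashing power. The function names are illustrative.

```python
def expected_hashes(target: int) -> float:
    """Average number of attempts needed to find a hash <= target.

    Each attempt succeeds with probability (target + 1) / 2**256.
    """
    return 2**256 / (target + 1)


def expected_seconds(target: int, hashrate: float) -> float:
    """Expected time for a miner computing `hashrate` hashes/s to find a block."""
    return expected_hashes(target) / hashrate


print(expected_hashes(0xFFFF << 208))  # difficulty 1: roughly 2**32 attempts
```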

Designing a Miner for an FPGA

There are many designs available that implement hashing algorithms efficiently and effectively in Verilog. These designs typically specify the maximum clock frequency they can run at and their latency in clock cycles. We found a good SHA-256 core on OpenCores: it takes 65 clock cycles to complete a full hash and can run at 80 MHz. These cores can be placed in the FPGA fabric, and the communication buses (usually AXI Streams) can be used to move data to and from them.
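Those two numbers set the throughput of a single core. A back-of-the-envelope sketch, assuming the core is not internally pipelined (one hash in flight at a time) and that each 65-cycle "full hash" corresponds to one nonce attempt; if either assumption is wrong, the figures scale accordingly. The names are illustrative.

```python
CLOCK_HZ = 80e6       # maximum clock of the OpenCores SHA-256 core
LATENCY_CYCLES = 65   # cycles per complete hash


def core_hashrate(clock_hz: float = CLOCK_HZ, latency: int = LATENCY_CYCLES) -> float:
    """Hashes per second for one core, assuming one hash in flight."""
    return clock_hz / latency


def array_hashrate(n_cores: int) -> float:
    """Ideal hashrate of n independent cores, ignoring data movement."""
    return n_cores * core_hashrate()


print(f"{core_hashrate():,.0f} hashes/s per core")  # about 1.23 million
```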

Initially we thought that data throughput would be the bottleneck: the cores hash quite quickly, and if all of the inputs and outputs had to be moved for every hash, you would hit a data rate limit long before maxing out the performance of the hashing cores. Here is what that limit looks like for CDMA data transfers of 30 bits/cycle and 52 bits/cycle.

Hashrate vs. SHA256 core count for two different bitrates when communication-limited

The hashrate initially scales very well with SHA core count, then abruptly hits a wall once the CDMA can no longer provide the required data throughput. Luckily, the mining problem needs very little data movement. Each core is sent the block header once, with an initial nonce of zero. The core then increments the nonce and computes hashes repeatedly until it either finds a hash below the target or runs out of nonce values.
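A rough Python model of that wall: the achievable hashrate is the smaller of what the cores can compute and what the bus can feed them. The per-hash traffic (an 80-byte header in and a 32-byte hash out) and the assumption that the bus runs at the 80 MHz core clock are ours for illustration, so this will not reproduce the plotted numbers exactly.

```python
CLOCK_HZ = 80e6
CORE_RATE = CLOCK_HZ / 65  # hashes/s per core, assuming one hash in flight

# Hypothetical per-hash traffic if every input and output were streamed.
BITS_PER_HASH_STREAMED = (80 + 32) * 8


def streamed_hashrate(n_cores: int, bus_bits_per_cycle: int) -> float:
    """Hashrate when every hash needs its own transfer: the smaller of
    the compute limit and the bus bandwidth limit."""
    compute_limit = n_cores * CORE_RATE
    bandwidth_limit = bus_bits_per_cycle * CLOCK_HZ / BITS_PER_HASH_STREAMED
    return min(compute_limit, bandwidth_limit)


def header_once_bits_per_hash(nonce_range: int = 2**32) -> float:
    """Traffic per hash when the header is sent once and the core iterates
    the nonce itself (80 bytes in, a 4-byte winning nonce back)."""
    return (80 + 4) * 8 / nonce_range
```

Under these assumptions the streamed design saturates after only a few cores, while sending the header once cuts the traffic per hash by a factor of billions, leaving the cores themselves as the limit.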

Final Core Design

The final core design is quite simple. The cores all work in parallel, computing hashes for the given header, and each raises an output flag if it finds a valid nonce.
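How the work is divided between cores is not spelled out above. One common approach, shown here purely as an assumption, is to give every core the same header but a disjoint slice of the 32-bit nonce space so no two cores hash the same candidate; an alternative that keeps every core starting at nonce zero is to give each core a slightly different header (for example, a different timestamp). The names below are hypothetical.

```python
NONCE_SPACE = 2**32


def nonce_ranges(n_cores: int) -> list[range]:
    """Split the nonce space into contiguous, non-overlapping slices,
    one per core (an assumed work split, not the documented design)."""
    base, extra = divmod(NONCE_SPACE, n_cores)
    ranges, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def seconds_to_exhaust(n_cores: int, core_rate: float = 80e6 / 65) -> float:
    """Time until every core has scanned its slice (cores run in parallel)."""
    return max(len(r) for r in nonce_ranges(n_cores)) / core_rate
```

At roughly 1.23 MH/s, a single core needs just under an hour to scan all 2^32 nonces; each additional core shortens that proportionally.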

Action Pipeline for Core Design

As expected, the performance of the device scales linearly with the number of cores, as shown below. Less expected is that power consumption grows faster than linearly at higher core counts. This is because the FPGA cannot place all of the cores efficiently when it is nearly full, and far more power is lost to parasitics from the difficult routing.

The FPGA board we used is based on a relatively old architecture, which unfortunately means a modern CPU absolutely smokes it in raw hashrate. However, despite the age gap (the FPGA is about 7 years older than the CPU), our implementation still beats the CPU in hashrate per watt.

Conclusions

Building an FPGA bitcoin miner was an awesome lesson in both the fundamentals of cryptocurrency and computer accelerator design. FPGA mining is not profitable today (if we ran 20 Ultra96 boards for 1 year, we would earn about 1 cent while spending about 60 dollars on electricity), but it would have been amazing 15 years ago, and the lessons translate to designs for pressing modern problems.