How Blockchain Structures Prevent Data Tampering?

29 September 2017

Blockchain Structures are central to systems such as Bitcoin and Ethereum. They are one of the three pillars of Decentralized Consensus Systems.

For a Currency, Blockchain Structures allow any participant to:

  • Check, at any time, that nobody cheated by sending the same unit of money twice (double spending).
  • Check the whole dataset. No central authority is responsible for data quality.
  • Detect quickly any hostile participant sending tampered data.

How can the blockchain capable ensure data integrity? What were the limitation of previous methods? How does it work?

Let’s “rediscover” the blockchain.

Let’s start from a simplified model and study its limitations. Steps by steps we’re going to fix this model and see how the blockchain emerges from the solved issues.


Here’s the simplified model that we’re going to fix:

We have a network of three participants. Each member can send messages to the other two. They agree to send a message one after the other in a predefined order. They are going to trade a Currency.

Simply put: imagine Alice, Bob, and Chuck. They agree to trade their currency, “FakeCoins,” and exchange services & goods through it. Alice is going to make trading decisions and tell the other two by email. Then Bob makes decisions and sends a mail. Then Chuck. Then Alice again and, so on.

Note that:

  • This is a small group of participants who knows each other. No need for P2P Protocols.
  • These participants agreed on the way to make the system progress, and will not cheat on this. No need for “Proof Of…”.

We focus on storing & exchanging the state of our decentralized consensus.

Everybody starts with one FakeCoin.

Model 1: They send the whole database

We start with the following state:

Alice: 1
Bob: 1
Chuck: 1

A. Alice buys a lamp from Chuck: she sends him a FakeCoin

She sends an email to Bob and Chuck:

The two other reads the email, they agree. Chuck sends the lamp to Alice.

B. Bob does nothing

He sends the previous email again:

C. Chuck cheats

Chuck decides to cheat. He tells each participant that he sent them FakeCoins. That’s a Double Spending.

Chuck trick is not visible to the others because they do not share memory.

When Alice opens Chuck’s email, she sees that Chuck sent her two FakeCoins. Locally, the rules of FakeCoins are intact. She sends two cookies to Chuck. Same for Bob, he only sees the FakeCoins coming from Chuck, he agrees and sends him two gold nuggets.

D. Alice sends FakeCoins to Bob

According to the message she received during the previous turn, Alice can send two FakeCoins to Bob. She sends the following message:

Bob opens the mail: he sees the state he received at the previous turn. He assumes that Alice didn’t do anything. At this moment, our participants share the SAME state. They all agree that:

Alice: 0
Bob:   3
Chuck: 0

Transactions go on. There’s no trace of Chuck’s Cheat.

Weeks later, Alice complains “I still haven’t received Bob’s gold nuggets”. Bob has no idea what Alice is talking about. The system is not making consensus anymore. Note that this is not a security or authentication issue. Alice, Bob, and Chuck transferred their own FakeCoin from their accounts.

Model 2: Transaction Log

Alice, Bob, and Chuck start again. They learned from their mistake, and try something new. They are going to send the transaction AND the entire state. That way, each participant can replay the transaction and check that they agree on the state. Each node knows if there’s an inconsistency between their state and the proposed one.

A. Alice sends a FakeCoin to Chuck

She sends a message to each participant:

 Alice sends 1 to Chuck
 Alice: 0
 Bob: 1
 Chuck: 2

Everybody agrees.

B. Bob does nothing

 Alice: 0
 Bob: 1
 Chuck: 2

And so on. When Chuck cheats, Alice will send a transaction to Bob that will trigger an error. The system won’t digest an invalid case and will fail as soon as consensus is at risk.

Model 3: Alpha

After a few months without issues, Alice, Bob, and Chuck decide to accept more users. Each user promise to respect the previous rules (no need for Proof Of and P2P).

Now they have an issue: each time a user joins, the messages they send get bigger. Doubling the number of users means doubling the size of every message.

They realize that their system’s bandwidth is increasing too fast and won’t scale.

Alice proposes a change:

We need the transaction to replay each operation. But we don’t care about the state itself. I just want to tell whether the state I have is equal or different to the state proposed by another participant.

She proposes to use a Hash rather than sending the whole state. Instead of sending each line of the state, we pass it through a hash function like SHA384 to compute hash("Alice:0,Bob:1,...").

 Alice sends 1 to Chuck
 Alice: 0
 Bob: 1
 Chuck: 2
 Zardoz: 6


 Alice sends 1 to Chuck

You probably used CRC or MD5 to check the integrity of a file before, that’s the same idea. This hash is the Fingerprint of our database. Any change in the database implies a noticeable change in the Hash value. The system keeps its properties, users can still verify that:

  • Each transaction is valid,
  • They agree with the global state of the Consensus.

And now, each message has a fixed size no matter the number of users. The systems can grow again.

Model 4: The Blockchain

The system works without any issue for a few months. Alice, Bob, and Chuck make it public. Anyone can join.

They discover a new scalability issue: Each participant has to go through the whole dataset to compute its Hash to validate a new block. Now that their system is global, the less powerful machines are not fast enough to validate the data.

It’s a red flag: They may have to rely on a few powerful machines to verify & propose new blocks. That is the beginning of re-centralization.

Bob comes up with another idea:

As long as the Hash comprises the whole state; if every participant agrees on the function, we can use any Hash Function.

When we propose the block:

 Alice sends 1 to Chuck

This hash represents the state of the database. We could use any hash function, for example:

- SHA384("Alice:1,Bob:2,Chuck:0")
- SHA384("Chuck:0,Alice:1,Bob:2")
- SHA512("Chuck:0,Bob:2,Alice:1")

But wait: at any moment, the current state is equal to a transaction applied to the previous state, right? We can say that:

State[n] = apply(Transaction[n], State[n-1])

So, with the concat function, we could use:

Hash(State) = SHA384(Transaction[n] ⋅ State[n-1])

This Hash function would cover the entire state. A different hash value implies a different state.

Let’s keep going: We could replace State[n-1], the previous state, with its hash too. We get a recursive function:

Hash(State) = SHA384(Transaction[n] ⋅ Hash(State[n-1]))

Again, this Hash function would cover the entire state. And a different hash value would mean a different state.

Bob proposes to send messages that hash the previous state with the transaction:

 Alice sends 1 to Chuck
Hash(Transaction ⋅ Previous Hash):

Do you see the CHAIN of BLOCKS? Each block contains a hash of the entire state because it relies on the hash of the previous state. We chain each block with the previous one. With this representation, it is easy to verify new blocks. We need to compute a single, quick hash, to compare one state with the one proposed, no matter the size of the database.

Et Voilà!

From a minimal model of a Decentralized System, we rediscovered blockchain structures. We have seen how sharing Transactions + State make our data verifiable. And we have seen how a chain of blocks is cost-efficient to verify. We do not need a central authority to protect the data.

A change anywhere along within chain will trigger a change in all the following hashes. Like falling dominoes, it is fast & easy to detect where something wrong happens.

In following articles, we will continue to dig into the components required for the Decentralized Consensus, register to our Newsletter to stay up to date. Or check us out on Social Networks.

Laurent Senta

I wrote software for large distributed systems, web applications, and even robots. These days I focus on making developers, creators, and humans more productive through