Show HN: An educational blockchain implementation in Python (opens in new tab)

(github.com)

412 pointsjre8y ago41 comments

41 comments

> It is NOT secure neither a real blockchain and you should NOT use this for anything else than educational purposes.

It would be nice if non-secure parts of implementation or design were clearly marked.

What's the point of education article, if bad examples aren't clearly marked as bad? If MD5 usage is the only issue, author could easily replace it with SHA and get rid of the warning at the start. If there are other issues, how can a reader know which parts to trust?

Even if fixing bad/insecure parts are "left as an exercise for the reader", learning value of the article would be much greater if those parts would be at least pointed at.

jreOP8y ago

OP here.

erikb is spot on in the sibling comment. This hasn't been expert-reviewed, hasn't been audited so I'm pretty confident there is a bug somewhere that I don't know about.

It's educational in the sense that I tried as best a I could to implement the various algorithmic parts (mining, validating blocks & transactions, etc...).

I originally used MD5 because I thought I would do more exploration regarding difficulty and MD5 is faster to compute than SHA. In the end, I didn't do that exploration, so I could easily replace MD5 with SHA. I'll update the notebook to use SHA, but I'm still not gonna remove the warning :)

I'll also try to point out more explicitly which parts I think are not secure.

magnat8y ago

> I'll also try to point out more explicitly which parts I think are not secure.

Things I've noticed:

* Use of floating point arithmetic.

* Non-reproducible serialization in verify_transaction can produce slightly different, but equivalent JSON, which leads to rejecting transactions if produced JSON is platform-dependent (e.g. CRLFs, spaces vs tabs).

* Miners can perform DoS by creating a pair of blocks referencing each other (recursive call in verify_block is made before any sanity checks or hash checks, so they can modify block's ancestor without worrying about changing its hash).

* mine method can loop forever due to integer overflow.

* Miners can put in block a transaction with output sum greater than input sum - only place where it is checked is in compute_fee and no path from verify_block leads there.

jreOP8y ago

Those are all very good points I didn't think about, thanks for these.

I'll fix the two bugs with verify_block and the possibility for a miner to inject invalid a output > input transaction.

I'll add a note for the 3 others.

1 more reply

tfha8y ago

For what it's worth, MD5 is perfectly safe for use as a proof of work algorithm. You just don't want to use it for authentication / data integrity

erikb8y ago

Being secure is the lack of attack surface. So it's really hard to say. Often what hits you is something you haven't considered.

So what the author is saying he doesn't take responsibility for when you use the result of his work and get problems. If you want to act responsibly build a team of different experts, have tests for security scenarios, have automated tests, from time to time pay people to attack your system and show the weaknesses you have.

Or, do it like 99.9% of people and just accept that there are risks but don't try to put resulting failures on people who provide free solutions for you online.

PS: Incorporation is a good way to protect yourself. Then having even ineffective due diligence processes is enough to protect your personal life from the risks of doing business.

olalonde8y ago

HN is a tough crowd to please. Without the disclaimer, I guarantee you that the top comment would have been one requesting the author to add one, along with a warning about not writing your own crypto.

bogomipz8y ago

>"What's the point of education article, if bad examples aren't clearly marked as bad?"

The doc string for the hash function states:

    An INSECURE hash function that you should not use in the real world.
    Returns an hexadecimal hash

I'm not sure how much clearer you could mark that.

The point in the article seems to be understanding the blockchain protocol and concepts and not "how to write secure crypto."

magnat8y ago

> I'm not sure how much clearer you could mark that.

It's not about this warning. It's about that it's the only warning. From the intro and from discussion here on HN, I imply author knows about other shortcuts made for the sake of simplicity/explanation/readability - I just wished those shortcuts were pointed at more directly in the article, instead of blanket disclaimer.

johnwheeler8y ago

just gives you a sense of the protocol, which is probably GTK if you’re technically inclined and speculating.

h4l08y ago

For the last couple of months, there have been many educational, simple implementations that explains blockchain technology, I guess thanks to crypto bubble. I wish these were around when we were doing our senior design project on blockchains in 2014. Back then, I only could find basiccoin[0], which was purely minified to just fit in 1000 loc.

After that, I decided to re-implement everything from scratch. My foremost constraint was to write readable code so that anyone could read the codebase and have an idea of how blockchain works.

My current draft of implementation can be found on https://github.com/halilozercan/halocoin , which currently lacks detailed README and documentation. However, you can still experiment with it by using API or CLI. I'm running a dedicated server to have an always online peer you can connect to.

[0]: https://github.com/zack-bitcoin/basiccoin

Edit: a word

westurner8y ago

Thanks! Simplest explanation I've seen.

Here's an nbviewer link (which, like base58, works on/over a phone): https://nbviewer.jupyter.org/github/julienr/ipynb_playground...

Note that Bitcoin does two rounds of SHA256 rather than one round of MD5. There's also a "P2P DHT" (peer-to-peer distributed hash table) for storing and retrieving blocks from the blockchain; instead of traditional database multi-master replication and secured offline backups.

> ERROR:root:Invalid transaction signature, trying to spend someone else's money ?

This could be more specific. Where would these types of error messages log to?

jreOP8y ago

Thanks for the precision regarding the hash.

Regarding the error, they are logged when a verify_block/transaction returns False, just to be a bit more explicit about what failed. In a real implementation, I guess you would throw exceptions instead (or use some Result pattern), but I tried and it cluttered the code quite a bit, so I went back to logging.

westurner8y ago

My mistake, it's BitTorrent that has a DHT. Instead of finding the most network local peer with the block identified by a (prev_hash, hash) hash table key, the Bitcoin blockchain broadcasts all messages to all nodes; which must each maintain a complete backup of the entire blockchain.

"Protocol documentation" https://en.bitcoin.it/wiki/Protocol_documentation

geraldbauer8y ago

FYI: Great blockchain (from sratch) starter article. At the Awesome Blockchains page [1] I collect starter blockchains and articles (in Python, Ruby, JavaScript, etc.) the idea is the best way to learn about blockchains is to do-it-yoursef - build your own blockchains from scratch. Great example. Keep it up. Cheers. [1] https://github.com/openblockchains/awesome-blockchains

jreOP8y ago

Thanks ! The awesome-blockchains is a great resource, thanks for sharing.

heynk8y ago

This is great. Just last weekend I did the same thing, coding a very basic blockchain in Python for educational purposes. You tackled wallets, which I didn't get to yet, so that was really helpful.

I'm still a little unsure around exactly how miners and nodes communicate with each other. Especially things like broadcasting transactions and new blocks. Any good resources for that?

jreOP8y ago

Thanks ! I've completely left out communication from this because it wouldn't fit in the notebook and I haven't researched it. Would also appreciate if anybody has good resources on it.

ivansavz8y ago

Very good writeup that shows all the steps.

Note it uses MD5 hash instead of SHA256 so not exactly bitcoin. I wonder how much more work would be to make the code fully implement bitcoin. Will it still be readable? Or Etherium? Would be great value for understanding even if Python would be inefficient to run in prod.

jreOP8y ago

OP here.

Swapping MD5 for SHA256 is very easy. I'll actually do it - see my other answer above for why MD5.

For the other differences to bitcoin and from the top of my head :

- In my implementation, wallet addresses are the public key of the owner. Bitcoin addresses are slightly more complicated [1] and a wallet can (and should) generate a new address for each transaction.

- Bitcoin uses ECDSA instead of RSA

- Bitcoin transactions use a (simpler than ethereum but still) scripting language [2].

- The whole communication part was left out : you need a way to broadcast blocks. I haven't looked into that

- Bitcoin uses a Merkle tree to store transactions (and prune spent ones).

I think the scripting and communication would be the two biggest tasks. But it would also require unit testing and obviously wouldn't fit in a single notebook.

[1] https://en.bitcoin.it/wiki/Technical_background_of_version_1...

[2] https://en.bitcoin.it/wiki/Script

ecesena8y ago

Bitcoin also hashes twice, i.e. it computes sha256(sha256(.)). Supposedly, this is to protect against extension attacks [1].

Was wondering, any specific reason to choose RSA vs ECDSA? Signatures would be smaller.

[1] https://en.wikipedia.org/wiki/Length_extension_attack

jreOP8y ago

No good reason for RSA vs ECDSA. I was just more familiar with RSA, but apparently pycryptodome supports ECDSA as well, so I guess the change should be minimal.

h4l08y ago

Fully implementing Bitcoin white paper would require moderate amount of work if you do not consider every little detail related to security of your client code. However, current Bitcoin protocol added many features such as scripting.

I think one can maintain code readability in a python implementation but documentation is the key here. Developer needs to clearly state the objective of each function.

For ethereum, you need one external element called: Ethereum Virtual Machine. Smart Contracts are basically byte code that runs on EVM. Without it, blockchain cannot function. So, ethereum development may require extra knowledge on top of blockchain technology.

Gargoyle8y ago

I've been ignoring bitcoin for a while. Do the devs still maintain the official implementation is the only implementation possible since it contains quirks the spec doesn't specify?

pmorici8y ago

Yup, though a group got fed up with that and forked to create Bitcoin Cash which has a spec and multiple teams and implementations. Much healthier situation.

krisives8y ago

Bitcoin has had scripting since that first release TX signatures themselves are OP_SIGHASH

vinn1248y ago

nice walkthrough.

also, not sure why folks are nitpicking about minor things like security disclaimers, number of sha256 hashes, md5, etc. while ignoring nontrivial gaps (eg no merkle dags, one of the cornerstone concepts).

jreOP8y ago

Thanks.

You're right about Merkle tree. This is a whole section of the bitcoin paper and it's pretty important. But as far as I understand, it's "only" an optimization to save disk space, so it doesn't change the underlying logic.

jobeirne8y ago

Something from a few months back that's a bit closer to Bitcoin's actual implementation but at a similar level of readability: https://github.com/jamesob/tinychain

brodo8y ago

Coll project. I think having educational * implementations in Python is great in general. Better then having just an article about it...

antman8y ago

So is there a non educational implementation of blockhain that someone could use/parametrize/configure? Something that is not educational but more industrial strength? What must someone do to convert an educational implementation to a serious one?

satellitec4t8y ago

Why is this all contained in one big json document? Am I that out of touch with modern coding??

theuly8y ago

What the author wrote is a IPython / Jupyter notebook [0], which is implemented as a JSON document [1]. The notebooks are pretty much markdown with runnable code blocks in between.

Jupyter notebooks are popular in the data science / Python communities to explain concepts, for examples see here [2].

[0] https://jupyter.org/

[1] https://ipython.org/ipython-doc/3/notebook/nbformat.html

[2] https://github.com/jupyter/jupyter/wiki/A-gallery-of-interes...

tree_of_item8y ago

It's a Jupyter notebook: http://jupyter.org/

haraldooo8y ago

It’s a Jupyter-Notebook (http://jupyter.org/)

partycoder8y ago

Educational implementation, brilliant. Now some startup will give it a name and make an ICO.

j / k navigate · click thread line to collapse

41 comments

magnat8y ago

> It is NOT secure neither a real blockchain and you should NOT use this for anything else than educational purposes.

It would be nice if non-secure parts of implementation or design were clearly marked.

Even if fixing bad/insecure parts are "left as an exercise for the reader", learning value of the article would be much greater if those parts would be at least pointed at.

jreOP8y ago

OP here.

erikb is spot on in the sibling comment. This hasn't been expert-reviewed, hasn't been audited so I'm pretty confident there is a bug somewhere that I don't know about.

It's educational in the sense that I tried as best a I could to implement the various algorithmic parts (mining, validating blocks & transactions, etc...).

I'll also try to point out more explicitly which parts I think are not secure.

magnat8y ago

> I'll also try to point out more explicitly which parts I think are not secure.

Things I've noticed:

* Use of floating point arithmetic.

* mine method can loop forever due to integer overflow.

* Miners can put in block a transaction with output sum greater than input sum - only place where it is checked is in compute_fee and no path from verify_block leads there.

jreOP8y ago

Those are all very good points I didn't think about, thanks for these.

I'll fix the two bugs with verify_block and the possibility for a miner to inject invalid a output > input transaction.

I'll add a note for the 3 others.

1 more reply

tfha8y ago

For what it's worth, MD5 is perfectly safe for use as a proof of work algorithm. You just don't want to use it for authentication / data integrity

erikb8y ago

Being secure is the lack of attack surface. So it's really hard to say. Often what hits you is something you haven't considered.

Or, do it like 99.9% of people and just accept that there are risks but don't try to put resulting failures on people who provide free solutions for you online.

PS: Incorporation is a good way to protect yourself. Then having even ineffective due diligence processes is enough to protect your personal life from the risks of doing business.

olalonde8y ago

bogomipz8y ago

>"What's the point of education article, if bad examples aren't clearly marked as bad?"

The doc string for the hash function states:

    An INSECURE hash function that you should not use in the real world.
    Returns an hexadecimal hash

I'm not sure how much clearer you could mark that.

The point in the article seems to be understanding the blockchain protocol and concepts and not "how to write secure crypto."

magnat8y ago

> I'm not sure how much clearer you could mark that.

johnwheeler8y ago

just gives you a sense of the protocol, which is probably GTK if you’re technically inclined and speculating.

h4l08y ago

After that, I decided to re-implement everything from scratch. My foremost constraint was to write readable code so that anyone could read the codebase and have an idea of how blockchain works.

[0]: https://github.com/zack-bitcoin/basiccoin

Edit: a word

westurner8y ago

Thanks! Simplest explanation I've seen.

Here's an nbviewer link (which, like base58, works on/over a phone): https://nbviewer.jupyter.org/github/julienr/ipynb_playground...

> ERROR:root:Invalid transaction signature, trying to spend someone else's money ?

This could be more specific. Where would these types of error messages log to?

jreOP8y ago

Thanks for the precision regarding the hash.

westurner8y ago

"Protocol documentation" https://en.bitcoin.it/wiki/Protocol_documentation

geraldbauer8y ago

jreOP8y ago

Thanks ! The awesome-blockchains is a great resource, thanks for sharing.

heynk8y ago

This is great. Just last weekend I did the same thing, coding a very basic blockchain in Python for educational purposes. You tackled wallets, which I didn't get to yet, so that was really helpful.

I'm still a little unsure around exactly how miners and nodes communicate with each other. Especially things like broadcasting transactions and new blocks. Any good resources for that?

jreOP8y ago

Thanks ! I've completely left out communication from this because it wouldn't fit in the notebook and I haven't researched it. Would also appreciate if anybody has good resources on it.

ivansavz8y ago

Very good writeup that shows all the steps.

jreOP8y ago

OP here.

Swapping MD5 for SHA256 is very easy. I'll actually do it - see my other answer above for why MD5.

For the other differences to bitcoin and from the top of my head :

- Bitcoin uses ECDSA instead of RSA

- Bitcoin transactions use a (simpler than ethereum but still) scripting language [2].

- The whole communication part was left out : you need a way to broadcast blocks. I haven't looked into that

- Bitcoin uses a Merkle tree to store transactions (and prune spent ones).

I think the scripting and communication would be the two biggest tasks. But it would also require unit testing and obviously wouldn't fit in a single notebook.

[1] https://en.bitcoin.it/wiki/Technical_background_of_version_1...

[2] https://en.bitcoin.it/wiki/Script

ecesena8y ago

Bitcoin also hashes twice, i.e. it computes sha256(sha256(.)). Supposedly, this is to protect against extension attacks [1].

Was wondering, any specific reason to choose RSA vs ECDSA? Signatures would be smaller.

[1] https://en.wikipedia.org/wiki/Length_extension_attack

jreOP8y ago

No good reason for RSA vs ECDSA. I was just more familiar with RSA, but apparently pycryptodome supports ECDSA as well, so I guess the change should be minimal.

h4l08y ago

I think one can maintain code readability in a python implementation but documentation is the key here. Developer needs to clearly state the objective of each function.

Gargoyle8y ago

I've been ignoring bitcoin for a while. Do the devs still maintain the official implementation is the only implementation possible since it contains quirks the spec doesn't specify?

pmorici8y ago

Yup, though a group got fed up with that and forked to create Bitcoin Cash which has a spec and multiple teams and implementations. Much healthier situation.

krisives8y ago

Bitcoin has had scripting since that first release TX signatures themselves are OP_SIGHASH

vinn1248y ago

nice walkthrough.

jreOP8y ago

Thanks.

jobeirne8y ago

Something from a few months back that's a bit closer to Bitcoin's actual implementation but at a similar level of readability: https://github.com/jamesob/tinychain

brodo8y ago

Coll project. I think having educational * implementations in Python is great in general. Better then having just an article about it...

antman8y ago

satellitec4t8y ago

Why is this all contained in one big json document? Am I that out of touch with modern coding??

theuly8y ago

What the author wrote is a IPython / Jupyter notebook [0], which is implemented as a JSON document [1]. The notebooks are pretty much markdown with runnable code blocks in between.

Jupyter notebooks are popular in the data science / Python communities to explain concepts, for examples see here [2].

[0] https://jupyter.org/

[1] https://ipython.org/ipython-doc/3/notebook/nbformat.html

[2] https://github.com/jupyter/jupyter/wiki/A-gallery-of-interes...

tree_of_item8y ago

It's a Jupyter notebook: http://jupyter.org/

haraldooo8y ago

It’s a Jupyter-Notebook (http://jupyter.org/)

partycoder8y ago

Educational implementation, brilliant. Now some startup will give it a name and make an ICO.

j / k navigate · click thread line to collapse