The frequency of each top-level prefix (which tells you the geographical or language region) would be interesting. That would be the first thing I'd calculate if I had the data on my disk.
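A minimal sketch of that calculation, assuming the data is simply an iterable of ISBN-13 strings. The function name `prefix_counts` and the sample values are illustrative. The real registration-group element is variable-length, so slicing a fixed number of leading digits is only a rough approximation of "top-level prefix".

```python
from collections import Counter

def prefix_counts(isbns, digits=4):
    """Count ISBN-13s by their leading digits (EAN prefix plus the start of the group)."""
    counts = Counter()
    for raw in isbns:
        isbn = raw.replace("-", "").replace(" ", "")
        # Skip anything that is not a plain 13-digit string; check digits are not verified here.
        if len(isbn) == 13 and isbn.isdigit():
            counts[isbn[:digits]] += 1
    return counts

# Illustrative sample only, not real dataset contents.
sample = ["978-0-306-40615-7", "978-3-16-148410-0", "9780131103627", "979-10-90636-07-1"]
for prefix, n in prefix_counts(sample).most_common():
    print(prefix, n)
```

A proper version would map each ISBN to its registration group using the official range tables instead of a fixed-width slice.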
Much better would be a UUID generated from unique values, like a hash of the timestamp and publisher of a book. If you limit the length and number of the fields you hash to generate the UUID, you could even prove there will be zero collisions and eliminate any need for collision checks, and thus for an organization that charges money.
I will leave figuring out which hashing functions were known back in 1970, and experimenting with calculating them by hand, up to you. :)
Short values are more reliable in retail situations. They can be typed in by hand or read with cheap scanners.
You are of course free to publish without an ISBN if you don't care about the legacy ecosystem.
There's nothing stopping anyone from creating or promoting an alternative but I don't think the incentives are there. There's not enough money in it, and I don't think the cost savings are enough to make a switch compelling.
* Rather than compute a hash you could just generate a random number: same risk of collision if done correctly (but different opportunities for making a mistake).
* When ISBNs were introduced in the 1960s people would have been typing and even handwriting them so keeping them short would have been important.
* ISBNs have now been incorporated into EANs (13 digits), which are used for all things sold by retailers, except in the USA and Canada, which, according to Wikipedia, use a system called UPC. (Ironically, the U stands for "universal" while the E stands for "European". Of course the 12-digit system got incorporated into the 13-digit system. Probably there will be a 14-digit system one day.)
* In a UK supermarket if the barcode won't scan someone has to type in the digits. I assume that in most cases they type all 13 digits but I haven't watched carefully. (Of course I am now inspired to watch more carefully next time it happens.) They could have a really clever interface connected to a real-time database of barcodes which recently failed to scan because I expect whole batches of a product have badly printed or crinkled packaging.
* A suitably designed 25-digit system would only take twice as long, or less than twice as long, to type in as the current 13-digit system, but the system would have to be suitably designed for that purpose. Having the computer tell the human at the end "there's a mistake somewhere" would be no good at all. At the very least you could have a check digit for each half and tell the human which half contains the mistake but of course you could do much better than that ...
* I have noticed that Sainsbury's (a major UK supermarket) has a system of 8-digit barcodes for its own products, but Tesco (another major supermarket) uses the standard 13-digit barcodes for its own products.
* ALDI products have giant barcodes printed in several places on the packaging without the corresponding digits printed underneath the barcode: the scanner will never fail!
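The "check digit for each half" idea from the list above can be sketched as follows. The check-digit rule is the standard EAN-13 one (weights 1, 3, 1, 3, ... from the left, modulo 10). The split into a 13-digit and a 12-digit block, and the names `block_ok` and `bad_halves`, are hypothetical, chosen only to illustrate a 25-digit code that can tell the human which half to retype.

```python
def check_digit(payload):
    """EAN-13-style check digit: weights 1,3,1,3,... from the left, modulo 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(payload))
    return (10 - total % 10) % 10

def block_ok(block):
    """True if the last digit of the block is the check digit of the digits before it."""
    return block.isdigit() and check_digit(block[:-1]) == int(block[-1])

def bad_halves(code, split=13):
    """Return the indices (0 and/or 1) of the halves whose check digit fails."""
    halves = [code[:split], code[split:]]
    return [i for i, half in enumerate(halves) if not block_ok(half)]

# Build a hypothetical 25-digit code: a real ISBN-13 plus an invented 12-digit block.
first = "9780306406157"
second = "12345678901"
second += str(check_digit(second))
code = first + second

print(bad_halves(code))                              # both halves pass
typo = code[:20] + ("0" if code[20] != "0" else "1") + code[21:]
print(bad_halves(typo))                              # only the second half is flagged
```

A single mistyped digit is always caught by this scheme because both weights (1 and 3) are coprime with 10; swapped adjacent digits are not always caught, which is one reason a purpose-built design could do better.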
That's false. Your algorithm of hashing a timestamp and book publisher name cannot be proven to be collision-free.
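To put rough numbers on the collision risk, here is a birthday-bound estimate. It assumes the hash output behaves like a uniformly random value, which is an idealization, and it uses the 129,864,880 book count quoted elsewhere in this thread as the number of items. The function name `collision_probability` is illustrative.

```python
import math

def collision_probability(n_items, bits):
    """Approximate chance of at least one collision among n_items random bits-wide IDs."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision for tiny p.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))

books = 129_864_880
for bits in (32, 64, 128):
    print(bits, collision_probability(books, bits))
```

Under these assumptions a 32-bit ID collides almost certainly, a 64-bit ID has a small but non-negligible risk, and a 128-bit ID has a negligible one. That is "extremely unlikely", not "proven impossible".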
Additionally, there is address fragmentation; ISBN has blocks:
ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory regardless of the publication language. The ranges of ISBNs assigned to any particular country are based on the publishing profile of the country concerned, and so the ranges will vary depending on the number of books and the number, type, and size of publishers that are active. Some ISBN registration agencies are based in national libraries or within ministries of culture and thus may receive direct funding from the government to support their services. In other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded.
Section 6.1 of the ISBN International User Manual states: "A separate ISBN shall be assigned to each separate monographic publication or separate edition or format of a monographic publication issued by a publisher."
This would not be a problem if the numbers were more affordable.
Maybe they're just sitting on a big block of numbers and just giving them away...
The other downside to these free (just about always) and discounted (sometimes) ISBNs is that they list the publisher as the service you got the ISBN through, rather than yourself, even if you're doing what would classically be considered a self-publishing job. How big of an issue is that? IANAExpert, but it seems like there are some nooks and crannies of IP law that can be swayed by owning the imprint, but little practical concern for, e.g., the average person putting an ebook on Amazon. Perhaps someone with more in-depth publishing knowledge can color the risks better than I can.
This will be the first book I'm the author of, but the second book I've worked on (the first I was the technical editor for). Neither of these books is out yet (I start writing tomorrow) but they both have ISBNs issued. Even if I never publish the book, that ISBN is locked in.
I imagine there's a lot of books that started out but never got finished. That said it looks like ISBNdb doesn't grab directly from the source, but instead crawls the internet looking for ISBN data to put into its database. I'll be interested to see at which stage my ISBN shows up in the database.
It's a legit question to answer.
A more conservative forever... at the end of the human species? Maybe.
OpenLibrary also uses book scans in Archive.org to extract ISBNs (and a few other bits of metadata, like urls in the text):
https://blog.openlibrary.org/2021/08/23/gsoc-2021-making-boo...
And have a software pipeline for that kind of thing available.
It's not super common, but common enough that I ran across that problem when scanning in my bookshelf years ago.
> Physical copies. Obviously this is not very helpful, since they’re just duplicates of the same material.
Alas, this is quite often not the case, in particular for older books, for various reasons. For example, copies were bound from sheets of different print runs that used freshly assembled typesettings containing accidental or deliberate variations, sometimes sheets were missing or the order of pages is not correct, etc., etc.[2] For important "books" we should therefore digitize every available copy.
As great as it would be to have 129,864,880 "books" scanned, this would be just an initial phase. We would need quality control: Is the resolution of the scans really always sufficient? Are the colours correctly represented (does every scan include a standard colour chart for comparison)? What about watermarks (they are extremely important for dating old books)? ... ...
Besides, I personally prefer to speak of "making books digitally available" rather than of "preserving" them, because many features of a physical copy are impossible to preserve digitally: chemical composition, (bio-)chemical traces, the DNA of parchment or animal bindings, their texture, how it feels to handle them, their visual appearance under different illuminations ... ...
[1] The number varies from denomination to denomination.
[2] And even renowned contemporary publishers sometimes silently correct errors without changing the numbering of the edition.
https://theconversation.com/turning-diamonds-defects-into-lo...
Another is microetching, i.e. ion-beam insertion of foreign atoms into crystalline materials such as diamond or nickel. Although the data density is lower than with the above approach, it seems a lot less sensitive (i.e. light should have less effect):
http://booksearch.blogspot.com/2010/08/books-of-world-stand-...