ISBNdb dump – how many books are preserved forever?

billblack · on Oct 31, 2022

As someone who would like to publish, my main concern with ISBNs is the cost, because publishers are required to assign an ISBN to every item in their catalog.

Section 6.1 of the ISBN International User Manual states: "A separate ISBN shall be assigned to each separate monographic publication or separate edition or format of a monographic publication issued by a publisher."

This would not be a problem if the numbers were more affordable.

gigel82 · on Oct 31, 2022

That's surprising; I wanted to make a picture book (with a bit of text the kids wrote) to send to grandparents and stumbled upon BookWright; seemed an affordable choice but was very surprised they actually included an ISBN with the little one-off kids picture book.

Maybe they're just sitting on a big block of numbers and just giving them away...

pilimi_anna · on Oct 31, 2022

This is interesting to learn about. How expensive is it, and do you know if it differs around the world (since there are lots of national ISBN agencies)?

jkingsman · on Oct 31, 2022

It's $125 list price for a single ISBN, but there are bulk discounts buying direct and purchasing through a large scale supplier can make them as cheap as $10 each. There are deals to be had; Amazon, for example, may give you a free ISBN for your ebook as long as you publish it using KDP, their walled-garden publishing system, but the gotcha is the ISBN is not portable/you're not permitted to use it for other editions outside of the Amazon system.

The other downside to these free (just about always) and discounted (sometimes) ISBNs is that they list the service you got the ISBN through as the publisher, rather than yourself, even if you're doing what would classically be considered a self-publishing job. How big of an issue is that? IANAExpert, but it seems like there are some nooks and crannies of IP law that can be swayed by owning the imprint, but little practical concern for, e.g., the average person putting an ebook on Amazon. Perhaps someone with more in-depth publishing knowledge can color the risks better than I.

toomuchtodo · on Oct 31, 2022

Can I just start a non profit “publisher”, buy a block of ISBNs, and hand them out at cost? Costco for ISBNs sort of thing, just enough margin to pay for a few hours a year of my time and a little app to drive the process.

Edit: HN throttling, can’t reply. What if the Internet Archive gets a block and hands them out via Open Library? They seem positioned to argue they’re a bona fide publisher.

Finnucane · on Oct 31, 2022

"There are unauthorized re-sellers of ISBNs and this activity is a violation of the ISBN standard and of industry practice. A publisher with one of these re-assigned ISBNs will not be correctly identified as the publisher of record in Books In Print or any of the industry databases such as Barnes and Noble or Amazon or those of wholesalers such as Ingram."

Finnucane · on Oct 31, 2022

>Amazon, for example, may give you a free ISBN for your ebook as long as you publish it using KDP,

That is indeed a bit of a ruse, since an ISBN is supposed to identify an edition or format, but not the sales channel. We give our epub files an ISBN, and all the vendors that sell that file (including Amazon) use the same number. But when you publish with KDP, you are not the publisher. Amazon is, so you have less say in the matter.

pilimi_anna · on Oct 31, 2022

$125 is steep, but $10 sounds very doable (for the US).

Good point on the "publisher" linking caveat though. I don't know how much that matters in this day and age. Would be useful to learn from some published authors.

wrs · on Oct 31, 2022

For the US, between $125 each (for 1) and $1.50 each (for 1000) from the official source, a company called Bowker. The structure is described in Bowker’s FAQ [0].

[0] http://isbn.org/faqs_general_questions#isbn_faq6

fragmede · on Oct 31, 2022

In the middle is 10 for $295, which is about $30 each. That's a small enough number that you can split the cost with a few friends. But you can also get one from Australia, $88 for 10 or $44 for one.

jwilk · on Oct 31, 2022

In Poland, you get ISBNs for free.

bloak · on Oct 31, 2022

Google has claimed that about 130 million books have been published (that factoid is all over the web). The number of 10-digit ISBNs is 1000 million (there's a check digit) and people have only just started using 13-digit ISBNs that start with 979 instead of 978; but of course there must be lots of wasted ISBNs, for example when a publisher optimistically buys a big block and then goes bankrupt. Both those numbers suggest that the "ISBNdb" with less than 31 million ISBNs is far from complete.

The frequency of each top-level prefix (which tells you the geographical or language region) would be interesting. That would be the first thing I'd calculate if I had the data on my disc.
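A rough sketch of that prefix count in Python, under a loud assumption: real registration-group prefixes are variable-length (one to five digits after 978/979), and the authoritative ranges come from the International ISBN Agency. This version only buckets by the EAN prefix plus the first following digit, so it is a coarse approximation. The function names and the sample list are made up for illustration.

```python
from collections import Counter

def coarse_prefix(isbn13: str) -> str:
    """Return the EAN prefix plus the first group digit, e.g. '978-0'.

    Assumption: this is only a coarse bucket. Real registration groups
    can be up to five digits long, which this sketch ignores.
    """
    digits = "".join(ch for ch in isbn13 if ch.isdigit())
    if len(digits) != 13:
        raise ValueError(f"expected 13 digits, got {len(digits)}: {isbn13!r}")
    return f"{digits[:3]}-{digits[3]}"

def prefix_frequencies(isbns):
    """Count how many ISBN-13s fall into each coarse prefix bucket."""
    return Counter(coarse_prefix(isbn) for isbn in isbns)

# Illustrative input only; a real run would read the full dump.
sample = [
    "978-0-306-40615-7",
    "9781633438002",
    "978-3-16-148410-0",
    "9780306406157",
]
freq = prefix_frequencies(sample)
for prefix, count in freq.most_common():
    print(prefix, count)
```

Swapping `coarse_prefix` for a lookup against the official range table would give the true per-group breakdown.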

ComputerGuru · on Oct 31, 2022

I always loved how despite the massive domain differences, the ISBN situation is extremely similar to the IPv4/IPv6 situation (except more aggressively rent-seeking), with prefixes leased out to the old dogs, concerns about eventual address/ISBN exhaustion, a scheme for mapping old ISBN10 to new ISBN13 codes, etc etc.
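For anyone curious what that ISBN10-to-ISBN13 mapping looks like, here is a minimal Python sketch. It assumes the usual rule: prepend 978 to the first nine digits and recompute the check digit with the EAN-13 weights (1, 3, 1, 3, ...). The function names are my own.

```python
def isbn10_check_digit(first9: str) -> str:
    """ISBN-10 check digit: weights 10..2, modulo 11, with 'X' meaning 10."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 / EAN-13 check digit: alternating weights 1 and 3, modulo 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_isbn13(isbn10: str) -> str:
    """Map an ISBN-10 into the 978 range of ISBN-13."""
    raw = isbn10.replace("-", "").replace(" ", "").upper()
    if len(raw) != 10 or not raw[:9].isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    if isbn10_check_digit(raw[:9]) != raw[9]:
        raise ValueError(f"bad ISBN-10 check digit: {isbn10!r}")
    body = "978" + raw[:9]
    return body + isbn13_check_digit(body)

print(isbn10_to_isbn13("0-306-40615-2"))
```

The reverse mapping only exists for 978-prefixed numbers; a 979 ISBN has no ISBN-10 equivalent, which is roughly where the IPv4/IPv6 analogy comes in.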

23skidoo · on Oct 31, 2022

I'm a little perplexed by the ISBN system. The whole centralized affair, where you have to purchase ISBNs seems like a racket. ISBNs cost more in some countries (America) than they do in others (Canada). Not for any reason other than that they can get away with it.

Much better would be a UUID generated from unique values, like a hash of the timestamp and publisher of a book. If you limit the length and number of the fields you hash to generate the UUID, you could even prove there will be zero collisions and eliminate any need for collision checks and thus an organization that charges money.
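One possible reading of that proposal, sketched in Python with the standard `uuid` module. The namespace, the field separator, and the `book_id` helper are assumptions made up for this example, not part of any existing scheme.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would do.
BOOK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "books.example")

def book_id(publisher: str, timestamp: str) -> uuid.UUID:
    """Derive a name-based (version 5, SHA-1) UUID from publisher and timestamp."""
    return uuid.uuid5(BOOK_NAMESPACE, f"{publisher}|{timestamp}")

a = book_id("Example Press", "2022-10-31T12:00:00Z")
b = book_id("Example Press", "2022-10-31T12:00:00Z")
c = book_id("Example Press", "2022-10-31T12:00:01Z")
print(a)
print(a == b, a == c)
```

The ID is deterministic, so the same inputs always give the same ID. Two different books entered with the same publisher string and timestamp would therefore share an ID, which means uniqueness depends entirely on the input fields being unique.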

egypturnash · on Oct 31, 2022

ISBN was introduced in 1970. While hash functions did exist at this point (https://en.wikipedia.org/wiki/Hash_function#History) the computational resources generally available for this sort of thing were... rather lacking. The Apple II wasn't introduced until 1977.

I will leave figuring out which hashing functions were known back in 1970, and experimenting with calculating them by hand, up to you. :)

xyzzy123 · on Oct 31, 2022

While archaic, ISBN doesn't seem a bad system to me.

Short values are more reliable in retail situations. They can be typed in by hand or read with cheap scanners.

You are of course free to publish without an ISBN if you don't care about the legacy ecosystem.

There's nothing stopping anyone from creating or promoting an alternative but I don't think the incentives are there. There's not enough money in it, and I don't think the cost savings are enough to make a switch compelling.

toomuchtodo · on Nov 1, 2022

There’s an interesting interaction of ISBNs with DOIs.

https://guides.library.oregonstate.edu/c.php?g=285973&p=2442...

https://www.doi.org/factsheets/ISBN-A.html

bloak · on Nov 1, 2022

That's definitely an interesting question, why they don't use a longer identifier without central/hierarchical allocation. I don't have an answer, but some possibly relevant points:

* Rather than compute a hash you could just generate a random number: same risk of collision if done correctly (but different opportunities for making a mistake).

* When ISBNs were introduced in the 1960s people would have been typing and even handwriting them so keeping them short would have been important.

* ISBNs have now been incorporated into EANs (13 digits), which are used for all things sold by retailers, except in the USA and Canada, which, according to Wikipedia, use a system called UPC. (Ironically, the U stands for "universal" while the E stands for "European". Of course the 12-digit system got incorporated into the 13-digit system. Probably there will be a 14-digit system one day.)

* In a UK supermarket if the barcode won't scan someone has to type in the digits. I assume that in most cases they type all 13 digits but I haven't watched carefully. (Of course I am now inspired to watch more carefully next time it happens.) They could have a really clever interface connected to a real-time database of barcodes which recently failed to scan because I expect whole batches of a product have badly printed or crinkled packaging.

* A suitably designed 25-digit system would only take twice as long, or less than twice as long, to type in as the current 13-digit system, but the system would have to be suitably designed for that purpose. Having the computer tell the human at the end "there's a mistake somewhere" would be no good at all. At the very least you could have a check digit for each half and tell the human which half contains the mistake but of course you could do much better than that ...

* I have noticed that Sainsbury's (a major UK supermarket) has a system of 8-digit barcodes for its own products, but Tesco (another major supermarket) uses the standard 13-digit barcodes for its own products.

* ALDI products have giant barcodes printed in several places on the packaging without the corresponding digits printed underneath the barcode: the scanner will never fail!

IncRnd · on Nov 1, 2022

> Much better would be a UUID generated from unique values, like a hash of the timestamp and publisher of a book. If you limit the length and number of the fields you hash to generate the UUID, you could even prove there will be zero collisions and eliminate any need to collision checks and thus an organization that charges money.

That's false. Your algorithm of hashing a timestamp and book publisher name cannot be proven to be collision-free.

8n4vidtmkvmk · on Nov 1, 2022

but the probability of two IDs of 16 completely random bytes colliding is extremely low...

IncRnd · on Nov 1, 2022

Yes, but I was refuting a false point, that those bytes can be proven to never collide... Obviously, they can collide. In the real world, programmers should be prepared for random collisions, yes, but also for created collisions...

False assumptions are the bane of correct design and will cause an entire system to fail in unpredictable ways or be exploited without detection.
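To put a number on "extremely low" versus "proven never to collide", here is a small Python sketch using the standard birthday-bound approximation p ≈ 1 − exp(−n(n−1)/(2N)). It assumes the IDs are uniformly random and independent, which is exactly the assumption an attacker or a buggy generator would break. The function name is my own.

```python
import math

def collision_probability(n_ids: int, id_space: int) -> float:
    """Approximate chance that at least two of n_ids random IDs coincide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)); expm1 keeps
    precision when the probability is tiny.
    """
    exponent = -n_ids * (n_ids - 1) / (2 * id_space)
    return -math.expm1(exponent)

# 130 million books (the Google estimate quoted in the thread), 128-bit IDs.
p_random_128 = collision_probability(130_000_000, 2**128)

# Same formula with a small space: 100,000 random picks out of 10**9 values.
p_small_space = collision_probability(100_000, 10**9)

print(p_random_128)
print(p_small_space)
```

The first probability is tiny but not zero, and the second shows how quickly random assignment fails once the ID space is short enough to type by hand.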

contingencies · on Oct 31, 2022

Many things are published without ISBNs or have ISBNs and aren't traditional books. Here in China, to get an ISBN for a book you have to have a government approval process. So many publishers will print stuff on the proverbial sly, often at night, without assigning an ISBN. There's also book-like printed matter (pamphlets, maps, posters, puzzles, 3D/fold-out dioramas, etc.) which often lack an ISBN. So equating ISBN with book is not correct. Then there's all the stuff published pre-ISBN...

adolph · on Oct 31, 2022

An ISBN is assigned to each separate edition and variation (except reprintings) of a publication. For example, an e-book, a paperback and a hardcover edition of the same book will each have a different ISBN.

Additionally, there is address fragmentation; ISBN has blocks:

ISBN issuance is country-specific, in that ISBNs are issued by the ISBN registration agency that is responsible for that country or territory regardless of the publication language. The ranges of ISBNs assigned to any particular country are based on the publishing profile of the country concerned, and so the ranges will vary depending on the number of books and the number, type, and size of publishers that are active. Some ISBN registration agencies are based in national libraries or within ministries of culture and thus may receive direct funding from the government to support their services. In other cases, the ISBN registration service is provided by organisations such as bibliographic data providers that are not government funded.

https://en.wikipedia.org/wiki/ISBN

eCa · on Oct 31, 2022

Yeah, lots and lots of unused ISBNs. As an example, O’Reilly has the 978-0-596 series. That’s a hundred thousand editions.

Tomte · on Nov 1, 2022

ISBNs are supposed to be unique, but they aren't. Publishers reuse ISBNs: by mistake if they are trying to follow the rules, and sometimes intentionally.

It's not super common, but common enough that I ran across that problem when scanning in my bookshelf years ago.

Archelaos · on Oct 31, 2022

The problem with counting "books" is that the term is used in so many different ways that one might end up with estimates that differ by several orders of magnitude depending on how narrow or wide a definition or characterization one adopts. How many books is a bible? One or around 80.[1] When there is a new minor edition, do we count no, one or 80 new books? Some of these 80 "books" are only letters and less than a page or only a few pages long. Shall we count them all as "books"? If we do so, should we then count each letter of a modern published correspondence as a single "book"? Poems were often published as very small booklets, but for prominent writers you may be able to purchase their "complete works" in a single more or less thick volume, or the very same text in one thick volume or a few more handy volumes. How should we count this?

> Physical copies. Obviously this is not very helpful, since they’re just duplicates of the same material.

Alas, this is quite often not the case, in particular for older books, for various reasons: for example, copies were bound from sheets of different print runs that used freshly assembled typesettings containing accidental or deliberate variations, sometimes sheets were missing or the order of pages is not correct, etc., etc.[2] For important "books" we should therefore digitize every available copy.

As great as it would be to have 129,864,880 "books" scanned, this would be just an initial phase. We would need quality control: Is the resolution of the scans really always sufficient? Are the colours correctly represented (does every scan include a standard colour chart for comparison)? What about watermarks (they are extremely important for dating old books)? ... ...

Besides, I personally prefer to speak of "making books digitally available" rather than of "preserving" them, because many features of a physical copy are impossible to preserve digitally: chemical composition, (bio-)chemical traces, the DNA of parchment or animal bindings, their texture, how it feels to handle them, their visual appearance under different illuminations ... ...

[1] The number varies from denomination to denomination.

[2] And even renowned contemporary publishers sometimes silently correct errors without changing the numbering of the edition.

pugworthy · on Oct 31, 2022

Define "forever" in this context? 10 years? 100? 1000?

It's a legit question to answer.

kleer001 · on Nov 1, 2022

A real forever? Past the life of Sol and Earth. Unlikely.

A more conservative forever... at the end of the human species? Maybe.

photochemsyn · on Nov 1, 2022

There are some interesting technologies in the pipeline for truly long-term data storage. Synthetic diamond is one option (light-sensitive, so perhaps susceptible to cosmic-ray degradation over time):

https://theconversation.com/turning-diamonds-defects-into-lo...

Another is microetching, i.e. ion-beam insertion of foreign atoms into crystalline materials, such as diamond or nickel. Although the data density is lower than with the above approach, it seems a lot less sensitive (i.e. light should have less effect):

https://en.wikipedia.org/wiki/HD-Rosetta

tedivm · on Oct 31, 2022

The timing on this for me is really interesting, as last week I got an ISBN issued for a book I'm working on (9781633438002 if anyone is curious!).

This will be the first book I'm the author of, but the second book I've worked on (the first I was the technical editor for). Neither of these books is out yet (I start writing tomorrow) but they both have ISBNs issued. Even if I never publish the book that ISBN is locked in.

I imagine there's a lot of books that started out but never got finished. That said it looks like ISBNdb doesn't grab directly from the source, but instead crawls the internet looking for ISBN data to put into its database. I'll be interested to see at which stage my ISBN shows up in the database.

delecti · on Oct 31, 2022

What's the rationale behind reserving an ISBN before even beginning the writing process?

lmm · on Oct 31, 2022

It's a good unique key to use for tracking the book, even internally. You might change the title of the book at a late stage. You could use your own ID scheme, but what if your publisher merges with another while the book production is in process?

johannes1234321 · on Oct 31, 2022

Before writing might be a bit early, but before finishing and producing is useful as it can be listed early in catalogs for preorder. Getting the ISBN earlier probably is cheap and allows the publisher to use the ISBN as identifier for the whole project.

tedivm · on Oct 31, 2022

I have no idea why they assigned it this early, but that seems as likely a reason as any I can think of.

It was only assigned the day we finalized the contract, and there was a lot of work before that working the proposal through the system and getting reviews from the target audience and people familiar with the topic. It's only now that I'm expecting to hand content over on a schedule that they assigned the number.

omoikane · on Oct 31, 2022

That statement of "before the demise of Google Books" seems unnecessary. The next quoted bit of "at least until Sunday" might have been an attempt to complete the joke, but should be interpreted as the number of books changing rapidly according to the (12 year old) linked article.

http://booksearch.blogspot.com/2010/08/books-of-world-stand-...

ZeroGravitas · on Oct 31, 2022

> extracting ISBNs from the actual book scans themselves (in the case of Z-Library/Libgen).

OpenLibrary also uses book scans in Archive.org to extract ISBNs (and a few other bits of metadata, like urls in the text):

https://blog.openlibrary.org/2021/08/23/gsoc-2021-making-boo...

And have a software pipeline for that kind of thing available.
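As a rough idea of what pulling ISBNs out of scanned text can involve, here is a minimal Python sketch. It is not Open Library's actual pipeline; it simply assumes OCR text as input, looks for 13-digit candidates starting with 978 or 979 (with optional hyphens or spaces), and keeps only those whose check digit is valid. The names and the sample text are invented for illustration.

```python
import re

# Candidate ISBN-13s: 978/979 followed by ten more digits, with optional
# hyphens or spaces between digits.
ISBN13_PATTERN = re.compile(r"\b97[89][-\s]?(?:\d[-\s]?){9}\d\b")

def is_valid_isbn13(digits: str) -> bool:
    """Check the EAN-13 checksum: alternating weights 1 and 3, sum divisible by 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0

def extract_isbn13s(text: str) -> list[str]:
    """Return checksum-valid ISBN-13s found in text, normalized and de-duplicated."""
    found = []
    for match in ISBN13_PATTERN.finditer(text):
        digits = re.sub(r"[-\s]", "", match.group())
        if is_valid_isbn13(digits):
            found.append(digits)
    return list(dict.fromkeys(found))  # keep first-seen order

page_text = (
    "ISBN 978-0-306-40615-7 (hardcover), misprinted as 978-0-306-40615-8, "
    "see also 9781633438002."
)
print(extract_isbn13s(page_text))
```

The checksum filter drops most OCR misreads, though a misread that happens to produce another valid number would still slip through.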

pilimi_anna · on Oct 31, 2022

There's probably a lot of things that Open Library does that we can try to apply to shadow libraries!

mechanical_bear · on Oct 31, 2022

Forever? 0.