Mental hashing for paper address books (with Python) (canonical.org)
49 points by limmeau on Jan 9, 2011 | 11 comments


One problem I see with the evaluation is he uses a generic English corpus. The real application is names though, which probably have different statistical properties. Shouldn't be too hard to find a list of real names somewhere on the web.

edit: here for example http://names.mongabay.com/most_common_surnames.htm


Thanks for the link. The "first and fourth letter" hash function wins for English surnames:

https://gist.github.com/bd9fabf91a5501b215c5

(I copied the names table from your link and applied the original program to it)
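For anyone who wants to try this kind of comparison without opening the gist, here is a minimal sketch. It is not the original program: the function names and the scoring metric (the average number of names sharing a bucket with a randomly chosen name) are assumptions made for illustration, and the original may score hash functions differently.

```python
from collections import Counter


def first_and_fourth(name):
    """Hash a name to its first and fourth letters (first only if too short)."""
    name = name.lower()
    return name[0] + (name[3] if len(name) > 3 else "")


def first_and_second(name):
    """Hash a name to its first two letters."""
    return name.lower()[:2]


def first_and_last(name):
    """Hash a name to its first and final letters."""
    name = name.lower()
    return name[0] + name[-1]


def expected_bucket_size(names, hash_fn):
    """Average bucket size seen when looking up a name picked uniformly at random.

    Assumed metric: sum of squared bucket counts divided by the number of
    names. Lower is better; 1.0 means every name has a bucket to itself.
    """
    buckets = Counter(hash_fn(n) for n in names)
    return sum(c * c for c in buckets.values()) / len(names)


if __name__ == "__main__":
    # Tiny illustrative sample, not the full surname table from the link.
    sample = ["Smith", "Johnson", "Williams", "Brown", "Jones",
              "Garcia", "Miller", "Davis", "Martinez", "Moore"]
    for fn in (first_and_fourth, first_and_second, first_and_last):
        print(fn.__name__, expected_bucket_size(sample, fn))
```

To weight by how common each surname is, the list could repeat names in proportion to their frequency, or the metric could be changed to take counts; either is a further assumption beyond what the thread states.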


There's a tradeoff between speed of lookup and ability to find things when you only partially recall the name. For those situations (the "tip-of-the-tongue" phenomenon), I'd pick either the first and final letters as the easiest to recall, or the first and second letters. The fourth letter won't be easy to retrieve unless you recall the name exactly.


This is awesome! Thanks!

The reason I used English corpus words was precisely that I did not have this surname frequency list handy.

I don't think this is a list of English names, though. #18 is Garcia, and #19 is Martinez, and so on. It's a list of US names. A similar list for Argentina would probably be most useful for me at the moment.


Did anyone notice the mailing list? It's one dedicated to the author. Interesting alternative for a blog.


There are some real gems on the list, too:

"Smalltalk Performance and Moore's Law" http://lists.canonical.org/pipermail/kragen-tol/2007-March/0...

"OCaml vs. SBCL, and various other interpreters" http://lists.canonical.org/pipermail/kragen-tol/2007-March/0...

"what affects programming language adoption?" http://lists.canonical.org/pipermail/kragen-tol/2006-Novembe...

(kragen is also kragen here)


Thank you! I'm glad you enjoyed them.


I will say one thing: a blog would have had trouble surviving since 1998. http://lists.canonical.org/pipermail/kragen-hacks/


The mailing list is having some trouble surviving, too. Apparently at some point Google decided our domain was spammy, and all the Gmail subscribers started getting their mail automatically spam-filtered. I contacted a bunch of people directly (via Google Talk or Facebook) to get them to fish the latest mail out of the spambox.

Beyond that, I don't know what to do. I guess I could post more often.

I certainly need a better blog interface for it. What I have right now is http://www.bentwookie.org/blog.


Reading through http://lists.canonical.org/pipermail/kragen-tol/2010-March/0...

Managing a site through a DVCS is, IMO, a good idea. (I do it for my own site, http://www.gwern.net/ ). But I think your worries are somewhat groundless. If you are interested in preventing patent problems years down the line, there's no need for fancy cryptographic commitment schemes; you could probably just appeal to archive sites like the Internet Archive or WebCitation. When you access their archived pages, the pages come with timestamps in the frame or as part of the URL.

To some extent, you are already doing this: http://web.archive.org/web/*/http://lists.canonical.org/pipe...

(I know that the Internet Archive has been used by some courts for one purpose or another, though I don't know that it has been employed for demonstrating prior art.)

That said, if you investigated existing cryptographic time-stamping services (http://en.wikipedia.org/wiki/Trusted_timestamping#External_l...) and figured out how to integrate them into a DVCS (a shell script called from cron?), I would certainly find that an interesting thing to read on Hacker News.
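As a rough starting point for the cron idea, here is a sketch of the local half only. It assumes the DVCS is git (the thread does not say which one) and that the service accepts a SHA-256 digest; both function names are hypothetical. Submitting the digest is left out on purpose, since no specific time-stamping service or API is named here.

```python
import hashlib
import subprocess


def head_commit(repo_dir):
    """Return the current HEAD commit ID of a git repository (git is an assumption)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def digest_for_timestamping(commit_id):
    """Hash the commit ID so that only a fixed-size digest leaves the machine.

    The commit ID already commits to the whole history; SHA-256 is used here
    only because it is a common input format, which is an assumption.
    """
    return hashlib.sha256(commit_id.encode("ascii")).hexdigest()


if __name__ == "__main__":
    commit = head_commit(".")
    # A cron job would send this digest to whichever service is chosen
    # and store the returned timestamp token alongside the repository.
    print(commit, digest_for_timestamping(commit))
```

Storing the returned token in the repository itself would change HEAD, so it probably belongs in a separate file or branch outside the history being stamped.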




