• the key to efficiency here seems to be “caching”, more specifically their caching strategy
• traditionally, caching on the web is done by assuming resource access follows the Zipf Distribution[1]
• Zeta Distributions are basically Zipf Distributions[2] so you can effectively re-word the title as “Efficient data loading using caching” (zipf = “caching” & zeta = zipf => zeta = “caching”)
• It’s important to note that Zipf/Zeta don’t model extremes very well, so there’s potential for outliers causing costly cache misses. Monitor your logs!
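To make the caching intuition concrete, here is a small sketch (my illustration, not from the linked article): it simulates Zipf-distributed requests over 1,000 resources and compares the measured hit rate of a cache holding only the 10 most popular resources against the analytical share of traffic those 10 resources should receive. The resource count, cache size, and request count are all arbitrary choices for the demo.

```python
import random

N = 1000         # number of distinct resources (arbitrary for the demo)
CACHE_SIZE = 10  # cache only the 10 most popular resources

# Zipf weights: the resource of rank n is requested with probability ~ 1/n
weights = [1.0 / n for n in range(1, N + 1)]

random.seed(42)
requests = random.choices(range(N), weights=weights, k=100_000)

cached = set(range(CACHE_SIZE))  # ranks 0..9 are the most popular
hits = sum(1 for r in requests if r in cached)
hit_rate = hits / len(requests)

# Analytical hit rate: H(10) / H(1000), the fraction of total
# request mass concentrated in the top 10 ranks
analytical = sum(weights[:CACHE_SIZE]) / sum(weights)

print(f"empirical hit rate:  {hit_rate:.3f}")
print(f"analytical hit rate: {analytical:.3f}")
```

With these parameters, caching just 1% of the resources serves roughly 39% of all requests, which is the whole efficiency argument in miniature. The article's actual cache sizing and eviction policy may of course differ.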
---
Further reading:
• https://pdfs.semanticscholar.org/337e/4b7f57ccbb7485950b93da... (1999)
• https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs...
• https://en.wikipedia.org/wiki/Zipf%27s_law
• https://www.springer.com/in/book/9781402080494
---
[1] the distribution follows a power law, so the most popular resource is accessed disproportionately more than the second most popular, and so on.
The classic example is word frequency, modeled as 1/n: the second most popular word occurs half as often as the most popular word (1/2), the third most popular a third as often (1/3), and so on, giving a power-law falloff with a long tail. It therefore makes sense to cache the most popular items: a small head of the distribution accounts for a far larger share of accesses than its size suggests, and that is where the efficiency comes from. This is a form of power law, similar in spirit to the Pareto principle (roughly 20% of the things deliver 80% of the result).
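The Pareto-style "20% delivers 80%" framing can be checked directly for the 1/n model. This snippet (my own illustration, with an arbitrary catalog size of 1,000) finds the smallest top-k prefix that covers 80% of accesses:

```python
# Cumulative share of accesses covered by caching the top-k items,
# assuming 1/n (Zipf, s = 1) frequencies over 1000 resources.
N = 1000
weights = [1.0 / n for n in range(1, N + 1)]
total = sum(weights)

cumulative = 0.0
for k, w in enumerate(weights, start=1):
    cumulative += w
    if cumulative / total >= 0.8:
        break

print(f"top {k} of {N} items ({k / N:.0%}) cover 80% of accesses")
```

For these numbers the answer lands near the top 22% of items, so the 80/20 analogy is in the right ballpark for s = 1, though the exact split depends on the exponent and the catalog size.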
[2] rigorously speaking, the zeta distribution is the infinite-support form of Zipf, with 1/n^s normalized by the Riemann zeta function (whereas Zipf's law is usually stated over a finite number of ranks). But practically they are similar enough that people use the terms interchangeably.
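A quick numerical check of that normalization claim (my own sketch, using s = 2 because its zeta value has a closed form):

```python
import math

# The zeta distribution normalizes 1/n^s by the Riemann zeta function
# zeta(s) = sum over n >= 1 of 1/n^s. For s = 2, zeta(2) = pi^2 / 6.
s = 2
partial = sum(1.0 / n**s for n in range(1, 1_000_001))
exact = math.pi ** 2 / 6

print(f"partial sum of 1/n^2: {partial:.6f}")
print(f"pi^2 / 6:             {exact:.6f}")

# pmf of the zeta distribution: P(N = n) = n^(-s) / zeta(s),
# so the most popular rank alone carries 6/pi^2 of the mass here
pmf_rank_1 = 1.0 / exact
print(f"P(rank 1) at s = 2:   {pmf_rank_1:.4f}")
```

Note that s = 1 has no such normalization (the harmonic series diverges), which is exactly why the finite-rank Zipf form is used in practice for s near 1.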