I still find the idea of CDNs repugnant. No matter how you slice it, you rely on an external resource for important parts of your application. "What if it goes down?" is one question, but you should also be asking yourself what happens if it gets hacked. There are also user privacy issues, which get completely overlooked in the chase to shave several milliseconds off request time.
A much better architecture would be to serve JavaScript from your server by default, but allow for distributed, content-based caching. For example, your script tag could look like this:
<script src="some.js" hash="ha3ee938h8eh38a9h49ha094h" />
The hash would be calculated from the content of the file. The browser could then fetch it from whatever source it wants. Users could cache files locally (across websites) without needing to dial into a CDN every time. You could even use a torrent-like network for distributed delivery of popular script libraries.
It's not just a few milliseconds, though. For example, at https://starthq.com we are based in Finland but host on Amazon in US East. A round trip to the US is 200ms+, whereas with CloudFront it's 8ms. Before we used a CDN, our page took a few seconds to load; now it takes around 200ms.
I should also mention that all of this happens only on first load. We embed ETags in the URLs and use far-future cache control expiry dates, so subsequent page loads get the JS and CSS from the browser's cache.
I think there's confusion here about the use of the term CDN. There are public CDNs, like Google AJAX APIs, that allow a shared copy of an open-source library to be downloaded from a known-good location. This enables users to reuse the same copy their browser has already cached across multiple pages, but like romaniv and the OP have pointed out, you are then trusting Google to be good stewards of that resource.
Conversely, you control what shows up on your own private CDN, like CloudFront. Sure, there may be downsides outside of your control, but nobody is going to be able to alter the resources there without your permission.
> Conversely, you control what shows up on your own
> private CDN, like CloudFront. Sure, there may be
> downside outside of your control, but nobody is going
> to be able to alter the resources there without your
> permission.
Well, CloudFront could, since they control the machines that your users are connecting to.
I don't want to imply that you personally shouldn't use a CDN, but the page you linked to loads 43 files. If you consolidated, removed links to, or inlined some of them, the difference with and without a CDN would likely be much smaller.
Actually the core StartHQ app is fewer than 10 files: the libraries and the application each have their own JS and CSS files, then there's Font Awesome and a couple of images loaded by Bootstrap. The rest is one third-party analytics JS file and iframes for social media sharing buttons, which don't block page rendering and which we have no control over.
Isn't the concern of downtime and hacking only relevant if you have reason to believe that they are more likely to happen with the CDN than with your own servers?
If you host some stuff on server 1 and some stuff on server 2, and you need both to function, then you have two points of failure.
This is kind of a simplistic argument: increasing the number of "points of failure" doesn't, or at least shouldn't, increase your odds of "failing" more. Adding cache layers and CDNs may add "servers" to your architecture, but it should also be done in a way that reduces overall downtime.
Increasing the number of points of failure absolutely does increase the chances of failure, even if each 'point' is more reliable. One server with 98% uptime is more reliable than five servers with 99% uptime each, if all five have to be working for everything to 'work'.
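The arithmetic behind that claim is easy to check, assuming independent failures: the probability that all servers are up at once is the product of the individual uptimes.

```javascript
// Probability that `count` servers, each up `perServerUptime` of the
// time, are ALL up simultaneously (assuming independent failures).
function allUpProbability(perServerUptime, count) {
  return Math.pow(perServerUptime, count);
}

// Five servers at 99% each: 0.99^5 ~= 0.951, worse than one server
// at 98%.
const fiveAtNinetyNine = allUpProbability(0.99, 5);
```

So chaining five "better" dependencies really can be worse than one mediocre one, unless the extra servers are redundant rather than all required.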
While I mostly agree with what you're saying... The truth is, why worry about Google getting hacked and having the attacker modify their CDN version of jQuery (or leaking user behaviour and identity), when I'm already letting them load unknown (to me) code in ga.js?
To me, it's a trade-off I've chosen, and while I'm not 100% comfortable with having the availability of my site depend on Google, "repugnant" is _way_ too strong a word to describe the downside of the pragmatic choice I've made.
That's a great idea. The src should be the CDN, though, but the browser should download the file into something like local storage and make it available for the future.
No, if you want to protect the user's privacy, the source has to be your site, otherwise you're giving the CDN info about your customer. With the hash mechanism, 500 sites could share the same library, but only the site the user is visiting can ever know the user visited that site. Sure, one of those 500 gets the hit to performance on initial cache load, but averaged over all visitors to all sites, that's probably a comfortable trade.
I think it would take Mozilla or Apple to push this. Google probably has too much skin in the CDN-info-gathering game.
This only works if the CDN actually returns 4xx or 5xx codes. It still won't work if the CDN is being DDoSed and is taking forever to return anything.
Along the same lines, Chrome (maybe just webkit generally) does not fire DOMContentLoaded until external script requests have resolved or timed out, even if they are async.
Also, I don't understand why people feel so strongly about reposting this "document.write" method everywhere, which in some cases I found made my page disappear at load time. You can do the same thing using regular DOM methods, and you get more control over the process.
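For illustration, here is a sketch of the same CDN fallback done with DOM methods instead of document.write. The `doc` parameter and the `fakeDoc` stand-in are assumptions made so the sketch can run outside a browser; in a real page you would pass the global `document` and a test like `() => !!window.jQuery`.

```javascript
// CDN-fallback via DOM methods: if `test()` says the CDN copy didn't
// load, create a <script> element pointing at the local copy and
// append it to the body.
function addFallbackScript(doc, test, fallbackSrc) {
  if (test()) return null; // CDN copy loaded fine, nothing to do
  const script = doc.createElement('script');
  script.src = fallbackSrc;
  doc.body.appendChild(script);
  return script;
}

// Tiny stand-in for a browser document, just enough for the sketch.
const fakeDoc = {
  createElement: (tag) => ({ tag }),
  body: { children: [], appendChild(el) { this.children.push(el); } },
};
```

Unlike document.write, this works after the parser has finished, so you can run it from a load handler, retry, or log the fallback.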
Besides, in bigger projects jQuery is only a small part of the whole JS stack. I used these jQuery CDNs for years; now I just pack it together with everything else, uglified. There are so many jQuery versions in use out there right now that I don't feel I gain anything by counting on the chance that this particular 30kB will already be cached for some percentage of users.
I personally decided to use a local file after all.
Pros:
+ I can cache it for a very long time, so my returning visitors don't have to re-download it. I was very surprised to see that the CDN's jQuery had a very short 'Expires' header
+ If my server is up and users can open a web page, there's a very high chance that the .js file will load as well.
+ I can combine different jQuery libraries/plugins into one file, so my page loads MUCH faster
Cons:
- It might load a little more slowly, because it's not on CDN.
The Google CDN serves jQuery with a 365 day max-age as long as you reference a specific version (which you should be doing anyway). It only uses the shorter cache expiration, necessarily, if you want a "latest version" reference. More info here: http://encosia.com/the-crucial-0-in-google-cdn-references-to...
I think the notion of "fallbacks" (and the sibling comment's "timeouts") are extremely specific to the web. However, there is some prior art on this subject in the python world:
try:
import simplejson as json
except ImportError:
import json
So it might be worth contacting the committee and expressing that.
Based on the comments above, it might also be helpful to configure a timeout of some kind. So if, for example, the CDN is suffering from a DDoS or some other latency-inducing issue, you can force a fallback to your own server.
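A sketch of that timeout-based fallback. The function and its parameters (`fallbackIfTimedOut`, the injected `loaded` and `inject` callbacks) are assumptions made so the decision logic can run and be tested outside a browser; in a real page you would poll it from a setTimeout with `loaded` checking `window.jQuery` and `inject` appending the local script tag.

```javascript
// Decide what to do at one polling tick: if the CDN script's global
// exists, we're done; if the timeout budget is spent, inject the
// local copy; otherwise keep waiting.
function fallbackIfTimedOut(startMs, nowMs, timeoutMs, loaded, inject) {
  if (loaded()) return 'cdn';
  if (nowMs - startMs >= timeoutMs) {
    inject();
    return 'local';
  }
  return 'waiting';
}
```

Keeping the decision pure like this also makes it easy to tune the timeout: a slow-but-working CDN resolves as 'cdn' before the deadline, while a hung request flips to 'local' instead of blocking forever.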
If you take a look at HTML5 Boilerplate (http://html5boilerplate.com/), it has redundancy built in: if the CDN fails, it will load your local copy.
Making the file smaller doesn't reduce latency. The point of a CDN is local distribution, not just load balancing. You also get to share a cache with other sites; if you point to jQuery on Google's CDN, and the visitor has been to any other site using that CDN, they already have the file cached.
No, but I think the parent was making the case for combining everything into a single file (jQuery + app) which has benefits in reducing number of HTTP requests, especially important on mobile for example.
>> The point of a CDN is local distribution, not just load balancing.
Personally, I build web-apps for UK customers, and host in the UK, so this is a non-issue. I suspect the same is true for a lot of people building complex web-apps (i.e. apps complex enough that you should care about your build process).
>> You also get to share a cache with other sites; if you point to jQuery on Google's CDN, and the visitor has been to any other site using that CDN, they already have the file cached.
Not really true: they have to have hit another site that uses that exact version of jQuery in order to have it cached. There was a study done recently showing this is very unlikely. I wish I could link to it, but all I can tell you is that Alex Sexton referenced it on the ShopTalk podcast [1].
Edit: Another commenter has now referenced the survey in question [2].
This advice breaks if you have more than one type of page on your site: you don't want to have browsers processing tons of JS which isn't needed for the current page, especially on mobile. Even if it's always cached (not even remotely true), there's a non-trivial amount of parser overhead and memory usage.
What I prefer to do is have a common bundle for the JS used on every page, bundles for each distinct type of page, and separate polyfills for old browsers. This improves your initial cold-cache load time for every page, avoids penalizing users of new browsers just because other people use antique browsers, and avoids the cache churn of invalidating the entire bundle every time you change one JS file.
Will we see (a few very) popular frameworks like jQuery built into web browsers with the server just declaring what version to use? (I have a feeling that, although I have good intentions, this is a bad idea.) Thoughts?
I'd like to believe Chrome* users are usually up to date on version numbers. We could update the library cache on a different schedule from browser updates. Finally, we could fall back to a CDN (with further fall back to your own server?) if the server requests jQuery 2.0.1 and the browser says sorry, best I can do is 2.0.0.
My fear was more on the server-side. Can we accomplish this without further butchering the head from the standard? I can't see all browsers adopting this or it becoming a standard without a major sponsor. And like someone else mentioned, we can get about the same benefits from aggressive caching. I agree for the most part.
Don't get me wrong. I love Google. I love what they do to make the web faster with their hosted libraries[1]. Correct me if I'm wrong, but caching the libraries from Google only helps if the server specifies it wants that particular file from Google (I'd imagine it would be a gaping security hole any other way). My thought is whether it's possible to just declare something like 1.9.1.min.jQuery.com and have the browser recognize it and say "Oh yes, I have that. No need for a server round trip. You're welcome." or "No, I don't understand what you're talking about. Give me an address so I can fetch it."
Is it even worth it? jQuery 1.9.1 minified is ~90 kB, so we're probably just trying to shave off tens of milliseconds at most. I bet we all have lower-hanging fruit than this to worry about. Another thing is that it would probably have to be a vendor-specific meta tag in the header (as I don't see everyone getting on board with this, if anyone), which I'm not sure is a good thing.
2. New browsers could download your resource and cache it with the canonical-uri and the calculated (not declared!) hash as the cache key (no dependency on third-party CDNs)
3. New browsers could serve this resource from cache if they had downloaded it before with a matching canonical-uri AND hash, disregarding src and host
The hash would make sure that the jQuery the user downloaded from wherever would indeed be the same jQuery you are serving up.
----
Going back to your original idea, browsers could absolutely come with prepopulated caches for such resources, but they might as well fill these caches on demand.
The important thing in both cases would be to allow shared caching between sites without forcing everybody to agree on which CDN is the most pleasing. Notice that the canonical-uri is only a name; it is not supposed to be dereferenced.
That looks beautiful. So the hash basically says "I trust this source"?
>>The important thing in both cases would be to allow shared caching between sites without forcing everybody to agree on which CDN is the most pleasing. Notice that the canonical-uri is only a name; it is not supposed to be dereferenced.
Yes, you put it much better than I could have. Thank you!
I personally think that building a list of common frameworks like that, and storing them in the browser keyed by something like an MD5 hash that developers can then use to include the framework, would be very useful.
The document.write method makes it impossible to do async script loading, that you ordinarily could do here to improve perceived page load time. No?
I mean, for instance, you couldn't load that FIRST CDN jQuery async, because you need the browser to block on it so your NEXT script tag (which also can't be loaded async, naturally, because it has a document.write in it) can check whether it was loaded.
Also, I would avoid linking to 'master' as it's a moving target (in the future such a link could point to completely different code or even a file that doesn't exist). I try to link to the actual commit that master is pointing to at the time: https://github.com/h5bp/html5-boilerplate/blob/7a22a33d4041c...
What are you talking about? How? What? Are you sure you mean 'bootstrap'?
Maybe it's just some feature of bootstrap I don't know about, but I'm not even sure what to go looking for because I'm not sure what you're suggesting is built into bootstrap. Built into the Javascript parts of Bootstrap somehow? What is, exactly?
How/Where? I've had exactly this problem and have had to shim (requirejs) bootstrap with jquery cdn + fallback paths... ? It certainly doesn't resolve its own dependencies under AMD
The main reason you should do this is not so much the CDN fallback, but to spare users from re-downloading redundant copies of files they have already retrieved via other sites.
Also, remember to always use the https URL for the assets whenever possible.
Is there a reason to use https over a protocol-relative url? My go-to is <script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
Not directly affecting jQuery/JavaScript, but when you use protocol-relative URLs all over the place, you need to be careful: Internet Explorer 7 & 8 will download stylesheets twice if the http(s) protocol is missing: http://www.stevesouders.com/blog/2010/02/10/5a-missing-schem...
I've tested the protocol-relative URL in every version of IE that was available on Browsershots about a year ago (which went back even further than IE6, IIRC). None of them had trouble with it.
Except that the chance of the asset you are requesting already being cached is pretty minimal. See https://github.com/h5bp/html5-boilerplate/pull/1327 for links and discussion about this. The conclusion seemed to be that a CDN likely does provide some benefit, but more from localization of content and offloading of bandwidth than from speed or cache hits.
I might be missing something, but that link really only talks about the potential benefit of a file already being cached locally on the end-user's device.
There's still a significant benefit in just storing the file closer to the end-user, rather than in a single central location, as the CDN is likely to have both lower latency and have higher bandwidth than the source website.
Now, if your file was so unique and rarely accessed that it wouldn't even be cached on the CDN, then I'd agree with those findings.
I have always thought that all the stuff in common CDNs should be available in local storage by default. Common stuff like jQuery and the like. Firefox should have this as a feature where it downloads those scripts once and stores them in local storage. That way you're not leaking privacy to Google et al. everywhere you go.