I still find the idea of CDNs repugnant. No matter how you slice it, you rely on an external resource for important parts of your application. "What if it goes down?" is one question, but you should also be asking yourself what happens if it gets hacked. There are also user privacy issues, which get completely overlooked in the chase to shave several milliseconds off request time.
A much better architecture would be to serve JavaScript from your server by default, but allow for distributed, content-based caching. For example, your script tag could look like this:
<script src="some.js" hash="ha3ee938h8eh38a9h49ha094h" />
The hash would be calculated from the content of the file. The browser could then fetch it from whatever source it wants. Users could cache files locally (across websites) without needing to dial into a CDN every time. You could even use a torrent-like network for distributed delivery of popular script libraries.
It's not just a few milliseconds, though. For example, at https://starthq.com we are based in Finland but host on Amazon in US East. A round trip to the US is 200ms+, whereas with CloudFront it's 8ms. Before we used a CDN, our page took a few seconds to load; now it takes around 200ms.
I should also mention that all of this happens only on first load. We embed ETags in the URLs and use far-future cache control expiry dates, so subsequent page loads get the JS and CSS from the browser's cache.
I think there's confusion here about the use of the term CDN. There are public CDNs, like Google AJAX APIs, that allow a shared copy of an open-source library to be downloaded from a known-good location. This enables users to reuse the same copy their browser has already cached across multiple pages, but like romaniv and the OP have pointed out, you are then trusting Google to be good stewards of that resource.
Conversely, you control what shows up on your own private CDN, like CloudFront. Sure, there may be downsides outside of your control, but nobody is going to be able to alter the resources there without your permission.
> Conversely, you control what shows up on your own
> private CDN, like CloudFront. Sure, there may be
> downside outside of your control, but nobody is going
> to be able to alter the resources there without your
> permission.
Well, CloudFront could, since they control the machines that your users are connecting to.
I don't want to imply that you personally shouldn't use a CDN, but the page you linked to loads 43 files. If you consolidated, removed links to, or inlined some of them, the difference with and without a CDN would likely be much smaller.
Actually the core StartHQ app is fewer than 10 files: the libraries and the application each have their own JS and CSS files, then there's Font Awesome and a couple of images loaded by Bootstrap. The rest is one third-party analytics JS file and iframes for social media sharing buttons, which don't block page rendering and which we have no control over.
Isn't the concern of downtime and hacking only relevant if you have reason to believe that they are more likely to happen with the CDN than with your own servers?
If you host some stuff on server 1 and some stuff on server 2, and you need both to function, then you have two points of failure.
This is kind of a simplistic argument: increasing the number of "points of failure" doesn't, or at least shouldn't, increase your odds of "failing" more. Adding cache layers and CDNs may add "servers" to your architecture, but it should also be done in a way that reduces overall downtime.
Increasing the number of points of failure absolutely does increase the chances of failure, even if each 'point' is more reliable. One server with 98% uptime is more reliable than five servers with 99% uptime each, if all five have to be working for everything to 'work'.
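The arithmetic behind that claim is easy to check, assuming independent failures: the probability that all servers are up at once is the product of the individual uptimes.

```javascript
// Probability that `count` servers, each up `perServerUptime` of the
// time, are ALL up simultaneously (assuming independent failures).
function allUpProbability(perServerUptime, count) {
  return Math.pow(perServerUptime, count);
}

// Five servers at 99% each: 0.99^5 ~= 0.951, worse than one server
// at 98%.
const fiveAtNinetyNine = allUpProbability(0.99, 5);
```

So chaining five "better" dependencies really can be worse than one mediocre one, unless the extra servers are redundant rather than all required.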
While I mostly agree with what you're saying... The truth is, why worry about Google getting hacked and having the attacker modify their CDN version of jQuery (or leaking user behaviour and identity), when I'm already letting them load unknown (to me) code in ga.js?
To me, it's a trade-off I've chosen, and while I'm not 100% comfortable with having the availability of my site depend on Google, "repugnant" is _way_ too strong a word to describe the downside of the pragmatic choice I've made.
That's a great idea. The src should be the CDN, though, but the browser should download the file into something like local storage and make it available for the future.
No, if you want to protect the user's privacy, the source has to be your site, otherwise you're giving the CDN info about your customer. With the hash mechanism, 500 sites could share the same library, but only the site the user is visiting can ever know the user visited that site. Sure, one of those 500 gets the hit to performance on initial cache load, but averaged over all visitors to all sites, that's probably a comfortable trade.
I think it would take Mozilla or Apple to push this. Google probably has too much skin in the CDN-info-gathering game.
This only works if the CDN actually returns 4xx or 5xx codes. It still won't work if the CDN is being DDoSed and is taking forever to return anything.
Along the same lines, Chrome (maybe just webkit generally) does not fire DOMContentLoaded until external script requests have resolved or timed out, even if they are async.
Also, I don't understand why people feel so strongly about reposting this "document.write" method everywhere, which in some cases I found made my page disappear at load time. You can do the same thing using regular DOM methods, and you get more control over the process.
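For illustration, here is a sketch of the same CDN fallback done with DOM methods instead of document.write. The `doc` parameter and the `fakeDoc` stand-in are assumptions made so the sketch can run outside a browser; in a real page you would pass the global `document` and a test like `() => !!window.jQuery`.

```javascript
// CDN-fallback via DOM methods: if `test()` says the CDN copy didn't
// load, create a <script> element pointing at the local copy and
// append it to the body.
function addFallbackScript(doc, test, fallbackSrc) {
  if (test()) return null; // CDN copy loaded fine, nothing to do
  const script = doc.createElement('script');
  script.src = fallbackSrc;
  doc.body.appendChild(script);
  return script;
}

// Tiny stand-in for a browser document, just enough for the sketch.
const fakeDoc = {
  createElement: (tag) => ({ tag }),
  body: { children: [], appendChild(el) { this.children.push(el); } },
};
```

Unlike document.write, this works after the parser has finished, so you can run it from a load handler, retry, or log the fallback.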
Besides, in bigger projects jQuery is only a small part of the whole JS stack. I used these jQuery CDNs for years; now I just pack it together with everything else, uglified. There are so many jQuery versions in use out there right now that I don't feel I gain anything by counting on the chance that this particular 30kB will already be cached for some percentage of users.
I personally decided to use a local file after all.
Pros:
+ I can cache it for a very long time, so my returning visitors don't have to re-download it. I was very surprised to see that the CDN's jQuery had a very short 'Expires' header
+ If my server is up and users can open a web page, there's a very high chance that the .js file will load as well.
+ I can combine different jQuery libraries/plugins into one file, so my page loads MUCH faster
Cons:
- It might load a little more slowly, because it's not on CDN.
The Google CDN serves jQuery with a 365 day max-age as long as you reference a specific version (which you should be doing anyway). It only uses the shorter cache expiration, necessarily, if you want a "latest version" reference. More info here: http://encosia.com/the-crucial-0-in-google-cdn-references-to...
I think the notion of "fallbacks" (and the sibling comment's "timeouts") are extremely specific to the web. However, there is some prior art on this subject in the python world:
try:
import simplejson as json
except ImportError:
import json
So it might be worth contacting the committee and expressing that.
Based on the comments above, it might also be helpful to configure a timeout of some kind. So if, for example, the CDN is suffering from a DDoS or some other latency-inducing issue, you can force a fallback to your own server.
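A sketch of that timeout-based fallback. The function and its parameters (`fallbackIfTimedOut`, the injected `loaded` and `inject` callbacks) are assumptions made so the decision logic can run and be tested outside a browser; in a real page you would poll it from a setTimeout with `loaded` checking `window.jQuery` and `inject` appending the local script tag.

```javascript
// Decide what to do at one polling tick: if the CDN script's global
// exists, we're done; if the timeout budget is spent, inject the
// local copy; otherwise keep waiting.
function fallbackIfTimedOut(startMs, nowMs, timeoutMs, loaded, inject) {
  if (loaded()) return 'cdn';
  if (nowMs - startMs >= timeoutMs) {
    inject();
    return 'local';
  }
  return 'waiting';
}
```

Keeping the decision pure like this also makes it easy to tune the timeout: a slow-but-working CDN resolves as 'cdn' before the deadline, while a hung request flips to 'local' instead of blocking forever.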
If you take a look at HTML5 Boilerplate (http://html5boilerplate.com/), it has redundancy built in: if the CDN fails, it will load your local copy.
Making the file smaller doesn't reduce latency. The point of a CDN is local distribution, not just load balancing. You also get to share a cache with other sites; if you point to jQuery on Google's CDN, and the visitor has been to any other site using that CDN, they already have the file cached.
No, but I think the parent was making the case for combining everything into a single file (jQuery + app) which has benefits in reducing number of HTTP requests, especially important on mobile for example.
>> The point of a CDN is local distribution, not just load balancing.
Personally, I build web-apps for UK customers, and host in the UK, so this is a non-issue. I suspect the same is true for a lot of people building complex web-apps (i.e. apps complex enough that you should care about your build process).
>> You also get to share a cache with other sites; if you point to jQuery on Google's CDN, and the visitor has been to any other site using that CDN, they already have the file cached.
Not really true: they have to have hit another site that uses that exact version of jQuery in order to have it cached. There was a study done recently showing this is very unlikely. I wish I could link to it, but all I can tell you is that Alex Sexton referenced it on the ShopTalk podcast [1].
Edit: Another commenter has now referenced the survey in question [2].
This advice breaks if you have more than one type of page on your site: you don't want to have browsers processing tons of JS which isn't needed for the current page, especially on mobile. Even if it's always cached (not even remotely true), there's a non-trivial amount of parser overhead and memory usage.
What I prefer to do is have a common bundle for the JS used on every page, bundles for each distinct type of page, and separate polyfills for old browsers. This improves your initial cold-cache load time for every page, avoids penalizing users of new browsers just because other people use antique browsers, and avoids the cache churn of invalidating the entire bundle every time you change one JS file.
Will we see (a few very) popular frameworks like jQuery built into web browsers with the server just declaring what version to use? (I have a feeling that, although I have good intentions, this is a bad idea.) Thoughts?
I'd like to believe Chrome* users are usually up to date on version numbers. We could update the library cache on a different schedule from browser updates. Finally, we could fall back to a CDN (with further fall back to your own server?) if the server requests jQuery 2.0.1 and the browser says sorry, best I can do is 2.0.0.
My fear was more on the server-side. Can we accomplish this without further butchering the head from the standard? I can't see all browsers adopting this or it becoming a standard without a major sponsor. And like someone else mentioned, we can get about the same benefits from aggressive caching. I agree for the most part.
Don't get me wrong. I love Google. I love what they do to make the web faster with their hosted libraries[1]. Correct me if I'm wrong, but caching the libraries from Google only helps if the server specifies it wants that particular file from Google (I'd imagine it would be a gaping security hole any other way). My thought is whether it's possible to just declare something like 1.9.1.min.jQuery.com and have the browser recognize it and say "Oh yes, I have that. No need for a server round trip. You're welcome." or "No, I don't understand what you're talking about. Give me an address so I can fetch it."
Is it even worth it? jQuery 1.9.1 minified is ~90 kB, so we're probably just trying to shave off tens of milliseconds at most. I bet we all have lower-hanging fruit than this to worry about. Another thing is that it would probably have to be a vendor-specific meta tag in the header (as I don't see everyone getting on board with this, if anyone), which I'm not sure is a good thing.
2. New browsers could download your resource and cache it with the canonical-uri and the calculated (not declared!) hash as the cache key (no dependency on third-party CDNs)
3. New browsers could serve this resource from cache if they had downloaded it before with a matching canonical-uri AND hash, disregarding src and host
The hash would make sure that the jQuery the user downloaded from wherever would indeed be the same jQuery you are serving up.
----
Going back to your original idea, browsers could absolutely come with prepopulated caches for such resources, but they might as well fill these caches on demand.
The important thing in both cases would be to allow shared caching between sites without forcing everybody to agree on which CDN is the most pleasing. Notice that the canonical-uri is only a name; it is not supposed to be dereferenced.
That looks beautiful. So the hash basically says "I trust this source"?
>>The important thing in both cases would be to allow shared caching between sites without forcing everybody to agree on which CDN is the most pleasing. Notice that the canonical-uri is only a name; it is not supposed to be dereferenced.
Yes, you put it much better than I could have. Thank you!
I personally think that building a list of common frameworks like that, and storing them in the browser keyed by something like an MD5 hash that developers can then use to include the framework, would be very useful.
The document.write method makes it impossible to do async script loading, that you ordinarily could do here to improve perceived page load time. No?
I mean, for instance, you couldn't load that FIRST CDN jQuery async, because you need the browser to block on it so your NEXT script tag (which also can't be loaded async, naturally, because it has a document.write in it) can check whether it was loaded.
Also, I would avoid linking to 'master' as it's a moving target (in the future such a link could point to completely different code or even a file that doesn't exist). I try to link to the actual commit that master is pointing to at the time: https://github.com/h5bp/html5-boilerplate/blob/7a22a33d4041c...
What are you talking about? How? What? Are you sure you mean 'bootstrap'?
Maybe it's just some feature of bootstrap I don't know about, but I'm not even sure what to go looking for because I'm not sure what you're suggesting is built into bootstrap. Built into the Javascript parts of Bootstrap somehow? What is, exactly?
How/Where? I've had exactly this problem and have had to shim (requirejs) bootstrap with jquery cdn + fallback paths... ? It certainly doesn't resolve its own dependencies under AMD
The main reason you should do this is not so much the CDN fallback, but to spare users from re-downloading redundant copies of files they have already retrieved via other sites.
Also, remember to always use the https URL for the assets whenever possible.
Is there a reason to use https over a protocol-relative url? My go-to is <script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
Not directly affecting jQuery/JavaScript, but when you use protocol-relative URLs all over the place, you need to be careful: Internet Explorer 7 & 8 will download stylesheets twice if the http(s) protocol is missing: http://www.stevesouders.com/blog/2010/02/10/5a-missing-schem...
I've tested the protocol-relative URL in every version of IE that was available on Browsershots about a year ago (which went back even further than IE6, IIRC). None of them had trouble with it.
Except that the chance of the asset you are requesting already being cached is pretty minimal. See https://github.com/h5bp/html5-boilerplate/pull/1327 for links and discussion about this. The conclusion seemed to be that a CDN likely does provide some benefit, but more from localization of content and offloading of bandwidth than from speed or cache hits.
I might be missing something, but that link really only talks about the potential benefit of a file already being cached locally on the end-user's device.
There's still a significant benefit in just storing the file closer to the end-user, rather than in a single central location, as the CDN is likely to have both lower latency and have higher bandwidth than the source website.
Now, if your file was so unique and rarely accessed that it wouldn't even be cached on the CDN, then I'd agree with those findings.
I have always thought that all the stuff in common CDNs should be available in local storage by default. Common stuff like jQuery and the like. Firefox should have this as a feature where it downloads those scripts once and stores them in local storage. That way you're not leaking privacy to Google et al. everywhere you go.