Cache-friendly asset management via content-hash-naming.
assetslib is a simple Python library that uses content-hash-naming to do URL fingerprinting. For some good background information on URL fingerprinting and cache optimization, see:
http://code.google.com/speed/page-speed/docs/caching.html
The assetslib.Assets class does content-hash-naming (aka URL fingerprinting) so you can use aggressive caching headers without risking that a client might have an out-of-date version of an asset in its cache. If the content changes, the content-hash-name also changes, resulting in a different URL for each bytewise-unique version of an asset.
Assets are registered using a human-readable key unique to the asset, by which the content-hash-name of the current asset version can be retrieved. For example, say we start off with an initial version of an asset like this:
>>> from assetslib import Assets
>>> a = Assets()
>>> a.register(data='I have a colon full of cookie', key='ihacfoc.txt')
'ihacfoc.txt'
>>> a['ihacfoc.txt']  # Lookup the current content-hash-name
'7cec3d7646a7fdb742bd854734d42841d12d3932.txt'
And now say we register a slightly updated version of the same asset:
>>> a.register(data='I have a colon full of cookie!', key='ihacfoc.txt')
'ihacfoc.txt'
>>> a['ihacfoc.txt']
'0d7be6f8dd97ca463cef0727f9ad82003a9da656.txt'
Notice how appending the single '!' character to the asset content dramatically changed the content-hash-name.
The extension part of the key is always preserved on the content-hash-name (this is important so that your web-server can still deliver the correct Content-Type header). Multi-part extensions can also be used, for example:
>>> a.register(data='<My pre-gzipped CSS>', key='site.css.gz')
'site.css.gz'
>>> a['site.css.gz']
'3071f65dae784df1fa10361041a30e280c464b01.css.gz'
You can also use a key with no extension, in which case the content-hash-name will likewise have no extension. For example:
>>> a.register(data='<The GNU GPLv3>', key='COPYING')
'COPYING'
>>> a['COPYING']
'76f6ffdf4ad8ff7bc7dff5c5f5ea81be5cac1dd4'
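The naming rule described so far can be sketched in a few lines. This is only an illustration, not assetslib's actual implementation: the helper name `content_hash_name` is hypothetical, and it assumes that the extension is everything from the first dot of the key onward, which is consistent with the `.txt`, `.css.gz`, and extension-less examples shown here.

```python
import hashlib


def content_hash_name(data, key):
    """Return a SHA1-based name for `data`, keeping the extension of `key`.

    Hypothetical helper, not part of assetslib. Assumes the extension is
    everything from the first dot in the key onward.
    """
    digest = hashlib.sha1(data).hexdigest()
    # partition() returns ('COPYING', '', '') when the key has no dot,
    # so extension-less keys yield a bare digest.
    _, dot, ext = key.partition('.')
    return digest + dot + ext


old = content_hash_name(b'I have a colon full of cookie', 'ihacfoc.txt')
new = content_hash_name(b'I have a colon full of cookie!', 'ihacfoc.txt')
print(old != new)  # True: different bytes give a different name
```

Because the name depends only on the bytes and the key's extension, registering identical content twice yields the same name, so a client's cached copy stays valid.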
At page generation time, your application should typically use Assets.asset_url() to retrieve a full URL rather than just the content-hash-name. For example:
>>> a.asset_url('ihacfoc.txt')
'/_/0d7be6f8dd97ca463cef0727f9ad82003a9da656.txt'
>>> a.asset_url('site.css.gz')
'/_/3071f65dae784df1fa10361041a30e280c464b01.css.gz'
When you create an Assets instance, the first optional argument is the URL from which your assets are served, which defaults to '/_/'. You can mount your assets at a different URL like this:
>>> a = Assets('/static/')
>>> a.url
'/static/'
>>> a.register(data='I have a colon full of cookie', key='ihacfoc.txt')
'ihacfoc.txt'
>>> a.asset_url('ihacfoc.txt')
'/static/7cec3d7646a7fdb742bd854734d42841d12d3932.txt'
The second optional argument when you create your Assets instance is the directory where your assets are stored. If you don't specify the directory, a random secure temporary directory is created for you. For example:
>>> a = Assets()
>>> a.dir  # doctest:+ELLIPSIS
'/tmp/assets...'
This temporary directory exists only to make testing and experimentation easier. Do not rely on it in a production deployment, or when developing under a multi-process server, because each process would get a different assets directory. In a production deployment, consider using something like dir='/var/cache/assets'.
See the assetslib.Assets class for its full API.
Your web-server should return an Expires header with a date approximately one year in the future from the time at which the request is served, which according to RFC 2616 indicates to the client that the resource never expires:
To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
—RFC 2616, section 14.21
To correctly configure Apache, you'll need to enable mod_expires and then insert a configuration snippet something like this:
Alias /_/ "/var/cache/assets/"
<Directory "/var/cache/assets/">
    ExpiresActive On
    ExpiresDefault A31536000
</Directory>
For more information on this Apache configuration, see:
http://httpd.apache.org/docs/2.2/mod/mod_expires.html
For your convenience, the assetslib.wsgi module includes a simple WSGI application for serving your assets: assetslib.wsgi.AssetsApp. This is primarily intended for easy in-tree testing and development. In a production setting, you're much better off serving your static assets directly from your web-server.
The assetslib.wsgi module also includes the assetslib.wsgi.never_expires() function, which will generate an Expires header tuple ready to be appended to the WSGI response_headers. For example, the Expires header for a request made at Unix time 1234567890 would look like this:
>>> from assetslib.wsgi import never_expires
>>> never_expires(1234567890)
('Expires', 'Sat, 13 Feb 2010 23:31:30 GMT')
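As a rough illustration of how such a header could be computed and attached to a response, here is a minimal sketch. It is not the code of assetslib.wsgi: the names `never_expires_sketch` and `make_assets_app` are hypothetical, and the sketch assumes "one year" means 365 days (31536000 seconds, the same figure used in the Apache example) and that assets live as plain files directly inside one directory.

```python
import mimetypes
import os
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds


def never_expires_sketch(now):
    """Return an ('Expires', <HTTP date>) tuple one year after `now`."""
    return ('Expires', formatdate(now + ONE_YEAR, usegmt=True))


def make_assets_app(directory):
    """Build a WSGI app serving files found directly in `directory`.

    Hypothetical development helper, not assetslib.wsgi.AssetsApp.
    """
    def app(environ, start_response):
        name = environ.get('PATH_INFO', '').lstrip('/')
        path = os.path.join(directory, name)
        # Only plain file names are served; nested or parent paths get a 404.
        if os.path.basename(name) != name or not os.path.isfile(path):
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'Not Found\n']
        with open(path, 'rb') as f:
            body = f.read()
        content_type, encoding = mimetypes.guess_type(name)
        headers = [
            ('Content-Type', content_type or 'application/octet-stream'),
            ('Content-Length', str(len(body))),
            never_expires_sketch(time.time()),
        ]
        if encoding:
            # For example 'gzip' for a name ending in '.css.gz'.
            headers.append(('Content-Encoding', encoding))
        start_response('200 OK', headers)
        return [body]
    return app


print(never_expires_sketch(1234567890))
# ('Expires', 'Sat, 13 Feb 2010 23:31:30 GMT')
```

The far-future header is safe here only because a changed asset gets a new name; the same header on a mutable URL would pin stale content in client caches.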
assetslib was designed with specific goals in mind:
The simple solution to all the above is to use symlinks to store the key to content-hash-name mapping. For example:
>>> import os
>>> from os import path
>>> from assetslib import Assets
>>> a = Assets()
>>> sorted(os.listdir(a.dir))
[]
>>> a.register(data='<My pre-gzipped CSS>', key='site.css.gz')
'site.css.gz'
>>> a['site.css.gz']
'3071f65dae784df1fa10361041a30e280c464b01.css.gz'
>>> sorted(os.listdir(a.dir))
['3071f65dae784df1fa10361041a30e280c464b01.css.gz', 'site.css.gz']
The key is a relative symlink pointing to the content-hash-named file:
>>> os.readlink(path.join(a.dir, 'site.css.gz'))
'3071f65dae784df1fa10361041a30e280c464b01.css.gz'
>>> path.isfile(path.join(a.dir, '3071f65dae784df1fa10361041a30e280c464b01.css.gz'))
True
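The symlink layout shown above could be produced along the following lines. This is a sketch under stated assumptions rather than assetslib's real code: `register_sketch` is a hypothetical name, the temporary-file naming is made up for the example, and the approach relies on POSIX rename semantics, where renaming over an existing name replaces it in a single step.

```python
import hashlib
import os
import tempfile


def register_sketch(directory, key, data):
    """Store `data` under its content-hash-name and point `key` at it.

    Hypothetical helper, not assetslib's implementation.
    """
    digest = hashlib.sha1(data).hexdigest()
    _, dot, ext = key.partition('.')  # assumed: extension starts at first dot
    name = digest + dot + ext
    suffix = '.%d.tmp' % os.getpid()  # made-up temporary naming scheme

    # Write the content once; an existing file with this name already
    # holds identical bytes.
    target = os.path.join(directory, name)
    if not os.path.exists(target):
        tmp_file = target + suffix
        with open(tmp_file, 'wb') as f:
            f.write(data)
        os.replace(tmp_file, target)

    # Create the relative symlink under a temporary name, then rename it
    # over the key so a reader sees either the old or the new target.
    link = os.path.join(directory, key)
    tmp_link = link + suffix
    os.symlink(name, tmp_link)
    os.replace(tmp_link, link)
    return name


directory = tempfile.mkdtemp(prefix='assets')
register_sketch(directory, 'site.css.gz', b'<My pre-gzipped CSS>')
name = register_sketch(directory, 'site.css.gz', b'<My updated CSS>')
print(os.readlink(os.path.join(directory, 'site.css.gz')) == name)  # True
```

Looking up the current content-hash-name is then a single `os.readlink()` call on the key, so several processes sharing the directory see the same mapping without any extra index file.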
By default Assets uses a SHA1 hash to generate the content-hash-name. For example:
>>> try:
...     from hashlib import sha1
... except ImportError:
...     from sha import new as sha1  # Python 2.4 compatibility
...
>>> content = open(a.asset_file('site.css.gz'), 'rb').read()
>>> sha1(content).hexdigest()
'3071f65dae784df1fa10361041a30e280c464b01'
>>> a.asset_file('site.css.gz')  # doctest:+ELLIPSIS
'/tmp/assets.../3071f65dae784df1fa10361041a30e280c464b01.css.gz'
It is the author's humble hope that this simple standard become widely adopted, that *nix distros use '/var/cache/assets' as a system-wide assets location, and that faster websites subsequently flourish.
Version: 0.1.1
Assets: Provides cache-friendly asset management via content-hash-naming.
Generated by Epydoc 3.0.1 on Wed Sep 30 19:39:05 2009 (http://epydoc.sourceforge.net)