Package assetslib

Cache-friendly asset management via content-hash-naming.

assetslib is a simple Python library that uses content-hash-naming to do URL fingerprinting. For some good background information on URL fingerprinting and cache optimization, see:

http://code.google.com/speed/page-speed/docs/caching.html

Introduction

The assetslib.Assets class does content-hash-naming (aka URL fingerprinting) so you can use aggressive caching headers without risking that a client might have an out-of-date version of an asset in its cache. If the content changes, the content-hash-name also changes, resulting in a different URL for each bytewise-unique version of an asset.

Assets are registered using a human-readable key unique to the asset, by which the content-hash-name of the current asset version can be retrieved. For example, say we start off with an initial version of an asset like this:

>>> from assetslib import Assets
>>> a = Assets()
>>> a.register(data='I have a colon full of cookie', key='ihacfoc.txt')
'ihacfoc.txt'
>>> a['ihacfoc.txt']  # Look up the current content-hash-name
'7cec3d7646a7fdb742bd854734d42841d12d3932.txt'

And now say we register a slightly updated version of the same asset:

>>> a.register(data='I have a colon full of cookie!', key='ihacfoc.txt')
'ihacfoc.txt'
>>> a['ihacfoc.txt']
'0d7be6f8dd97ca463cef0727f9ad82003a9da656.txt'

Notice how appending the single '!' character to the asset content dramatically changed the content-hash-name.

The extension part of the key is always preserved on the content-hash-name (this is important so that your web-server can still deliver the correct Content-Type header). Multi-part extensions can also be used, for example:

>>> a.register(data='<My pre-gzipped CSS>', key='site.css.gz')
'site.css.gz'
>>> a['site.css.gz']
'3071f65dae784df1fa10361041a30e280c464b01.css.gz'

You can also use a key with no extension, in which case the content-hash-name will likewise have no extension. For example:

>>> a.register(data='<The GNU GPLv3>', key='COPYING')
'COPYING'
>>> a['COPYING']
'76f6ffdf4ad8ff7bc7dff5c5f5ea81be5cac1dd4'
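
The naming rule illustrated by the examples above can be sketched in a few lines. This is an illustration only, not assetslib's actual implementation: the helper name content_hash_name is hypothetical, and treating everything from the first dot onward as the extension is an assumption inferred from the 'site.css.gz' example. Under Python 3, hashlib requires the data to be bytes.

```python
import hashlib


def content_hash_name(data, key):
    """Return a content-hash-name for `data`, keeping the extension of `key`.

    Hypothetical helper for illustration; not part of assetslib.
    """
    # Assumption: the extension is everything from the first dot onward,
    # so 'site.css.gz' keeps '.css.gz' and 'COPYING' keeps nothing.
    dot = key.find('.')
    ext = key[dot:] if dot != -1 else ''
    return hashlib.sha1(data).hexdigest() + ext


print(content_hash_name(b'<My pre-gzipped CSS>', 'site.css.gz'))
print(content_hash_name(b'<The GNU GPLv3>', 'COPYING'))
```

Because the name depends only on the bytes and the extension, two keys registered with identical content and the same extension would map to the same name in this sketch.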

At page generation time, your application should typically use Assets.asset_url() to retrieve a full URL rather than just the content-hash-name. For example:

>>> a.asset_url('ihacfoc.txt')
'/_/0d7be6f8dd97ca463cef0727f9ad82003a9da656.txt'
>>> a.asset_url('site.css.gz')
'/_/3071f65dae784df1fa10361041a30e280c464b01.css.gz'

When you create an Assets instance, the first optional argument is the URL from which your assets are served, which defaults to '/_/'. You can mount your assets at a different URL like this:

>>> a = Assets('/static/')
>>> a.url
'/static/'
>>> a.register(data='I have a colon full of cookie', key='ihacfoc.txt')
'ihacfoc.txt'
>>> a.asset_url('ihacfoc.txt')
'/static/7cec3d7646a7fdb742bd854734d42841d12d3932.txt'

The second optional argument when you create an Assets instance is the directory where your assets are stored. If you don't specify a directory, a secure temporary directory with a random name is created for you. For example:

>>> a = Assets()
>>> a.dir  # doctest:+ELLIPSIS
'/tmp/assets...'

This temporary directory exists only to make testing and experimentation easier. Do not rely on it in a production deployment, or when developing under a multi-process server, because each process would get a different assets directory. In a production deployment, consider using something like dir='/var/cache/assets'.

See the assetslib.Assets class for its full API.

Configuring your web-server

Your web-server should return an Expires header with a date approximately one year after the time at which the request is served. According to RFC 2616, this indicates to the client that the resource never expires:

To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.

—RFC 2616, section 14.21

To correctly configure Apache, you'll need to enable mod_expires and then insert a configuration snippet something like this:

Alias /_/ "/var/cache/assets/"
<Directory "/var/cache/assets/">
    ExpiresActive On
    ExpiresDefault A31536000
</Directory>

For more information on this Apache configuration, see:

http://httpd.apache.org/docs/2.2/mod/mod_expires.html

For your convenience, the assetslib.wsgi module includes a simple WSGI application for serving your assets: assetslib.wsgi.AssetsApp. This is primarily intended for easy in-tree testing and development. In a production setting, you're much better off serving your static assets directly from your web-server.

The assetslib.wsgi module also includes the assetslib.wsgi.never_expires() function, which will generate an Expires header tuple ready to be appended to the WSGI response_headers. For example, the Expires header for a request made at Unix time 1234567890 would look like this:

>>> from assetslib.wsgi import never_expires
>>> never_expires(1234567890)
('Expires', 'Sat, 13 Feb 2010 23:31:30 GMT')
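
For readers who want to see how such a header could be produced, here is a minimal sketch using only the standard library. It is not the code of assetslib.wsgi.never_expires(): the names never_expires_sketch and app are hypothetical, and the 365-day offset is an assumption chosen to match the ExpiresDefault A31536000 value in the Apache snippet.

```python
import time
from email.utils import formatdate

ONE_YEAR = 31536000  # 365 days in seconds (assumed offset)


def never_expires_sketch(request_time):
    """Return an ('Expires', <HTTP-date>) tuple one year after request_time."""
    return ('Expires', formatdate(request_time + ONE_YEAR, usegmt=True))


def app(environ, start_response):
    """Toy WSGI app that marks its response as never expiring."""
    body = b'asset bytes would go here'
    response_headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ]
    # Append the Expires tuple just as the text describes.
    response_headers.append(never_expires_sketch(time.time()))
    start_response('200 OK', response_headers)
    return [body]


print(never_expires_sketch(1234567890))
```

For the timestamp 1234567890 the sketch yields the same tuple as the doctest above.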

Implementation details

assetslib was designed with specific goals in mind:

The simple solution to all the above is to use symlinks to store the key to content-hash-name mapping. For example:

>>> import os
>>> from os import path
>>> from assetslib import Assets
>>> a = Assets()
>>> sorted(os.listdir(a.dir))
[]
>>> a.register(data='<My pre-gzipped CSS>', key='site.css.gz')
'site.css.gz'
>>> a['site.css.gz']
'3071f65dae784df1fa10361041a30e280c464b01.css.gz'
>>> sorted(os.listdir(a.dir))
['3071f65dae784df1fa10361041a30e280c464b01.css.gz', 'site.css.gz']

The key is a relative symlink pointing to the content-hash-named file:

>>> os.readlink(path.join(a.dir, 'site.css.gz'))
'3071f65dae784df1fa10361041a30e280c464b01.css.gz'
>>> path.isfile(path.join(a.dir, '3071f65dae784df1fa10361041a30e280c464b01.css.gz'))
True
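
The symlink layout shown above could be produced along the following lines. This is a sketch under stated assumptions, not the body of Assets.register(): the function name register_sketch, the '.tmp-' prefix, and the rename-over-the-key step are all hypothetical choices, and the code assumes a POSIX filesystem with symlink support.

```python
import hashlib
import os
import tempfile


def register_sketch(directory, data, key):
    """Store `data` under its content-hash-name and point `key` at it."""
    # Assumption: the extension is everything from the first dot onward.
    dot = key.find('.')
    ext = key[dot:] if dot != -1 else ''
    name = hashlib.sha1(data).hexdigest() + ext

    # Write the content file only if this exact version is not stored yet.
    target = os.path.join(directory, name)
    if not os.path.isfile(target):
        with open(target, 'wb') as f:
            f.write(data)

    # Build the new symlink under a temporary name, then rename it over the
    # key so readers never see a missing link during an update.
    tmp = os.path.join(directory, '.tmp-' + key)
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(name, tmp)  # relative target: just the file name
    os.replace(tmp, os.path.join(directory, key))
    return key


d = tempfile.mkdtemp(prefix='assets')
register_sketch(d, b'version 1', 'site.css')
register_sketch(d, b'version 2', 'site.css')
print(os.readlink(os.path.join(d, 'site.css')))
```

After the second call the key points at the newer content-hash-named file, while the older file stays on disk, so clients holding the old URL can still fetch it.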

By default Assets uses a SHA1 hash to generate the content-hash-name. For example:

>>> try:
...     from hashlib import sha1
... except ImportError:
...     from sha import new as sha1  # Python 2.4 compatibility
...
>>> content = open(a.asset_file('site.css.gz'), 'rb').read()
>>> sha1(content).hexdigest()
'3071f65dae784df1fa10361041a30e280c464b01'
>>> a.asset_file('site.css.gz')  # doctest:+ELLIPSIS
'/tmp/assets.../3071f65dae784df1fa10361041a30e280c464b01.css.gz'
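
Because each file name embeds the SHA1 of its content, a store can be checked for corruption by rehashing every file. The following is a rough sketch that assumes the layout shown above (regular files named with the hash followed by the extension, and keys stored as symlinks); find_corrupt is a hypothetical helper, not part of assetslib.

```python
import hashlib
import os
import tempfile


def find_corrupt(directory):
    """Return names of content files whose SHA1 does not match their name."""
    bad = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        # Skip the key symlinks; only content-hash-named files are checked.
        if os.path.islink(full) or not os.path.isfile(full):
            continue
        with open(full, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        # The hash is the part of the name before the first dot.
        if name.split('.', 1)[0] != digest:
            bad.append(name)
    return bad


d = tempfile.mkdtemp(prefix='assets')
good = hashlib.sha1(b'hello').hexdigest() + '.txt'
with open(os.path.join(d, good), 'wb') as f:
    f.write(b'hello')
os.symlink(good, os.path.join(d, 'hello.txt'))
print(find_corrupt(d))  # intact store, so the list is empty
```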

Looking forward

It is the author's humble hope that this simple standard become widely adopted, that *nix distros use '/var/cache/assets' as a system-wide assets location, and that faster websites subsequently flourish.


Version: 0.1.1

Classes
  Assets
      Provides cache-friendly asset management via content-hash-naming.