July 19, 2004
Version 0.6
Proposed by Bryan Ford
Abstract
Many applications create and manage directories
containing cached information about content stored elsewhere,
such as cached Web content
or thumbnail-size versions of images or movies.
For speed and storage efficiency
we would often like to avoid backing up, archiving,
or otherwise unnecessarily copying such directories around,
but it is a pain to identify and individually exclude each such directory
during data transfer operations.
I propose an extremely simple convention
by which applications can reliably “tag” any cache directories they create,
for easy identification by backup systems and other data management utilities.
Data management utilities can then heed or ignore these tags
as the user sees fit.
The Problem
Many application programs create and manage cache directories
of some kind:
directories in which they store temporary information
whose storage can benefit the application’s performance,
but which can easily be regenerated from other “primary” data sources
if it is lost.
For example,
most web browsers have a cache directory
in which they store downloaded Internet content
(Mozilla typically uses a directory called Cache
somewhere under $(HOME)/.mozilla).
When backing up, synchronizing, or transferring a user’s home directory
from one machine to another,
we would usually like to avoid saving or transferring cache directories at all,
because:
- The contents of a cache directory
generally have no long-term archival value,
since they just mirror source material stored elsewhere
and often in a reduced or incomplete form –
e.g., only the parts of a web page that the user has visited recently,
or thumbnail-sized versions of pictures the user has browsed.
- Cache directories can grow to a fairly substantial size –
especially cache directories for web browsers –
potentially wasting considerable backup storage space
and/or network bandwidth.
- The contents of a cache directory tend to change frequently
even when the user is merely browsing or viewing files,
which confuses incremental backup or synchronization software
into seeing many frequent “false changes” in the directory tree
and defeats the efficiency of the incremental algorithm.
- The files in a cache directory
are often named very cryptically
(e.g., based on a hash of the name or content
of the original source material).
Seeing thousands of such meaningless filenames scroll past
during operations that traverse a user’s home directory
is annoying at best,
and may confuse or frighten inexperienced users.
A great many applications use cache directories,
however.
While some evolving standards attempt to standardize the locations of such directories,
most applications create their own cache directories in unpredictable locations
and manage them independently of other applications.
It therefore becomes very tedious,
even for experienced users who can easily recognize cache directories,
to figure out and list all of the appropriate --exclude options
for backups or other directory traversal operations.
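To illustrate the bookkeeping involved, here is a minimal Python sketch of a traversal that relies on a hand-maintained exclusion list. The function name files_to_back_up and the paths in EXCLUDES are hypothetical, chosen only for illustration; a real list would differ from user to user and would need updating whenever a new application appeared.

```python
import os

# Hypothetical, hand-maintained list of cache directories,
# given relative to the home directory with "/" separators.
EXCLUDES = {".mozilla/Cache", ".thumbnails"}


def files_to_back_up(home, excludes=EXCLUDES):
    """Yield file paths under `home`, skipping directories listed in `excludes`."""
    for dirpath, dirnames, filenames in os.walk(home):
        rel = os.path.relpath(dirpath, home)
        # Prune excluded subdirectories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel, d)).replace(os.sep, "/")
            not in excludes
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Every cache directory that is missing from the list is silently traversed, which is exactly the maintenance burden described above.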
Some applications,
such as recent versions of KDE,
keep their cache directories in /tmp or /var/tmp,
which at least partly solves this problem
by getting the cached content out of the user’s home directory entirely.
But there are good reasons an application or user
might not want these caches in /tmp or /var/tmp:
for example,
those directories may be on a different partition with limited space,
or files in them may be cleared out by the system too frequently
(e.g., after a day, or on every reboot)
to allow the cache to serve its intended purpose effectively.
The Filesystem Hierarchy Standard
specifies a system-wide /var/cache directory
intended to hold such cached information,
but this directory is not generally world-writable,
and thus can only be used by applications
pre-packaged with the system or installed by the system administrator.
The XDG Base Directory Specification
instead recommends that applications create their caches
in a location specified by a user environment variable,
or in a standard subdirectory of the user’s home directory if that environment variable doesn’t exist.
The XDG convention allows applications to operate without special permissions,
but it does not help system-wide backup or data management utilities
locate individual users’ cache directories reliably,
because each user might set the environment variable differently in his or her login profile.
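The lookup that the XDG convention asks of an application can be sketched as follows. This is a minimal illustration, not a reference implementation: the variable name XDG_CACHE_HOME and the fallback subdirectory .cache come from the XDG Base Directory Specification rather than from the text above, and the function name user_cache_dir is hypothetical.

```python
import os


def user_cache_dir(environ=None):
    """Return the per-user cache directory following the XDG lookup order.

    `environ` defaults to the current process environment; passing a dict
    makes it possible to inspect what a different login profile would yield.
    """
    env = os.environ if environ is None else environ
    cache_home = env.get("XDG_CACHE_HOME", "")
    # Use the variable only if it is set, non-empty, and an absolute path.
    if cache_home and os.path.isabs(cache_home):
        return cache_home
    # Otherwise fall back to the standard subdirectory of the home directory.
    home = env.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, ".cache")
```

The result depends entirely on the environment of the user running the application, so a backup utility running as a different user, or outside any login session, cannot reproduce the lookup without reading each user's profile.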
Proposed Solution
Given that there is currently no universal agreement
on where applications should store their cached information,
I propose a very simple convention
that will at least allow such information to be identified effectively.
Regardless of where the application decides to (or is configured to)
place its cache directory,
it should place within this directory a file named:
CACHEDIR.T