Man cache
From LSWiki
Revision as of 14:15, 11 June 2007 (edit) Laine (Talk | contribs) ← Previous diff |
Current revision (14:18, 11 June 2007) (edit) Laine (Talk | contribs) (→Description) |
||
Line 11: | Line 11: | ||
implemented by cache descriptors include: | implemented by cache descriptors include: | ||
- | * Entries are timestamped as to when they entered the cache, and hit | + | * Entries are timestamped as to when they entered the cache, and hit counts are maintained. |
- | counts are maintained. | + | * Every two to six minutes, the cache is examined and any entries which are older than five minutes and whose current hit count is less than 2 are dropped from the cache. |
- | * Every two to six minutes, the cache is examined and any entries which | + | * Every reset(), hit counts in the cache are reduced as follows: the hit count is capped at 100,000, and then one-tenth of its current level is subtracted. |
- | are older than five minutes and whose current hit count is less than 2 | + | * Where available, the eval cost and (if enabled at the system level) execution time of a cache miss is stored, and is used to estimate the eval and time savings created by the cache. |
- | are dropped from the cache. | + | * Information is kept on the total number of cases where the cache could and could not be used (cacheable and uncacheable information). |
- | * Every reset(), hit counts in the cache are reduced as follows: the hit | + | * Optionally, the cache may be defined as persistent, so that its information will be kept between reboots. |
- | count is capped at 100,000, and then one-tenth of its current level is | + | |
- | subtracted. | + | |
- | * Where available, the eval cost and (if enabled at the system level) | + | |
- | execution time of a cache miss is stored, and is used to estimate the | + | |
- | eval and time savings created by the cache. | + | |
- | * Information is kept on the total number of cases where the cache could | + | |
- | and could not be used (cacheable and uncacheable information). | + | |
- | * Optionally, the cache may be defined as persistent, so that its | + | |
- | information will be kept between reboots. | + | |
Considering these characteristics, you would want to use cache descriptors | Considering these characteristics, you would want to use cache descriptors | ||
when these conditions apply: | when these conditions apply: | ||
- | * You are managing a pool of cached values which is potentially so large | + | * You are managing a pool of cached values which is potentially so large that is is not a good idea to keep all of them around. |
- | that is is not a good idea to keep all of them around. | + | * In this pool, some of your values will be used repeatedly and others will not, and you cannot reliably tell which ones are which beforehand. |
- | * In this pool, some of your values will be used repeatedly and others | + | * It is not necessary for all cacheable values which use the same key to be the same. That is, if one value goes into the cache with a given key, then drops out, then later the key is requested again and another version of the value is created, nothing will break. This is made note |
- | will not, and you cannot reliably tell which ones are which beforehand. | + | of because, while use of cache descriptors will result in cached pointers often being the same across the system, it is important to remember that you cannot rely on this always being the case. |
- | * It is not necessary for all cacheable values which use the same key to | + | |
- | be the same. That is, if one value goes into the cache with a given | + | |
- | key, then drops out, then later the key is requested again and another | + | |
- | version of the value is created, nothing will break. This is made note | + | |
- | of because, while use of cache descriptors will result in cached | + | |
- | pointers often being the same across the system, it is important to | + | |
- | remember that you cannot rely on this always being the case. | + | |
There are two different potential purposes to using such a cache, either | There are two different potential purposes to using such a cache, either |
Current revision
Contents |
Files
/def/descriptor/cache.c /lib/descriptors/cache.h
Description
Cache descriptors implement a particular style of managed cache. Often you can implement simple caches using only mappings; cache descriptors are for situations where that kind of basic mechanism is not sufficient. Behaviors implemented by cache descriptors include:
- Entries are timestamped as to when they entered the cache, and hit counts are maintained.
- Every two to six minutes, the cache is examined and any entries which are older than five minutes and whose current hit count is less than 2 are dropped from the cache.
- Every reset(), hit counts in the cache are reduced as follows: the hit count is capped at 100,000, and then one-tenth of its current level is subtracted.
- Where available, the eval cost and (if enabled at the system level) execution time of a cache miss is stored, and is used to estimate the eval and time savings created by the cache.
- Information is kept on the total number of cases where the cache could and could not be used (cacheable and uncacheable information).
- Optionally, the cache may be defined as persistent, so that its information will be kept between reboots.
Considering these characteristics, you would want to use cache descriptors when these conditions apply:
- You are managing a pool of cached values which is potentially so large that is is not a good idea to keep all of them around.
- In this pool, some of your values will be used repeatedly and others will not, and you cannot reliably tell which ones are which beforehand.
- It is not necessary for all cacheable values which use the same key to be the same. That is, if one value goes into the cache with a given key, then drops out, then later the key is requested again and another version of the value is created, nothing will break. This is made note
of because, while use of cache descriptors will result in cached pointers often being the same across the system, it is important to remember that you cannot rely on this always being the case.
There are two different potential purposes to using such a cache, either or both of which may be served by a given implementation. The first is to save computational expense; the second is to save memory by sharing pointers to complex data structures. An example of the former is the text caches maintained by the terminal types in /def/term (cache implemented in /std/def/term.c). Examples of the latter are the belonging descriptor and shape descriptor caches (implemented in /def/descriptor/belonging.c and /daemon/geometry.c, respecively). Generally, only caches intended to save computational expense benefit from being made persistent.
Cache descriptors have only one public field, which can be used to specify them as persistent; their usage is generally very simple and is handled through their support functions. Usually, you simply create your cache with a statement like 'descriptor cache = Cache();' and use it with a statement like 'mixed value = Cache_Use(cache, key, #'compute_something);'.
At this writing, cache descriptors also cease adding items to their cache when it reaches the system maximum array size, presently 50000. This is necessary because in the present driver version, foreach() on a mapping breaks if the mapping is larger than the maximum array size. Hopefully this will be fixed in the future.
As execution time profiling meaningfully increases the cost of cache operation, it can be configured on or off for all caches using the macro Cache_Time_Profile near the top of /def/descriptor/cache.c. When the module is compiled with this macro off, execution times will not be recorded or reported for caches.
Public Fields
Cache_Persistence: Used to request that the cache be persistent. The value provided is a string value which will uniquely identify the cache within your object. Best practice is to use the form __FILE__ + "::some_name", which ensures sufficient uniqueness, assuming you do not reuse the 'some_name' portion. Persistence is implemented via the persistence daemon, which means among other things that the stored information will be dropped if it is not requested over the space of a month; so if you rename or cease using your cache, you do not need to worry about it hanging around forever.
Cache_Flags: A bitmask field used for flags. Currently this amounts to:
Cache_Flag_Refcount_Drop: The cache should manage itself based on the refcount of data structures in the cache rather than based on its own hit count. This flag is currently unimplemented, but is reserved for future implementation.
Internal Fields
Cache_Data: The mapping constituting the cache's lookup table.
Cache_Dropped: A counter for the number of entries the cache descriptor has dropped over its lifetime.
Cache_Last_Check: A timestamp indicating when the cache descriptor was last examined to see if it needed entries flushed.
Cache_Last_Flush: A timestamp indicating when the cache descriptor last had entries flushed.
Cache_Size_Before: The cache's size before its last flush.
Cache_Size_After: The cache's size after its last flush.
Cache_Cacheable: The total number of times where the cache might have been used where it actually could be used.
Cache_Uncacheable: The total number of times where the cache might have been used where it could not be used (due to the cache growing too large, the condition for a Cache_Conditional() call not being fulfilled, and so on).
Support Functions
varargs mixed Cache_Use(descriptor dxr, mixed key, mixed value, mixed args...) The generally recommended interface for operational use of the cache. The key indicated is looked up in the cache; if present, the associated value is returned and its hit count is incremented. If not, the value for the key is set to 'value' (which, if a closure, will be called with up to ten arguments following it), it will be marked as entering the cache at the present time, its hit count will be started at 1, and 'value' will be returned. The exception to this behavior is the limit on the cache size noted above; if the cache is too large, the value will not be stored, and Cache_Uncacheable will be incremented. Otherwise, Cache_Cacheable will be incremented.
Example :
string out = Cache_Use(cache, text, #'process_text, text);
string Cache_Report(descriptor dxr) Returns a general statistical report on the cache. Information includes the current size of the cache, the time it was last checked for flushable entries, the time and cache sizes from its last flush, if any, the number of entries dropped in its history, its cacheability numbers, and the mean, minimum, and maximum of its entries' ages, hit counts, eval and (if configured) time cache-miss costs (if available), memory sizes (if arrays or mappings), estimated eval and time savings (if available), and estimated memory savings (if appropriate).
Note that estimated memory savings, in particular, are very approximate and are based on several assumptions: 1) that every cache hit represents a data structure that would have stayed in memory until now if not for the cache, 2) that the size of a data structure is, for an array, four bytes times the array size, and for a mapping, four bytes times the mapping size times one plus the mapping width (meaning, among other things, that the memory used by strings is ignored, which we pretty much have to do because of the driver-level string sharing, and that additional sub-structures within the structure are ignored). Therefore the estimated memory savings will almost always be an overestimate, and will be a worse overestimate the longer the cache operates. Eval and time savings are likewise based on the assumption that the eval and time cost of the cache miss would have been constant.
Example :
who->display(Cache_Report(cache));
varargs mixed Cache_Conditional(descriptor dxr, mixed key, mixed value, mixed args...)
A special-purpose variation of Cache_Use. It behaves identically except that the first argument to the value closure is a reference to a status variable that, if set true, will prevent the value from being stored in the cache. Excessive cache size will also prevent the cache from being used. Cache_Cacheable or Cache_Uncacheable will be incremented, as appropriate.
mixed Cache_Retrieve(descriptor dxr, mixed key)
Performs a cache lookup only. If 'key' is present in the cache, its hit counter is incremented, Cache_Cacheable is incremented, and the key's associated value is returned. Otherwise, 0 is returned.
void Cache_Store(descriptor dxr, mixed key, mixed value)
Sets a cache value. The associated value for 'key' in the cache is set to 'value', it is marked as entering the cache at the present time, its hit count is set to 1, and Cache_Cacheable is incremented.
status Cache_Contains(descriptor dxr, mixed key)
Returns True if the cache contain the key specified. Does not trigger a cache hit or Cache_Cacheable increment.
void Cache_Clear(descriptor dxr)
Clears the cache specified and resets all of its tracking information.
void Cache_Terminate(descriptor dxr)
Used to notify the system that a cache will be passing out of usage. It is unfortunately necessary to call this function if you drop a cache other than through the object being destructed, otherwise the cache will hang around in memory indefinitely.
void Cache_Register(descriptor dxr)
Registers a cache in the system's tracking. This is only needed in fairly arcane circumstances; to be explicit, when a cache is being passed around between an object and its successors via reload storage.