persidict package#
Persistent dictionaries that store key-value pairs on local disks or AWS S3.
This package provides a unified interface for persistent dictionary-like storage with various backends including filesystem and AWS S3.
Classes:
- PersiDict: Abstract base class defining the unified interface for all
persistent dictionaries.
- NonEmptySafeStrTuple: A flat tuple of URL/filename-safe strings that
can be used as a key for PersiDict objects.
- FileDirDict: A dictionary that stores key-value pairs as files on a
local hard drive. Keys compose filenames, values are stored as pickle or JSON objects.
- BasicS3Dict: A basic S3-backed dictionary with direct S3 operations.
- WriteOnceDict: A write-once wrapper that prevents modification of existing
items after initial storage.
- EmptyDict: Equivalent of null device in OS - accepts all writes but discards
them, returns nothing on reads. Always appears empty regardless of operations performed. Useful for testing, debugging, or as a placeholder.
- OverlappingMultiDict: A dictionary that can handle overlapping key spaces.
Functions:
- get_safe_chars(): Returns a set of URL/filename-safe characters permitted
in keys.
- replace_unsafe_chars(): Replaces forbidden characters in a string with
safe alternatives.
Constants:
- KEEP_CURRENT, DELETE_CURRENT: Special joker values for conditional operations.
- ANY_ETAG, ETAG_IS_THE_SAME, ETAG_HAS_CHANGED: Condition flags for
ETag-based conditional operations.
- ITEM_NOT_AVAILABLE: Sentinel for absent keys.
- VALUE_NOT_RETRIEVED: Sentinel for skipped value retrieval.
Note
All persistent dictionaries support multiple serialization formats, including pickle and JSON, with automatic type handling and collision-safe key encoding.
- class persidict.AlwaysRetrieveFlag(*args, **kwargs)[source]#
Bases: RetrieveValueFlag
Always retrieve the value in conditional operations.
- class persidict.AnyETagFlag(*args, **kwargs)[source]#
Bases: ETagConditionFlag
Condition that is always satisfied regardless of etag values.
- class persidict.AppendOnlyDictCached(*, main_dict: PersiDict[ValueType], data_cache: PersiDict[ValueType])[source]#
Bases: PersiDict[ValueType]
Append-only PersiDict facade with a read-through cache.
This adapter composes two concrete PersiDict instances and presents them as a single append-only mapping. It trusts the cache because both backends are append-only: once a key is written it will never be modified or deleted.
Behavior summary:
Reads: __getitem__ first tries the cache, falls back to the main dict, then populates the cache on a miss.
Membership: __contains__ returns True immediately if the key is in the cache; otherwise it checks the main dict.
Writes: __setitem__ writes to the main dict and then mirrors the value into the cache (after base validation performed by PersiDict).
set_item_if: delegates the write to the main dict, mirrors the value into the cache on success.
Deletion: not supported (append-only), will raise TypeError.
Iteration/length/timestamps: delegated to the main dict.
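The behavior summary above can be illustrated with a minimal sketch built on two plain Python dicts. This is not persidict's implementation: the class name AppendOnlyCachedSketch is hypothetical, and raising TypeError on an attempted overwrite is an assumption made for the illustration.

```python
class AppendOnlyCachedSketch:
    """Toy read-through cache over two plain dicts (illustration only)."""

    def __init__(self, main, cache):
        self._main = main          # authoritative store
        self._data_cache = cache   # value cache, trusted because nothing is overwritten

    def __getitem__(self, key):
        if key in self._data_cache:        # cache hit: the main store is not consulted
            return self._data_cache[key]
        value = self._main[key]            # cache miss: may raise KeyError
        self._data_cache[key] = value      # populate the cache for later reads
        return value

    def __contains__(self, key):
        return key in self._data_cache or key in self._main

    def __setitem__(self, key, value):
        if key in self:
            # Assumption for this sketch: overwriting is rejected with TypeError.
            raise TypeError("append-only: existing keys cannot be overwritten")
        self._main[key] = value            # write to the source of truth first
        self._data_cache[key] = value      # then mirror into the cache

    def __delitem__(self, key):
        raise TypeError("append-only: deletion is not supported")


main, cache = {"a": 1}, {}
d = AppendOnlyCachedSketch(main, cache)
first = d["a"]      # served from main, then cached
d["b"] = 2          # written to both stores
```

Because neither store ever changes an existing value, a cache hit can be returned without checking the main store for freshness.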
- _main#
The authoritative append-only PersiDict instance.
- _data_cache#
The append-only PersiDict used purely as a value cache.
- Parameters:
main_dict – The authoritative append-only PersiDict.
data_cache – A PersiDict used as a cache; must be append-only and compatible with main_dict (same base_class_for_values and serialization_format).
- Raises:
TypeError – If main_dict or data_cache are not PersiDict instances.
ValueError – If either dict is not append-only or their base_class_for_values differ.
- discard_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag) ConditionalOperationResult[ValueType][source]#
Deletion is not supported for append-only dictionaries.
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Return the ETag from the main dict.
Delegating to the main dict preserves backend-specific ETag semantics (e.g., native S3 ETags) instead of deriving ETags from timestamps.
- get_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Return value only if its ETag satisfies a condition; cache on success.
- get_params() dict[str, Any][source]#
Return constructor parameters for this instance.
- Returns:
A dictionary with keys ‘main_dict’ and ‘data_cache’, sorted by keys.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) AppendOnlyDictCached[ValueType][source]#
Get a sub-dictionary for the given key prefix.
Returns a new AppendOnlyDictCached with main_dict and data_cache both scoped to the given prefix.
- Parameters:
prefix_key – Prefix key (string or sequence of strings) identifying the subdictionary scope.
- Returns:
A new cached dictionary rooted at the specified prefix.
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Append-only: delegates to main dict; caches a returned value when available.
- setdefault_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, default_value: ValueType, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Insert default if absent and condition satisfied; delegate to main dict.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Return item’s timestamp from the main dict.
- Parameters:
key – Dictionary key (string or sequence of strings) or NonEmptySafeStrTuple.
- Returns:
POSIX timestamp of the last write for the key.
- Raises:
KeyError – If the key does not exist in the main dict.
- transform_item(key: NonEmptySafeStrTuple | Sequence[str] | str, *, transformer: TransformingFunction[ValueType], n_retries: int | None = 6) OperationResult[ValueType][source]#
Not supported for append-only dictionaries.
- exception persidict.BackendError(message: str, *, backend: str, operation: str, key: Any = None)[source]#
Bases: RuntimeError
A backend/infrastructure condition prevents completion.
Not a missing-key condition; those are KeyError. Must be raised with exception chaining (raise BackendError(...) from exc).
- Parameters:
message – Human-readable description of the failure.
backend – Name of the backend (e.g. "filesystem", "s3").
operation – Name of the operation that failed (e.g. "init", "put_object").
key – The key involved, or None if not applicable.
- backend#
Name of the backend.
- operation#
Name of the failed operation.
- key#
The key involved, or None.
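A minimal sketch of the exception-chaining convention described above. BackendErrorSketch is a hypothetical stand-in that mirrors the documented signature, read_blob is an illustrative helper, and the operation name "read" is an assumption; none of them are part of persidict.

```python
import tempfile


class BackendErrorSketch(RuntimeError):
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, message, *, backend, operation, key=None):
        super().__init__(message)
        self.backend = backend
        self.operation = operation
        self.key = key


def read_blob(path):
    """Read bytes from disk, separating missing keys from backend failures."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        # A missing key is reported as KeyError, not as a backend failure.
        raise KeyError(path) from None
    except OSError as exc:
        # Any other I/O problem is wrapped, keeping the original as __cause__.
        raise BackendErrorSketch(
            f"cannot read {path!r}",
            backend="filesystem",
            operation="read",
            key=path,
        ) from exc


caught = None
try:
    read_blob(tempfile.mkdtemp())  # opening a directory as a file raises an OSError
except BackendErrorSketch as err:
    caught = err
```

Chaining with `from exc` keeps the low-level error reachable through `__cause__`, so callers can log the root cause while handling a single exception type.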
- class persidict.BasicS3Dict(*, bucket_name: str = 'my_bucket', region: str = None, root_prefix: str = '', serialization_format: str = 'pkl', append_only: bool = False, base_class_for_values: type | None = None)[source]#
Bases: PersiDict[ValueType]
A persistent dictionary that stores key-value pairs as S3 objects.
Each key-value pair is stored as a separate S3 object in the specified bucket.
A key can be either a string (object name without file extension) or a sequence of strings representing a hierarchical path (folder structure ending with an object name). Values can be instances of any Python type and are serialized to S3 objects.
BasicS3Dict supports multiple serialization formats:
- Binary storage using pickle (‘pkl’ format)
- Human-readable text using jsonpickle (‘json’ format)
- Plain text for string values (other formats)
Note
Unlike native Python dictionaries, insertion order is not preserved. Operations may incur S3 API costs and network latency. All operations are performed directly against S3 without local caching.
- property base_url: str | None#
Return the S3 URL prefix of this dictionary.
This property is not part of the standard Python dictionary interface.
- Returns:
The base S3 URL in the format “s3://<bucket>/<root_prefix>”.
- bucket_name: str#
- discard_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag) ConditionalOperationResult[ValueType][source]#
Discard a key only if an ETag condition is satisfied.
Uses S3 conditional delete with IfMatch to guard against concurrent changes for all condition types.
- Parameters:
key – Dictionary key.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE.
- Returns:
ConditionalOperationResult with the outcome.
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Get an ETag for a key.
- Parameters:
key – Dictionary key (string or sequence of strings or NonEmptySafeStrTuple).
- Returns:
The ETag value for the S3 object.
- Raises:
KeyError – If the key does not exist in S3.
- get_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Retrieve the value for a key only if an ETag condition is satisfied.
Uses S3 conditional headers (IfMatch/IfNoneMatch) for server-side condition checking when possible.
- Parameters:
key – Dictionary key.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE.
retrieve_value – Controls value retrieval. IF_ETAG_CHANGED (default) uses S3 IfNoneMatch to skip the fetch when the ETag matches. ALWAYS_RETRIEVE always fetches the value. NEVER_RETRIEVE does a HEAD only and returns VALUE_NOT_RETRIEVED.
- Returns:
ConditionalOperationResult with the outcome.
- Raises:
TypeError – If base_class_for_values is set and the retrieved value does not match it.
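The condition and retrieve_value parameters can be read as two small decision functions. The sketch below is an illustration only: the enums, the ABSENT sentinel, and both helper functions are hypothetical stand-ins for the documented flags, and treating the two decisions independently is an assumption rather than a statement about how persidict combines them.

```python
from enum import Enum


class Condition(Enum):          # stand-ins for the documented condition flags
    ANY_ETAG = "any"
    ETAG_IS_THE_SAME = "same"
    ETAG_HAS_CHANGED = "changed"


class Retrieve(Enum):           # stand-ins for the documented retrieve flags
    ALWAYS_RETRIEVE = "always"
    NEVER_RETRIEVE = "never"
    IF_ETAG_CHANGED = "if_changed"


ABSENT = object()               # stand-in for ITEM_NOT_AVAILABLE


def condition_holds(condition, expected_etag, actual_etag):
    """Evaluate an ETag condition; ABSENT compares equal only to itself."""
    if condition is Condition.ANY_ETAG:
        return True
    same = expected_etag == actual_etag
    return same if condition is Condition.ETAG_IS_THE_SAME else not same


def should_fetch_value(retrieve, expected_etag, actual_etag):
    """Decide whether the value body needs to be downloaded."""
    if actual_etag is ABSENT or retrieve is Retrieve.NEVER_RETRIEVE:
        return False
    if retrieve is Retrieve.ALWAYS_RETRIEVE:
        return True
    return expected_etag != actual_etag     # IF_ETAG_CHANGED


unchanged = condition_holds(Condition.ETAG_IS_THE_SAME, "v1", "v1")
skip_fetch = should_fetch_value(Retrieve.IF_ETAG_CHANGED, "v1", "v1")
```

With IF_ETAG_CHANGED, a caller that already holds the current version avoids transferring the value again, which is the saving the default is designed for.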
- get_params() dict[str, Any][source]#
Return configuration parameters as a dictionary.
This method supports the Parameterizable API and is not part of the standard Python dictionary interface.
- Returns:
A mapping of parameter names to their configured values, including S3-specific parameters (region, bucket_name, root_prefix) sorted by key names.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) BasicS3Dict[ValueType][source]#
Create a subdictionary scoped to items with the specified prefix.
Returns an empty subdictionary if no items exist under the prefix. If the prefix is empty, the entire dictionary is returned. This method is not part of the standard Python dictionary interface.
- Parameters:
prefix_key – A common prefix (string or sequence of strings or SafeStrTuple) used to scope items stored under this dictionary.
- Returns:
A new BasicS3Dict instance with root_prefix extended by the given prefix_key, sharing the parent’s bucket, region, serialization_format, and other configuration settings.
- region: str#
- root_prefix: str#
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Store a value only if an ETag condition is satisfied.
Uses S3 conditional headers (IfMatch / IfNoneMatch) for a single-roundtrip put when the condition can be fully expressed without a prior HEAD. This covers ETAG_IS_THE_SAME (any expected_etag) and ETAG_HAS_CHANGED with a real expected_etag. Other combinations fall back to check-then-write.
- Parameters:
key – Dictionary key.
value – Value to store.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE.
retrieve_value – Controls whether the existing value is fetched. Applies both when the condition is not satisfied and when KEEP_CURRENT is used with a satisfied condition. IF_ETAG_CHANGED (default) fetches only if expected_etag != actual_etag. ALWAYS_RETRIEVE fetches the existing value. NEVER_RETRIEVE returns VALUE_NOT_RETRIEVED.
- Returns:
ConditionalOperationResult with the outcome.
- setdefault(key: NonEmptySafeStrTuple | Sequence[str] | str, default: ValueType | None = None) ValueType[source]#
Insert key with default value if absent; return the current value.
Uses an S3 conditional put (If-None-Match: *) to avoid overwriting existing values under concurrent writers. On conditional failure, returns the current value without modifying it.
- Parameters:
key – Key (string, sequence of strings, or SafeStrTuple).
default – Value to insert if the key is not present. Defaults to None.
- Returns:
Existing value if key is present; otherwise the provided default value.
- Raises:
TypeError – If default is a Joker command (KEEP_CURRENT/DELETE_CURRENT), or if the key is missing and default violates value type constraints.
ConcurrencyConflictError – If retries are exhausted due to concurrent modifications.
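The insert-if-absent pattern behind setdefault can be sketched with an in-memory store whose put_if_absent method plays the role of a conditional put. TinyStore and setdefault_sketch are hypothetical names for this illustration; the lock emulates the server-side atomicity that S3 would provide, and the sketch ignores deletions that could happen between the failed put and the follow-up read.

```python
import threading


class TinyStore:
    """In-memory stand-in for an object store with a conditional put."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Emulate `If-None-Match: *`: write only when the key is missing."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data[key]


def setdefault_sketch(store, key, default):
    """Return the stored value, inserting `default` only if the key is absent."""
    if store.put_if_absent(key, default):
        return default
    return store.get(key)       # another writer won; report the stored value


store = TinyStore()
results = []


def worker(i):
    results.append(setdefault_sketch(store, "config", f"writer-{i}"))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly one writer's default is stored, and every caller observes that same value, which is the property a separate check followed by a write cannot guarantee.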
- setdefault_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, default_value: ValueType, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Insert default_value if key is absent; conditioned on ETag check.
Uses S3 conditional put (IfNoneMatch: *) for atomic insert-if-absent when the key is absent and the condition is satisfied, avoiding the TOCTOU race in the base class.
- Parameters:
key – Dictionary key.
default_value – Value to insert if the key is absent and the condition is satisfied.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE if the caller believes the key is absent.
retrieve_value – Controls value retrieval when the key exists. IF_ETAG_CHANGED (default) fetches only if expected_etag != actual_etag. ALWAYS_RETRIEVE fetches the existing value. NEVER_RETRIEVE returns VALUE_NOT_RETRIEVED.
- Returns:
ConditionalOperationResult with the outcome of the operation.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Get the last modification timestamp for a key.
This method is not part of the standard Python dictionary interface.
- Parameters:
key – Dictionary key (string or sequence of strings or NonEmptySafeStrTuple).
- Returns:
POSIX timestamp (seconds since Unix epoch) of the last modification time as reported by S3. The timestamp is timezone-aware and converted to UTC.
- Raises:
KeyError – If the key does not exist in S3.
- exception persidict.ConcurrencyConflictError(key: Any, attempts: int)[source]#
Bases: RuntimeError
An operation failed after exhausting retries due to concurrent modification.
Carries structured context for programmatic access.
- Parameters:
key – The key on which the conflict occurred.
attempts – Total number of attempts made before giving up.
- key#
The key on which the conflict occurred.
- attempts#
Total number of attempts made before giving up.
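A retry loop of the kind that could raise such an error is sketched below. ConflictSketch, transform_with_retries, and the toy store are hypothetical, and the interpretation of n_retries as "retries after the first attempt" is an assumption for this illustration.

```python
class ConflictSketch(RuntimeError):
    """Hypothetical stand-in carrying the documented key and attempts fields."""

    def __init__(self, key, attempts):
        super().__init__(f"gave up on {key!r} after {attempts} attempts")
        self.key = key
        self.attempts = attempts


def transform_with_retries(read, write_if_unchanged, key, transformer, n_retries=6):
    """Optimistic read-modify-write; returns the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        value, etag = read(key)
        if write_if_unchanged(key, transformer(value), etag):
            return attempts
        if attempts > n_retries:            # first attempt plus n_retries retries
            raise ConflictSketch(key, attempts)


store = {"counter": (10, 0)}    # key -> (value, version used as a toy ETag)
interference = [2]              # simulated concurrent writes still to happen


def read(key):
    return store[key]


def write_if_unchanged(key, new_value, expected_version):
    if interference[0] > 0:                 # another writer sneaks in first
        interference[0] -= 1
        value, version = store[key]
        store[key] = (value + 100, version + 1)
    value, version = store[key]
    if version != expected_version:
        return False                        # lost the race; caller retries
    store[key] = (new_value, version + 1)
    return True


attempts_used = transform_with_retries(read, write_if_unchanged, "counter", lambda v: v + 1)
```

Here two simulated conflicts are absorbed and the third attempt succeeds; a writer that never wins would end with the error after the retry budget is spent.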
- class persidict.ConditionalOperationResult(condition_was_satisfied: bool, actual_etag: ETagValue | ItemNotAvailableFlag, resulting_etag: ETagValue | ItemNotAvailableFlag, new_value: ValueType | ItemNotAvailableFlag | ValueNotRetrievedFlag)[source]#
Bases: Generic[ValueType]
Result of a conditional operation guarded by an ETag check.
- condition_was_satisfied#
Whether the ETag condition was met.
- Type:
bool
- actual_etag#
ETag of the key before the operation, or ITEM_NOT_AVAILABLE if the key was absent.
- Type:
persidict.jokers_and_status_flags.ETagValue | persidict.jokers_and_status_flags.ItemNotAvailableFlag
- resulting_etag#
ETag after the operation, or ITEM_NOT_AVAILABLE if the key is absent.
- Type:
persidict.jokers_and_status_flags.ETagValue | persidict.jokers_and_status_flags.ItemNotAvailableFlag
- new_value#
The value after the operation. May be ITEM_NOT_AVAILABLE (key absent) or VALUE_NOT_RETRIEVED (value fetch was skipped).
- Type:
persidict.jokers_and_status_flags.ValueType | persidict.jokers_and_status_flags.ItemNotAvailableFlag | persidict.jokers_and_status_flags.ValueNotRetrievedFlag
- actual_etag: ETagValue | ItemNotAvailableFlag#
- condition_was_satisfied: bool#
- new_value: ValueType | ItemNotAvailableFlag | ValueNotRetrievedFlag#
- resulting_etag: ETagValue | ItemNotAvailableFlag#
- property value_was_mutated: bool#
Whether the operation changed the stored value.
- class persidict.DeleteCurrentFlag(*args, **kwargs)[source]#
Bases: Joker
Flag instructing PersiDict to delete the current value for a key.
- Usage:
Assign this flag instead of a real value to remove the key if it exists. If the key is absent, implementations will typically no-op.
Examples
>>> d[key] = DELETE_CURRENT
Note
This is a singleton class; constructing it repeatedly returns the same instance.
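The singleton behavior and the effect of assigning a joker can be sketched with a small dict subclass. SingletonFlag, KeepCurrent, DeleteCurrent, and JokerAwareDict are hypothetical names for this illustration and do not reproduce persidict's classes.

```python
class SingletonFlag:
    """Base class whose subclasses each have exactly one instance."""

    _instances = {}

    def __new__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class KeepCurrent(SingletonFlag):
    pass


class DeleteCurrent(SingletonFlag):
    pass


KEEP, DELETE = KeepCurrent(), DeleteCurrent()


class JokerAwareDict(dict):
    """Plain dict that interprets the two flags on assignment."""

    def __setitem__(self, key, value):
        if value is KEEP:
            return                          # leave whatever is stored untouched
        if value is DELETE:
            self.pop(key, None)             # remove if present, otherwise no-op
            return
        super().__setitem__(key, value)


d = JokerAwareDict()
d["a"] = 1
d["a"] = KEEP       # value stays 1
kept = d["a"]
d["a"] = DELETE     # key is removed
d["b"] = DELETE     # absent key: nothing happens
```

Since each flag class has a single instance, identity checks with `is` are enough to recognize a joker.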
- class persidict.ETagConditionFlag(*args, **kwargs)[source]#
Bases: SingletonMixin
Base class for ETag condition selectors.
- class persidict.ETagHasChangedFlag(*args, **kwargs)[source]#
Bases: ETagConditionFlag
Condition requiring expected and actual etags to differ.
- class persidict.ETagIsTheSameFlag(*args, **kwargs)[source]#
Bases: ETagConditionFlag
Condition requiring expected and actual etags to match.
- class persidict.EmptyDict(*, append_only: bool = False, base_class_for_values: type | None = None, serialization_format: str = 'pkl')[source]#
Bases: PersiDict[ValueType]
An equivalent of the null device in OS - accepts all writes but discards them, returns nothing on reads. Always appears empty regardless of operations performed on it.
This class is useful for testing, debugging, or as a placeholder when you want to disable persistent storage without changing the interface.
Key characteristics:
- All write operations are accepted, but data is discarded
- All read operations behave as if the dict is empty
- Length is always 0
- Iteration always yields no results
- Subdict operations return new EmptyDict instances
- All timestamp operations raise KeyError (no data exists)
Performance note: If validation is not needed, consider overriding __setitem__ to simply pass for better performance.
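The null-device idea can be sketched with the standard library's MutableMapping. NullDict is a hypothetical name for this illustration; it performs no key or value validation and is not persidict's EmptyDict.

```python
from collections.abc import MutableMapping


class NullDict(MutableMapping):
    """Mapping that accepts writes, stores nothing, and always looks empty."""

    def __getitem__(self, key):
        raise KeyError(key)         # nothing is ever stored

    def __setitem__(self, key, value):
        pass                        # accept and discard

    def __delitem__(self, key):
        raise KeyError(key)

    def __iter__(self):
        return iter(())

    def __len__(self):
        return 0


sink = NullDict()
sink["x"] = 1
fallback = sink.get("x", "fallback")
default = sink.setdefault("y", 42)
```

The mixin methods inherited from MutableMapping (get, setdefault, `in`, and so on) follow from the five methods above, so the object behaves as an empty mapping everywhere.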
- delete_if_exists(key: NonEmptySafeStrTuple | Sequence[str] | str) bool[source]#
Backward-compatible wrapper for discard().
- discard(key: NonEmptySafeStrTuple | Sequence[str] | str) bool[source]#
Always returns False as the key never exists.
- discard_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag) ConditionalOperationResult[ValueType][source]#
Key is always absent; condition evaluated normally.
- get(key: NonEmptySafeStrTuple | Sequence[str] | str, default: ValueType | None = None) ValueType | None[source]#
Always returns the default value since key is never found.
- get_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Key is always absent; condition evaluated with actual_etag=ITEM_NOT_AVAILABLE.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) EmptyDict[ValueType][source]#
Returns a new EmptyDict as subdictionary.
- Parameters:
prefix_key – Key prefix (ignored, as EmptyDict has no hierarchical structure).
- Returns:
A new EmptyDict instance with the same configuration.
- random_key() NonEmptySafeStrTuple | None[source]#
Returns None as EmptyDict contains no keys.
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Key is always absent; condition evaluated, write discarded on success.
- setdefault(key: NonEmptySafeStrTuple | Sequence[str] | str, default: ValueType | None = None) ValueType | None[source]#
Always returns the default value without storing it.
- setdefault_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, default_value: ValueType, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Key is always absent; condition evaluated, write discarded on success.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Always raises KeyError as EmptyDict contains nothing.
- transform_item(key: NonEmptySafeStrTuple | Sequence[str] | str, *, transformer: TransformingFunction[ValueType], n_retries: int | None = 6) OperationResult[ValueType][source]#
No-op: returns ITEM_NOT_AVAILABLE without calling the transformer.
- class persidict.FileDirDict(*, base_dir: str = '__file_dir_dict__', serialization_format: str = 'pkl', append_only: bool = False, digest_len: int = 4, base_class_for_values: type | None = None)[source]#
Bases: PersiDict[ValueType]
A persistent dict that stores key-value pairs in local files.
A new file is created for each key-value pair. A key is either a filename (without an extension), or a sequence of directory names that ends with a filename. A value can be any Python object, which is stored in a file. Insertion order is not preserved.
FileDirDict can store objects in binary files or in human-readable text files (either in JSON format or as plain text). By default, a short hash suffix (digest_len=4) is appended to each key path component to prevent collisions on case-insensitive filesystems.
- property base_dir: str#
Return dictionary’s base directory.
This property is absent in the original dict API.
- Returns:
Absolute path to the base directory used by this dictionary.
- clear() None[source]#
Remove all elements from the dictionary.
- Raises:
MutationPolicyError – If append_only is True.
- digest_len: int#
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Return a stable ETag derived from mtime, file size, and inode.
Uses a single stat call and combines st_mtime_ns, st_size, and st_ino. Falls back to a float-based mtime representation if nanosecond precision is not available.
- Raises:
KeyError – If the key does not exist.
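A minimal sketch of deriving an ETag from a single stat call, as described above. The function name stat_etag and the exact string layout are assumptions for this illustration; the library's actual format may differ.

```python
import os
import tempfile


def stat_etag(path):
    """Combine mtime (ns), size, and inode from one stat call into a string."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        raise KeyError(path) from None      # absent file is reported as a missing key
    return f"{st.st_mtime_ns}-{st.st_size}-{st.st_ino}"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "item.pkl")
    with open(path, "wb") as f:
        f.write(b"v1")
    etag_before = stat_etag(path)
    etag_repeat = stat_etag(path)           # no write in between: same ETag
    with open(path, "wb") as f:
        f.write(b"version 2")               # different size, so the ETag changes
    etag_after = stat_etag(path)
```

Including size and inode alongside mtime helps distinguish versions when two writes land within the same timestamp granularity.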
- get_params() dict[str, Any][source]#
Return configuration parameters of the dictionary.
This method is needed to support the ParameterizableMixin API and is absent in the standard dict API.
- Returns:
A mapping of parameter names to values including base_dir merged with the base PersiDict parameters.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) FileDirDict[ValueType][source]#
Get a subdictionary containing items with the same prefix key.
For non-existing prefix key, an empty sub-dictionary is returned. If the prefix is empty, the entire dictionary is returned. This method is absent in the original dict API.
- Parameters:
prefix_key – Prefix key (string or sequence of strings) that identifies the subdirectory.
- Returns:
A new FileDirDict instance rooted at the specified subdirectory, sharing the same parameters as this dictionary.
- random_key() NonEmptySafeStrTuple | None[source]#
Return a uniformly random key from the dictionary, or None if empty.
Performs a full directory traversal using reservoir sampling (k=1) to select a random file matching the configured serialization_format without loading all keys into memory.
- Returns:
A random key if any items exist; otherwise None.
- Return type:
NonEmptySafeStrTuple | None
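Reservoir sampling with k=1 over a directory walk can be sketched as follows. The helper random_file and its suffix argument are hypothetical; the sketch returns a file path rather than a key and does not decode filenames the way FileDirDict would.

```python
import os
import random
import tempfile


def random_file(base_dir, suffix, rng=random):
    """Pick one matching file uniformly at random in a single pass."""
    chosen = None
    seen = 0
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            seen += 1
            # Replace the current choice with probability 1/seen.
            if rng.randrange(seen) == 0:
                chosen = os.path.join(dirpath, name)
    return chosen


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.pkl", os.path.join("sub", "b.pkl"), "c.json"):
        open(os.path.join(tmp, rel), "wb").close()
    picked_name = os.path.basename(random_file(tmp, ".pkl"))
    empty_pick = random_file(os.path.join(tmp, "sub"), ".json")
```

Only the current candidate and a counter are kept in memory, so the cost is one full traversal regardless of how many files exist.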
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Get last modification time (in seconds, Unix epoch time).
This method is absent in the original dict API.
- Parameters:
key – Key whose timestamp to return.
- Returns:
POSIX timestamp of the underlying file.
- Raises:
KeyError – If the key does not exist.
- class persidict.IfETagChangedRetrieveFlag(*args, **kwargs)[source]#
Bases: RetrieveValueFlag
Retrieve the value only if the actual ETag differs from expected.
- class persidict.ItemNotAvailableFlag(*args, **kwargs)[source]#
Bases: SingletonMixin
Sentinel indicating that the item is not present in the dict.
Used uniformly for absent keys across all contexts:
- As expected_etag: “I believe the key is absent.”
- As actual_etag: “the key was absent at check time.”
- As resulting_etag: “the key is absent after the operation.”
- As new_value: “no value to return.”
- As transformer input: “transforming from absence (creating new).”
Note
This is a singleton class; constructing it repeatedly returns the same instance.
- class persidict.Joker(*args, **kwargs)[source]#
Bases: SingletonMixin
Base class for joker flags.
Subclasses represent value-less commands that alter persistence behavior when assigned to a key.
- class persidict.KeepCurrentFlag(*args, **kwargs)[source]#
Bases: Joker
Flag instructing PersiDict to keep the current value unchanged.
- Usage:
Assign this flag instead of a real value to indicate that an existing value should not be modified.
Examples
>>> d[key] = KEEP_CURRENT
Note
This is a singleton class; constructing it repeatedly returns the same instance.
- class persidict.LocalDict(*, backend: _RAMBackend | None = None, serialization_format: str = 'pkl', append_only: bool = False, base_class_for_values: type | None = None, prune_interval: int | None = 64)[source]#
Bases: PersiDict[ValueType]
In-memory PersiDict backed by a RAM-only hierarchical store.
LocalDict mirrors FileDirDict semantics but keeps all data in process memory using a simple tree structure (RAMBackend). It is useful for tests and ephemeral workloads where durability is not required. Keys are hierarchical sequences of safe strings (SafeStrTuple). Values are stored per serialization_format and tracked with modification timestamps, providing the same API surface as other PersiDict implementations.
- append_only#
If True, items are immutable and cannot be modified or deleted after initial creation.
- Type:
bool
- base_class_for_values#
Optional base class that all stored values must inherit from. If None, any type is accepted (with serialization_format restrictions enforced by the base class).
- Type:
type | None
- serialization_format#
Logical serialization/format label (e.g., “pkl”, “json”) used as a namespace for values and timestamps within the backend.
- Type:
str
- _backend#
The in-memory tree that actually stores data.
Notes
Not thread-safe or process-safe; use external synchronization if accessed concurrently.
Memory-only: all data is lost when the object is garbage-collected or the process exits.
- clear() None[source]#
Remove all items under this serialization_format across the entire tree.
Only entries stored for the current serialization_format are removed; data for other serialization formats remains intact.
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Return a unique ETag for a key based on a monotonic write counter.
Unlike the base class (which formats the timestamp), LocalDict uses an integer counter stored on the backend that increments on every write, guaranteeing a distinct ETag even when two writes occur within the same clock tick or from different LocalDict instances sharing the same backend.
- Parameters:
key – Key (string/sequence or SafeStrTuple).
- Returns:
A unique opaque string identifying the current version.
- Raises:
KeyError – If the key does not exist.
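The counter-based ETag idea can be sketched with a shared backend object. CounterBackend is a hypothetical name, and the sketch is single-threaded; it only illustrates why a write counter yields distinct ETags where a timestamp might not.

```python
import itertools


class CounterBackend:
    """Toy backend that stamps every write with a monotonically increasing number."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.values = {}
        self.versions = {}

    def write(self, key, value):
        self.values[key] = value
        self.versions[key] = next(self._counter)    # bumps on every write

    def etag(self, key):
        return str(self.versions[key])              # KeyError if the key is absent


backend = CounterBackend()
backend.write("k", "first")
etag_first = backend.etag("k")
backend.write("k", "second")        # back-to-back write, possibly within one clock tick
etag_second = backend.etag("k")
```

Two views that share the backend also share the counter, so their writes cannot produce the same ETag for different versions.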
- get_params() dict[str, Any][source]#
Return constructor parameters needed to recreate this instance.
Note that the backend object itself is included as a reference; copying or reconstructing a LocalDict with this parameter will share the same in-memory store.
- Returns:
A dictionary of parameters (sorted by key) suitable for passing to the constructor.
- get_subdict(prefix_key: Iterable[str] | SafeStrTuple) LocalDict[ValueType][source]#
Return a view rooted at the given key prefix.
The returned LocalDict shares the same underlying RAMBackend, but its root is moved to the subtree identified by prefix_key. If intermediate nodes do not exist, they are created (resulting in an empty subdict). Modifications to a sub-dictionary will affect the parent dictionary and any other sub-dictionaries that share the same backend.
- Parameters:
prefix_key – Key prefix identifying the subtree to expose. May be empty to refer to the current root.
- Returns:
A LocalDict instance whose operations are restricted to the keys under the specified prefix.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Return the last modification time of a key.
- Parameters:
key – Key (string/sequence or SafeStrTuple).
- Returns:
POSIX timestamp (seconds since Unix epoch) when the value was last written.
- Raises:
KeyError – If the key does not exist.
- class persidict.MutableDictCached(*, main_dict: PersiDict[ValueType], data_cache: PersiDict[ValueType], etag_cache: PersiDict[ETagValue])[source]#
Bases: PersiDict[ValueType]
PersiDict adapter with read-through caching and ETag validation.
This adapter composes three concrete PersiDict instances:
- main_dict: the source of truth that persists data and supports ETags.
- data_cache: a PersiDict used purely as a cache for values.
- etag_cache: a PersiDict used to cache ETag strings per key.
For reads, the adapter consults etag_cache to decide whether the cached value is still valid. If the ETag hasn’t changed in the main dict, the cached value is returned; otherwise the fresh value and ETag are fetched from main_dict and both caches are updated. All writes and deletions are performed against main_dict and mirrored into caches to keep them in sync.
Notes
main_dict must fully support ETag operations; caches must be mutable (append_only=False).
This class inherits type and serialization settings from main_dict.
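The read path described above can be sketched with a toy versioned store and two plain dicts as caches. VersionedStore and EtagCachedReader are hypothetical names for this illustration; writes, deletions, and conditional operations are left out.

```python
class VersionedStore:
    """Toy source of truth that exposes a cheap ETag lookup."""

    def __init__(self):
        self._data = {}
        self._version = 0
        self.value_reads = 0            # counts expensive value fetches

    def put(self, key, value):
        self._version += 1
        self._data[key] = (value, str(self._version))

    def etag(self, key):
        return self._data[key][1]

    def get(self, key):
        self.value_reads += 1
        return self._data[key]          # (value, etag)


class EtagCachedReader:
    """Serve values from the cache while the main store's ETag is unchanged."""

    def __init__(self, main):
        self.main = main
        self.data_cache = {}
        self.etag_cache = {}

    def __getitem__(self, key):
        current = self.main.etag(key)   # metadata-only check against the main store
        if self.etag_cache.get(key) == current and key in self.data_cache:
            return self.data_cache[key]
        value, etag = self.main.get(key)
        self.data_cache[key] = value    # refresh both caches together
        self.etag_cache[key] = etag
        return value


main = VersionedStore()
reader = EtagCachedReader(main)
main.put("k", "v1")
first, second = reader["k"], reader["k"]    # second read is a cache hit
main.put("k", "v2")                         # change made behind the reader's back
third = reader["k"]                         # ETag mismatch triggers a refetch
```

Every read still costs one ETag lookup against the main store, but the value itself is transferred only when the version has changed.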
- discard_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag) ConditionalOperationResult[ValueType][source]#
Discard item only if ETag satisfies a condition; update caches.
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Return cached ETag if available, otherwise fetch from main dict.
This method returns the ETag from the local cache when available, avoiding a (network) call to the main dict. If the ETag is not cached, it fetches from the main dict and caches the result.
Note: The cached ETag may be stale if the value was modified directly in the main dict (bypassing this wrapper). However, reads via __getitem__ are self-healing and will detect/refresh stale caches.
- Parameters:
key – Non-empty key to query.
- Returns:
The ETag string for the key.
- Raises:
KeyError – If the key does not exist in the main dict.
- get_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Return value only if the ETag satisfies a condition.
Delegates to the main dict and refreshes caches when data is fetched.
- get_params() dict[str, Any][source]#
Return constructor parameters for this instance.
- Returns:
A dictionary with keys ‘main_dict’, ‘data_cache’, and ‘etag_cache’, sorted by keys.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) MutableDictCached[ValueType][source]#
Get a sub-dictionary for the given key prefix.
Returns a new MutableDictCached with main_dict, data_cache, and etag_cache all scoped to the given prefix.
- Parameters:
prefix_key – Prefix key (string or sequence of strings) identifying the subdictionary scope.
- Returns:
A new cached dictionary rooted at the specified prefix.
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Set item only if ETag satisfies a condition; update caches when a value is returned.
- setdefault_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, default_value: ValueType, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Insert default if absent and condition satisfied; delegate to main dict.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Get the last-modified timestamp from the main dict.
- Parameters:
key – Non-empty key to query.
- Returns:
POSIX timestamp (seconds since epoch) as provided by the main dict.
- transform_item(key: NonEmptySafeStrTuple | Sequence[str] | str, *, transformer: TransformingFunction[ValueType], n_retries: int | None = 6) OperationResult[ValueType][source]#
Apply a transformation; delegate to main dict and update caches.
- exception persidict.MutationPolicyError(policy: str)[source]#
Bases:
TypeError
The dict’s mutation policy forbids the attempted mutation.
Messages name the policy (e.g. "append-only", "write-once"), not the operation.
- Parameters:
policy – Name of the policy that rejected the mutation.
- policy#
Name of the policy that rejected the mutation.
- class persidict.NeverRetrieveFlag(*args, **kwargs)[source]#
Bases:
RetrieveValueFlag
Never retrieve the value; always return VALUE_NOT_RETRIEVED.
- class persidict.NonEmptySafeStrTuple(*args, **kwargs)[source]#
Bases:
SafeStrTuple
A SafeStrTuple that must contain at least one string.
This subclass enforces that the tuple is non-empty.
- class persidict.OperationResult(resulting_etag: ETagValue | ItemNotAvailableFlag, new_value: ValueType | ItemNotAvailableFlag)[source]#
Bases:
Generic[ValueType]
Result of an unconditional mutating operation (transform_item).
- resulting_etag#
ETag after the operation, or ITEM_NOT_AVAILABLE if the key is absent.
- Type:
persidict.jokers_and_status_flags.ETagValue | persidict.jokers_and_status_flags.ItemNotAvailableFlag
- new_value#
The value after the operation, or ITEM_NOT_AVAILABLE if the key is absent.
- Type:
persidict.jokers_and_status_flags.ValueType | persidict.jokers_and_status_flags.ItemNotAvailableFlag
- new_value: ValueType | ItemNotAvailableFlag#
- resulting_etag: ETagValue | ItemNotAvailableFlag#
- class persidict.OverlappingMultiDict(*, dict_type: type[PersiDict], shared_subdicts_params: dict[str, Any], **individual_subdicts_params: dict[str, Any])[source]#
Bases:
object
Container for multiple PersiDict instances, differing only by serialization_format.
This class instantiates several sub-dictionaries (PersiDict subclasses) that share common parameters but differ by their serialization_format. Each sub-dictionary is exposed as an attribute whose name equals the serialization_format (e.g., obj.json, obj.csv). All sub-dictionaries typically point to the same underlying base directory or bucket and differ only in how items are materialized by serialization format.
- dict_type#
A subclass of PersiDict used to create each sub-dictionary.
- shared_subdicts_params#
Parameters applied to every created sub-dictionary (e.g., base_dir, bucket, append_only, digest_len).
- individual_subdicts_params#
Mapping from serialization_format (attribute name) to a dict of parameters that are specific to that sub-dictionary. These override or extend shared_subdicts_params for the given serialization_format.
- subdicts_names#
The list of serialization_format names (i.e., attribute names) created.
- Raises:
TypeError – If pickling is attempted or item access is used on the OverlappingMultiDict itself rather than its sub-dicts.
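The attribute-per-format layout described above can be illustrated with a small in-memory sketch. This is not the library's implementation: `ToyDict` and `ToyMultiDict` are hypothetical stand-ins, and the sketch assumes that per-format parameters are merged over the shared ones and that the format name is passed on as `serialization_format`.

```python
from typing import Any, Dict


class ToyDict(dict):
    """Hypothetical stand-in for a PersiDict subclass; it only records its parameters."""

    def __init__(self, **params: Any) -> None:
        super().__init__()
        self.params = params


class ToyMultiDict:
    """Sketch of a container exposing one sub-dictionary per serialization format."""

    def __init__(self, *, dict_type: type,
                 shared_subdicts_params: Dict[str, Any],
                 **individual_subdicts_params: Dict[str, Any]) -> None:
        self.subdicts_names = list(individual_subdicts_params)
        for fmt, own_params in individual_subdicts_params.items():
            # Assumption: individual parameters override or extend the shared ones.
            params = {**shared_subdicts_params, **own_params,
                      "serialization_format": fmt}
            setattr(self, fmt, dict_type(**params))

    def __getitem__(self, key: Any) -> Any:
        # Item access belongs on the sub-dictionaries, not on the container.
        raise TypeError("use a sub-dictionary such as obj.json instead")


multi = ToyMultiDict(
    dict_type=ToyDict,
    shared_subdicts_params={"base_dir": "demo_dir"},
    json={"digest_len": 0},
    pkl={},
)
multi.json["a"] = 1  # stored only in the "json" sub-dictionary
```

In this sketch each format keeps its own items, so a key written through `multi.json` is not visible through `multi.pkl`.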
- class persidict.PersiDict(*, append_only: bool = False, base_class_for_values: type | None = None, serialization_format: str = 'pkl')[source]#
Bases:
MutableMapping[NonEmptySafeStrTuple, ValueType], ParameterizableMixin
Abstract dict-like interface for durable key-value stores.
Keys are URL/filename-safe sequences of strings (SafeStrTuple). Concrete subclasses implement storage backends (e.g., filesystem, S3). The API is similar to Python’s dict but does not guarantee insertion order and adds persistence-specific helpers (e.g., timestamp()).
- Attributes (can’t be changed after initialization):
- append_only:
If True, items are immutable and non-removable: existing values cannot be modified or deleted.
- base_class_for_values:
Optional base class that all values must inherit from. If None, any type is accepted.
- serialization_format:
File extension/format for stored values (e.g., “pkl”, “json”).
- property append_only: bool#
Whether the store is append-only.
- Returns:
True if the store is append-only (contains immutable items that cannot be modified or deleted), False otherwise.
- base_class_for_values: type | None#
- clear() None[source]#
Remove all items from the dictionary.
- Raises:
MutationPolicyError – If the dictionary is append-only.
- delete_if_exists(key: NonEmptySafeStrTuple | Sequence[str] | str) bool[source]#
Backward-compatible wrapper for discard().
This method is kept for backward compatibility; new code should use discard(). Behavior is identical to discard().
- discard(key: NonEmptySafeStrTuple | Sequence[str] | str) bool[source]#
Delete an item without raising an exception if it doesn’t exist.
This method is absent in the original dict API.
- Parameters:
key – Key (string or sequence of strings) or SafeStrTuple.
- Returns:
True if the item existed and was deleted; False otherwise.
- Raises:
MutationPolicyError – If the dictionary is append-only.
- discard_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag) ConditionalOperationResult[ValueType][source]#
Discard a key only if an ETag condition is satisfied.
This method has no retrieve_value parameter: new_value is ITEM_NOT_AVAILABLE on a successful delete or a missing key, and VALUE_NOT_RETRIEVED when the condition fails.
Warning
This base class implementation is not atomic. Subclasses that require concurrency safety should override this method.
- Parameters:
key – Dictionary key.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE if the caller believes the key is absent.
- Returns:
ConditionalOperationResult with the outcome of the operation.
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Return the ETag of a key.
By default, this returns a stringified timestamp of the last modification time. Subclasses may override to provide true backend-specific ETags (e.g., S3).
This method is absent in the original Python dict API.
- Parameters:
key – Key (string or sequence of strings) or SafeStrTuple.
- Returns:
The ETag for the key.
- Raises:
KeyError – If the key does not exist.
- get_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Retrieve the value for a key only if an ETag condition is satisfied.
If the key is absent, actual_etag is ITEM_NOT_AVAILABLE and the condition is evaluated normally. No KeyError is raised.
Warning
This base class implementation is not atomic. Subclasses that offer concurrency safety should override this method.
- Parameters:
key – Dictionary key.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE if the caller believes the key is absent.
retrieve_value – Controls value retrieval. IF_ETAG_CHANGED (default) skips the fetch when expected_etag == actual_etag, returning VALUE_NOT_RETRIEVED instead. ALWAYS_RETRIEVE always fetches the value. NEVER_RETRIEVE always returns VALUE_NOT_RETRIEVED when the key exists.
- Returns:
ConditionalOperationResult with the outcome of the operation.
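As a rough illustration of the parameters above, the sketch below models the condition check and the value-retrieval decision with plain strings. It is an assumption-laden reading of the flag names, not the library's code: the string constants and both helper functions are hypothetical, and the sketch does not claim how the two decisions interact inside get_item_if.

```python
# Stand-in flags and sentinel (plain strings; the real package uses flag objects).
ANY_ETAG = "ANY_ETAG"
ETAG_IS_THE_SAME = "ETAG_IS_THE_SAME"
ETAG_HAS_CHANGED = "ETAG_HAS_CHANGED"
ALWAYS_RETRIEVE = "ALWAYS_RETRIEVE"
NEVER_RETRIEVE = "NEVER_RETRIEVE"
IF_ETAG_CHANGED = "IF_ETAG_CHANGED"
ITEM_NOT_AVAILABLE = "ITEM_NOT_AVAILABLE"


def condition_is_satisfied(condition, expected_etag, actual_etag):
    """Evaluate an ETag condition; an absent key has actual_etag == ITEM_NOT_AVAILABLE."""
    if condition == ANY_ETAG:
        return True
    if condition == ETAG_IS_THE_SAME:
        return expected_etag == actual_etag
    if condition == ETAG_HAS_CHANGED:
        return expected_etag != actual_etag
    raise ValueError(f"unknown condition: {condition!r}")


def should_fetch_value(retrieve_value, expected_etag, actual_etag):
    """Decide whether the stored value would be fetched for an existing key."""
    if actual_etag == ITEM_NOT_AVAILABLE:
        return False  # nothing to fetch for an absent key
    if retrieve_value == ALWAYS_RETRIEVE:
        return True
    if retrieve_value == NEVER_RETRIEVE:
        return False
    # IF_ETAG_CHANGED: skip the fetch when the caller already holds this version.
    return expected_etag != actual_etag
```

Treating "key absent" as just another ETag value is what lets a caller pass ITEM_NOT_AVAILABLE as expected_etag and have the condition evaluated without a KeyError.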
- get_params() dict[str, Any][source]#
Return configuration parameters of this dictionary.
- Returns:
A sorted dictionary of parameters used to reconstruct the instance. This supports the Parameterizable API and is absent in the built-in dict.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) Self[source]#
Get a sub-dictionary containing items with the given prefix key.
Items whose keys start with the provided prefix are visible through the returned sub-dictionary. If the prefix does not exist, an empty sub-dictionary is returned. If the prefix is empty, the entire dictionary is returned.
This method is absent in the original Python dict API.
- Parameters:
prefix_key – Key prefix (string, sequence of strings, or SafeStrTuple) identifying the sub-dict to expose.
- Returns:
A dictionary-like view restricted to keys under the provided prefix.
- Raises:
NotImplementedError – Must be implemented by subclasses that support hierarchical key spaces.
- get_with_etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ConditionalOperationResult[ValueType][source]#
Retrieve the value and its ETag for a key in a single operation.
Convenience wrapper around get_item_if that fetches the current value and ETag without requiring condition parameters. On backends that support it (e.g., S3), both are obtained in a single network round-trip.
The result uses the same ConditionalOperationResult type as the conditional _if methods, with condition_was_satisfied always True. When the key is absent, new_value and actual_etag are both ITEM_NOT_AVAILABLE.
- Parameters:
key – Dictionary key.
- Returns:
ConditionalOperationResult with the value in new_value and the ETag in actual_etag (and resulting_etag).
- items() Iterator[tuple[NonEmptySafeStrTuple, ValueType]][source]#
Return an iterator over (key, value) pairs.
- Returns:
Items iterator.
- items_and_timestamps() Iterator[tuple[NonEmptySafeStrTuple, ValueType, float]][source]#
Return an iterator over (key, value, timestamp) triples.
- Returns:
Items and timestamps.
- keys() Iterator[NonEmptySafeStrTuple][source]#
Return an iterator over keys.
- Returns:
Keys iterator.
- keys_and_timestamps() Iterator[tuple[NonEmptySafeStrTuple, float]][source]#
Return an iterator over (key, timestamp) pairs.
- Returns:
Keys and POSIX timestamps.
- newest_keys(*, max_n: int | None = None) list[NonEmptySafeStrTuple][source]#
Return up to max_n newest keys in the dictionary.
This method is absent in the original Python dict API.
- Parameters:
max_n – Maximum number of keys to return. If None, return all keys sorted by age (newest first). Values <= 0 yield an empty list. Defaults to None.
- Returns:
The newest keys, newest first.
- newest_values(*, max_n: int | None = None) list[ValueType][source]#
Return up to max_n newest values in the dictionary.
This method is absent in the original Python dict API.
- Parameters:
max_n – Maximum number of values to return. If None, return values for all keys sorted by age (newest first). Values <= 0 yield an empty list.
- Returns:
Values corresponding to the newest keys.
- oldest_keys(*, max_n: int | None = None) list[NonEmptySafeStrTuple][source]#
Return up to max_n oldest keys in the dictionary.
This method is absent in the original Python dict API.
- Parameters:
max_n – Maximum number of keys to return. If None, return all keys sorted by age (oldest first). Values <= 0 yield an empty list. Defaults to None.
- Returns:
The oldest keys, oldest first.
- oldest_values(*, max_n: int | None = None) list[ValueType][source]#
Return up to max_n oldest values in the dictionary.
This method is absent in the original Python dict API.
- Parameters:
max_n – Maximum number of values to return. If None, return values for all keys sorted by age (oldest first). Values <= 0 yield an empty list.
- Returns:
Values corresponding to the oldest keys.
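The max_n rules shared by newest_keys, newest_values, oldest_keys, and oldest_values can be sketched over plain (key, timestamp) pairs, such as those yielded by keys_and_timestamps(). The helper name `keys_by_age` and its `newest_first` parameter are hypothetical; this is one simple way to get the documented ordering, not necessarily how the package does it.

```python
from typing import Iterable, List, Optional, Tuple

Key = Tuple[str, ...]


def keys_by_age(pairs: Iterable[Tuple[Key, float]],
                max_n: Optional[int] = None,
                newest_first: bool = True) -> List[Key]:
    """Return keys ordered by timestamp, truncated to at most max_n entries."""
    if max_n is not None and max_n <= 0:
        return []  # non-positive limits yield an empty list
    ordered = sorted(pairs, key=lambda pair: pair[1], reverse=newest_first)
    keys = [key for key, _ in ordered]
    return keys if max_n is None else keys[:max_n]


pairs = [(("a",), 10.0), (("b",), 30.0), (("c",), 20.0)]
newest_two = keys_by_age(pairs, max_n=2)                      # [("b",), ("c",)]
oldest_two = keys_by_age(pairs, max_n=2, newest_first=False)  # [("a",), ("c",)]
```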
- pop(key: NonEmptySafeStrTuple | Sequence[str] | str, *args: Any) Any[source]#
Remove a key and return its value.
Uses transform_item internally so the read-then-delete sequence is protected by ETag checks and automatic retries, avoiding the TOCTOU race that the inherited MutableMapping.pop would have.
- Parameters:
key – Key (string or sequence of strings) or SafeStrTuple.
*args – Optional default value (at most one).
- Returns:
The value that was stored, or the default if the key was absent and a default was provided.
- Raises:
MutationPolicyError – If the dictionary is append-only.
TypeError – If more than one default argument is given.
KeyError – If the key does not exist and no default was given.
- popitem() tuple[NonEmptySafeStrTuple, ValueType][source]#
Remove and return an arbitrary (key, value) pair.
Uses pop (which delegates to transform_item) so the read-then-delete is protected by ETag checks and automatic retries. If the chosen key is deleted by another process before pop completes, the next key is tried until one succeeds or the dictionary is empty.
- Returns:
A (key, value) tuple.
- Raises:
MutationPolicyError – If the dictionary is append-only.
KeyError – If the dictionary is empty.
- random_key() NonEmptySafeStrTuple | None[source]#
Return a random key from the dictionary.
This method is absent in the original Python dict API.
Implementation uses reservoir sampling to select a uniformly random key in streaming time, without loading all keys into memory or using len().
- Returns:
A random key if the dictionary is not empty; None if the dictionary is empty.
- Return type:
NonEmptySafeStrTuple | None
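Reservoir sampling with a reservoir of size one can be sketched as follows. The function name `random_item` and its `rng` parameter are hypothetical; the sketch only shows the general technique named above, applied to any iterable of keys.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def random_item(items: Iterable[T],
                rng: Optional[random.Random] = None) -> Optional[T]:
    """Pick one item uniformly at random in a single pass, without calling len()."""
    if rng is None:
        rng = random.Random()
    chosen: Optional[T] = None
    for n, item in enumerate(items, start=1):
        # The n-th item replaces the current choice with probability 1/n,
        # which leaves every item seen so far equally likely to be chosen.
        if rng.randrange(n) == 0:
            chosen = item
    return chosen


keys = [("a",), ("b",), ("c",)]
picked = random_item(iter(keys))  # works on a one-shot iterator
```

Only the current choice is held in memory, so the cost is one pass over the keys regardless of how many there are.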
- serialization_format: str#
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Store a value only if an ETag condition is satisfied.
If the key is absent, actual_etag is ITEM_NOT_AVAILABLE and the condition is evaluated normally. No KeyError is raised.
Warning
This base class implementation is not atomic. Subclasses that require concurrency safety should override this method.
- Parameters:
key – Dictionary key.
value – Value to store.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE if the caller believes the key is absent.
retrieve_value – Controls whether the existing value is fetched. Applies both when the condition is not satisfied and when KEEP_CURRENT is used with a satisfied condition. IF_ETAG_CHANGED (default) fetches only if expected_etag != actual_etag. ALWAYS_RETRIEVE fetches the existing value. NEVER_RETRIEVE returns VALUE_NOT_RETRIEVED.
- Returns:
ConditionalOperationResult with the outcome of the operation.
- setdefault(key: NonEmptySafeStrTuple | Sequence[str] | str, default: ValueType | None = None) ValueType[source]#
Insert key with default value if absent; return the current value.
Behaves like the built-in dict.setdefault(): if the key exists, return its current value; otherwise, set the key to the default value and return that default.
Warning
This base class implementation is not atomic. Subclasses that require concurrency safety should override this method.
- Parameters:
key – Dictionary key.
default – Value to insert if the key is not present. Defaults to None.
- Returns:
Existing value if key is present; otherwise the provided default value.
- Raises:
TypeError – If default is a Joker command (KEEP_CURRENT/DELETE_CURRENT), or if the key is missing and default violates value type constraints.
- setdefault_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, default_value: ValueType, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Insert default_value if key is absent; conditioned on ETag check.
If the key is absent and the condition is satisfied, default_value is inserted. If the key is present, no mutation occurs regardless of whether the condition is satisfied.
Warning
This base class implementation is not atomic. Subclasses that require concurrency safety should override this method.
- Parameters:
key – Dictionary key.
default_value – Value to insert if the key is absent and the condition is satisfied.
condition – ANY_ETAG, ETAG_IS_THE_SAME, or ETAG_HAS_CHANGED.
expected_etag – The caller’s expected ETag, or ITEM_NOT_AVAILABLE if the caller believes the key is absent.
retrieve_value – Controls value retrieval when the key exists. IF_ETAG_CHANGED (default) fetches only if expected_etag != actual_etag. ALWAYS_RETRIEVE fetches the existing value. NEVER_RETRIEVE returns VALUE_NOT_RETRIEVED.
- Returns:
ConditionalOperationResult with the outcome of the operation.
- subdicts() dict[str, Self][source]#
Return a mapping of first-level keys to sub-dictionaries.
This method is absent in the original dict API.
- Returns:
A mapping from a top-level key segment to a sub-dictionary restricted to the corresponding keyspace.
- Return type:
dict[str, Self]
- abstractmethod timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Return the last modification time of a key.
This method is absent in the original dict API.
- Parameters:
key – Key (string or sequence of strings) or SafeStrTuple.
- Returns:
POSIX timestamp (seconds since Unix epoch) of the last modification of the item.
- Raises:
NotImplementedError – Must be implemented by subclasses.
- transform_item(key: NonEmptySafeStrTuple | Sequence[str] | str, *, transformer: TransformingFunction[ValueType], n_retries: int | None = 6) OperationResult[ValueType][source]#
Apply a transformation function to a key’s value.
Reads the current value (or ITEM_NOT_AVAILABLE if absent), calls transformer(current_value), and writes the result back using conditional operations.
If the transformer returns DELETE_CURRENT, the key is deleted (or no-op if already absent). If the transformer returns KEEP_CURRENT, the value is left unchanged.
Warning
This base class implementation is not atomic unless the backend’s conditional operations are atomic. The transformer may be called multiple times if conflicts occur.
- Parameters:
key – Dictionary key.
transformer – A callable that receives the current value (or ITEM_NOT_AVAILABLE) and returns a new value, DELETE_CURRENT, or KEEP_CURRENT.
n_retries – Number of retries after ETag conflicts. None retries indefinitely.
- Raises:
ConcurrencyConflictError – If conflicts persist after n_retries.
- Returns:
OperationResult with resulting_etag and new_value.
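The read, transform, and conditional write-back cycle can be sketched against a toy in-memory store. Everything here is a hypothetical stand-in (the `ToyStore` class, its `compare_and_swap` method, the `ConflictError` exception, and the sentinel objects); it illustrates an optimistic retry loop under those assumptions rather than the package's actual code.

```python
import itertools

# Stand-in sentinels; the real package exports its own objects with these names.
ITEM_NOT_AVAILABLE = object()
KEEP_CURRENT = object()
DELETE_CURRENT = object()


class ConflictError(RuntimeError):
    """Hypothetical stand-in for a concurrency conflict error."""


class ToyStore:
    """In-memory store that assigns a fresh ETag on every write."""

    def __init__(self):
        self._data = {}  # key -> (etag, value)
        self._versions = itertools.count(1)

    def read(self, key):
        return self._data.get(key, (ITEM_NOT_AVAILABLE, ITEM_NOT_AVAILABLE))

    def compare_and_swap(self, key, new_value, expected_etag):
        """Write or delete only if the key still carries expected_etag."""
        current_etag, _ = self.read(key)
        if current_etag != expected_etag:
            return False
        if new_value is DELETE_CURRENT:
            self._data.pop(key, None)
        else:
            self._data[key] = (f"v{next(self._versions)}", new_value)
        return True


def transform(store, key, transformer, n_retries=6):
    """Apply transformer to the current value, retrying after ETag conflicts."""
    retries_used = 0
    while True:
        etag, value = store.read(key)
        result = transformer(value)  # may run more than once
        if result is KEEP_CURRENT:
            return etag, value
        if store.compare_and_swap(key, result, expected_etag=etag):
            return store.read(key)
        if n_retries is not None and retries_used >= n_retries:
            raise ConflictError(f"gave up on {key!r}")
        retries_used += 1


def increment(current):
    return 1 if current is ITEM_NOT_AVAILABLE else current + 1


store = ToyStore()
transform(store, "hits", increment)
etag, value = transform(store, "hits", increment)  # value is now 2
```

Because the transformer can be invoked again after a conflict, it is safest to keep it free of side effects.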
- class persidict.RetrieveValueFlag(*args, **kwargs)[source]#
Bases:
SingletonMixin
Base class for value retrieval strategy flags.
Subclasses control whether and when the actual value is fetched in conditional operations.
- persidict.S3Dict#
alias of
S3Dict_FileDirCached
- class persidict.S3Dict_FileDirCached(*, bucket_name: str = 'my_bucket', region: str = None, root_prefix: str = '', base_dir: str = '__s3_dict__', serialization_format: str = 'pkl', digest_len: int = 8, append_only: bool = False, base_class_for_values: type | None = None)[source]#
Bases:
PersiDict[ValueType]
S3-backed persistent dictionary using BasicS3Dict with local caching.
This class mimics the interface and behavior of S3Dict_Legacy but internally uses BasicS3Dict for S3 operations combined with FileDirDict-based local caching via the cached wrapper classes (AppendOnlyDictCached/MutableDictCached).
The architecture layers caching on top of BasicS3Dict to provide:
- Fast local access for frequently accessed items
- Efficient batch operations
- ETag-based change detection for mutable dictionaries
- Optimized append-only performance when append_only=True
- property base_dir: str#
Get the base directory for local cache.
- property base_url: str#
Get the base S3 URL.
- property digest_len: int#
Get the digest length used for collision prevention.
- discard(key: NonEmptySafeStrTuple | Sequence[str] | str) bool[source]#
Delete an item without raising an exception if it doesn’t exist.
This method fixes the issue where cached dictionaries return multiple success counts for a single key deletion.
- Parameters:
key – Key to delete.
- Returns:
True if the item existed and was deleted; False otherwise.
- discard_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag) ConditionalOperationResult[ValueType][source]#
Discard item only if ETag satisfies a condition; delegate to cached dict.
- etag(key: NonEmptySafeStrTuple | Sequence[str] | str) ETagValue[source]#
Get the ETag for an item.
For mutable dicts, returns the cached S3 native ETag if available, otherwise fetches from S3 and caches it. For append-only dicts, delegates to the cached dict’s etag method.
- Parameters:
key – Non-empty key to query.
- Returns:
The ETag string for the key.
- Raises:
KeyError – If the key does not exist.
- get_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Get item only if ETag satisfies a condition; delegate to cached dict.
- get_subdict(prefix_key: SafeStrTuple | Sequence[str] | str) S3Dict_FileDirCached[ValueType][source]#
Get a subdictionary for the given key prefix.
Returns a new S3Dict_FileDirCached with both the S3 storage and local cache scoped to the given prefix. Modifications to the subdictionary will be visible in the parent and vice versa.
- Parameters:
prefix_key – Prefix key (string or sequence of strings) identifying the subdictionary scope.
- Returns:
A new cached S3 dictionary rooted at the specified prefix.
- property root_prefix: str#
Get the S3 root prefix for this dictionary.
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Set item only if ETag satisfies a condition; delegate to cached dict.
- setdefault_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, default_value: ValueType, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Insert default if absent and condition satisfied; delegate to cached dict.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Get the timestamp of when the item was last modified.
- transform_item(key: NonEmptySafeStrTuple | Sequence[str] | str, *, transformer: TransformingFunction[ValueType], n_retries: int | None = 6) OperationResult[ValueType][source]#
Transform item; delegate to cached dict.
- class persidict.SafeStrTuple(*args, **kwargs)[source]#
Bases:
Sequence[str], Hashable
An immutable sequence of URL/filename-safe strings.
The sequence is flat (no nested structures) and hashable, making it suitable for use as a dictionary key. All strings are validated to contain only characters from SAFE_CHARS_SET and to have length less than SAFE_STRING_MAX_LENGTH.
- property str_chain: tuple[str, ...]#
Alias for strings for backward compatibility.
- Returns:
The underlying tuple of strings.
- strings: tuple[str, ...]#
- class persidict.TransformingFunction(*args, **kwargs)[source]#
Bases:
Protocol[ValueType]
Protocol for transform_item callback functions.
A TransformingFunction receives the current value (or ITEM_NOT_AVAILABLE when the key is absent) and returns a new value, KEEP_CURRENT, or DELETE_CURRENT.
Generic over ValueType so that transform_item on a PersiDict[int] expects a transformer whose input and output are both typed in terms of int.
- class persidict.ValueNotRetrievedFlag(*args, **kwargs)[source]#
Bases:
SingletonMixin
Sentinel indicating the value exists but was not retrieved.
Returned in new_value when retrieve_value=NEVER_RETRIEVE or when retrieve_value=IF_ETAG_CHANGED and the ETag has not changed.
Note
This is a singleton class; constructing it repeatedly returns the same instance.
- class persidict.WriteOnceDict(*, wrapped_dict: PersiDict[ValueType] | None = None, p_consistency_checks: float | None = None)[source]#
Bases:
PersiDict[ValueType]
Dictionary wrapper that preserves the first value written for each key.
Subsequent writes to an existing key are allowed but ignored as they are expected to have exactly the same value. They are randomly checked against the original value to ensure consistency. If a randomly triggered check finds a difference, a ValueError is raised. The probability of performing a check is controlled by p_consistency_checks.
This is useful in concurrent or distributed settings where the same value is assumed to be assigned repeatedly to the same key, and you want to check this assumption (detect divergent values) without paying the full cost of always comparing values.
API limitation: set_item_if is not supported and raises MutationPolicyError. Conditional overwrites contradict write-once semantics. Insert-if-absent is available via setdefault_if on the wrapped dict (and is used internally by __setitem__).
Atomicity note: insert-if-absent semantics in __setitem__ are delegated to the wrapped backend’s setdefault_if. Atomicity is only guaranteed when the wrapped backend provides it (e.g. BasicS3Dict uses S3 conditional headers). The default PersiDict base implementation is not atomic.
- property consistency_checks_attempted: int#
Number of attempted consistency checks.
- Returns:
Attempted checks counter.
- property consistency_checks_failed: int#
Number of failed consistency checks.
- Returns:
Failed checks (attempted - passed).
- property consistency_checks_passed: int#
Number of successful consistency checks.
- Returns:
Passed checks counter.
- get_params() dict[str, Any][source]#
Return parameterization of this instance.
- Returns:
A dictionary with keys ‘wrapped_dict’ and ‘p_consistency_checks’, sorted by keys for deterministic comparison/serialization.
- get_subdict(prefix_key: NonEmptySafeStrTuple | Sequence[str] | str) WriteOnceDict[ValueType][source]#
Return a WriteOnceDict view over a sub-keyspace.
- Parameters:
prefix_key – Prefix identifying the sub-dictionary.
- Returns:
A new WriteOnceDict wrapping the corresponding sub-dictionary of the underlying store, sharing the same p_consistency_checks probability.
- property p_consistency_checks: float#
Probability of checking a new value against the first value stored.
- Returns:
Probability in [0, 1].
- set_item_if(key: NonEmptySafeStrTuple | Sequence[str] | str, *, value: ValueType | Joker, condition: ETagConditionFlag, expected_etag: ETagValue | ItemNotAvailableFlag, retrieve_value: RetrieveValueFlag = IfETagChangedRetrieveFlag({})) ConditionalOperationResult[ValueType][source]#
Not supported for write-once dictionaries.
Conditional overwrites (set_item_if) contradict write-once semantics, which only permit insert-if-absent. Use setdefault_if on the wrapped dict for conditional inserts.
- Raises:
MutationPolicyError – Always raised.
- timestamp(key: NonEmptySafeStrTuple | Sequence[str] | str) float[source]#
Delegate timestamp retrieval to the wrapped dict.
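The write-once behaviour with probabilistic consistency checks can be sketched with an in-memory class. `ToyWriteOnce` and its `rng` parameter are hypothetical, and the sketch ignores persistence and atomicity; it only shows how the first value wins, how repeated writes are sampled for comparison, and how the three counters relate.

```python
import random


class ToyWriteOnce:
    """In-memory sketch: keep the first value per key, spot-check later writes."""

    def __init__(self, p_consistency_checks=0.0, rng=None):
        if not 0.0 <= p_consistency_checks <= 1.0:
            raise ValueError("p_consistency_checks must be in [0, 1]")
        self._data = {}
        self._p = p_consistency_checks
        self._rng = rng if rng is not None else random.Random()
        self.checks_attempted = 0
        self.checks_passed = 0

    @property
    def checks_failed(self):
        return self.checks_attempted - self.checks_passed

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self._data:
            self._data[key] = value  # first write wins
            return
        # Later writes are ignored, but occasionally compared with the original.
        if self._rng.random() < self._p:
            self.checks_attempted += 1
            if self._data[key] != value:
                raise ValueError(f"divergent value written for key {key!r}")
            self.checks_passed += 1


always = ToyWriteOnce(p_consistency_checks=1.0)
always["k"] = "first"
always["k"] = "first"  # checked and accepted

never = ToyWriteOnce(p_consistency_checks=0.0)
never["k"] = "first"
never["k"] = "different"  # silently ignored because no check is triggered
```

A probability between 0 and 1 trades detection speed for the cost of reading and comparing stored values.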
- persidict.get_safe_chars() set[str][source]#
Get the set of allowed characters.
- Returns:
A copy of the set of characters considered safe for building file names and URL components. Includes ASCII letters, digits, and the characters ()_-~.= (parentheses, underscore, hyphen, tilde, period, and equals sign).
- persidict.replace_unsafe_chars(a_str: str, replace_with: str) str[source]#
Replace unsafe characters in a string.
Replaces any character not present in the safe-character set with a replacement substring.
- Parameters:
a_str – Input string that may contain unsafe characters.
replace_with – The substring to use for every unsafe character encountered in a_str.
- Returns:
The transformed string where all unsafe characters are replaced by the provided replacement substring.
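A minimal sketch of the replacement logic follows. The safe set is rebuilt here from the description of get_safe_chars() above, which is an assumption; `SAFE_CHARS` and `replace_unsafe` are hypothetical names, not the package's own.

```python
import string

# Assumption: ASCII letters, digits, and the punctuation listed for get_safe_chars().
SAFE_CHARS = set(string.ascii_letters + string.digits + "()_-~.=")


def replace_unsafe(a_str: str, replace_with: str) -> str:
    """Replace every character outside SAFE_CHARS with the given substring."""
    return "".join(ch if ch in SAFE_CHARS else replace_with for ch in a_str)


cleaned = replace_unsafe("report 2024/Q1.txt", "_")  # "report_2024_Q1.txt"
```

Each unsafe character is replaced independently, so a multi-character replacement such as "--" lengthens the result.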