Working with Sparse Arrays

For large arrays in which most elements are zero, SparseArrayWrappedDict can be more memory-efficient than the standard NumpyNDArrayWrappedDict.

Instantiation

As with the dense wrapper, you instantiate a sparse array wrapper by passing one list of labels per dimension:

from npdict import SparseArrayWrappedDict

document1 = ['president', 'computer', 'tree', 'car', 'house', 'book']
document2 = ['chairman', 'abacus', 'trees', 'vehicle', 'building', 'paper']

# Create a sparse dictionary - efficient for large, sparse matrices
sparse_similarity_dict = SparseArrayWrappedDict([document1, document2])

Value Assignments

Assign values just as you would with a regular dictionary:

# Only assign values for the few non-zero elements
sparse_similarity_dict['president', 'chairman'] = 0.9
sparse_similarity_dict['computer', 'abacus'] = 0.7
sparse_similarity_dict['tree', 'trees'] = 0.95

The sparse implementation stores only the non-zero values, so memory use grows with the number of assigned entries rather than with the full size of the array.
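To illustrate why storing only non-zero values saves memory, here is a minimal sketch of a dictionary-of-keys store in plain Python. `TinyDOK` and its methods are hypothetical names made up for this illustration. They are not part of npdict, whose actual implementation may differ.

```python
class TinyDOK:
    """Hypothetical dictionary-of-keys store (illustration only, not npdict)."""

    def __init__(self, label_lists):
        # One label -> position lookup table per dimension.
        self._index = [
            {label: i for i, label in enumerate(labels)}
            for labels in label_lists
        ]
        # Only explicitly assigned non-zero entries are kept here.
        self._data = {}

    def _key(self, labels):
        # Translate a tuple of labels into a tuple of integer positions.
        return tuple(table[label] for table, label in zip(self._index, labels))

    def __setitem__(self, labels, value):
        key = self._key(labels)
        if value == 0:
            self._data.pop(key, None)  # a stored zero would only waste space
        else:
            self._data[key] = value

    def __getitem__(self, labels):
        # Entries that were never assigned read back as zero.
        return self._data.get(self._key(labels), 0.0)

    @property
    def nnz(self):
        # Number of stored (non-zero) entries.
        return len(self._data)


doc1 = ['president', 'computer', 'tree', 'car', 'house', 'book']
doc2 = ['chairman', 'abacus', 'trees', 'vehicle', 'building', 'paper']

toy = TinyDOK([doc1, doc2])
toy['president', 'chairman'] = 0.9
toy['computer', 'abacus'] = 0.7
toy['tree', 'trees'] = 0.95

stored = toy.nnz               # entries actually held in memory
missing = toy['car', 'paper']  # never assigned
```

Here only three of the 36 possible label pairs occupy storage, and every other pair is implicitly zero.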

Converting Between Formats

You can convert between dense and sparse formats:

# Convert to NumPy array (dense format)
dense_array = sparse_similarity_dict.to_numpy()

# Convert to COO format (another sparse format)
coo_array = sparse_similarity_dict.to_coo()

# Get the underlying DOK (Dictionary of Keys) sparse array
dok_array = sparse_similarity_dict.to_dok()
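The three formats hold the same values in different layouts. The following plain-Python sketch shows the idea for the 2-D case. The helpers `dok_to_coo` and `coo_to_dense` are hypothetical names used only for illustration and say nothing about how npdict or its underlying sparse library implement the conversions.

```python
# DOK layout: a mapping from index tuples to values.
dok = {(0, 0): 0.9, (1, 1): 0.7, (2, 2): 0.95}
shape = (6, 6)


def dok_to_coo(dok):
    """Return (coords, data): one index list per dimension plus a value list."""
    keys = sorted(dok)
    rows = [r for r, _ in keys]
    cols = [c for _, c in keys]
    data = [dok[k] for k in keys]
    return [rows, cols], data


def coo_to_dense(coords, data, shape):
    """Expand COO triples into a full nested list, filling gaps with zeros."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(coords[0], coords[1], data):
        dense[r][c] = v
    return dense


coords, data = dok_to_coo(dok)
dense = coo_to_dense(coords, data, shape)
```

DOK suits incremental assignment because each entry is an independent key, COO packs the same entries into parallel arrays that are convenient for bulk arithmetic, and the dense form materializes every zero.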

Generating New Dictionaries

You can generate new dictionaries from existing ones, with options to convert between sparse and dense formats:

# Generate a new sparse dictionary
new_sparse_dict = sparse_similarity_dict.generate_dict(
    sparse_similarity_dict.to_coo() * 0.75
)

# Generate a dense dictionary from a sparse one
dense_dict = sparse_similarity_dict.generate_dict(
    sparse_similarity_dict.to_numpy(),
    dense=True  # This parameter converts to a dense NumpyNDArrayWrappedDict
)

When to Use Sparse Arrays

Use SparseArrayWrappedDict when:

  • Your data is mostly zeros (sparse)

  • You’re working with large dimensions where memory usage is a concern

  • You need to perform operations that are optimized for sparse matrices

Use NumpyNDArrayWrappedDict when:

  • Your data has few zeros (dense)

  • You need faster element-wise access

  • You’re working with smaller dimensions where memory usage is less of a concern
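A rough back-of-the-envelope estimate can help with this choice. The sketch below assumes 8-byte values and a COO-style layout that stores one 8-byte integer index per dimension for every non-zero entry. These sizes, and the function names, are assumptions for illustration; actual overhead depends on the library and data types in use.

```python
from math import prod


def dense_bytes(shape, itemsize=8):
    """Bytes needed to store every element, zeros included."""
    return prod(shape) * itemsize


def coo_bytes(nnz, ndim, itemsize=8, index_size=8):
    """Bytes for nnz values plus one index per dimension for each value."""
    return nnz * (itemsize + ndim * index_size)


def break_even_density(ndim, itemsize=8, index_size=8):
    """Fraction of non-zero entries below which the COO estimate is smaller."""
    return itemsize / (itemsize + ndim * index_size)


shape = (200, 300, 50)   # 3,000,000 elements in total
nnz = 30_000             # 1% of the elements are non-zero

dense_estimate = dense_bytes(shape)
sparse_estimate = coo_bytes(nnz, len(shape))
threshold = break_even_density(len(shape))

print(f"dense:  {dense_estimate:,} bytes")
print(f"sparse: {sparse_estimate:,} bytes")
print(f"break-even density: {threshold:.2f}")
```

Under these assumptions the sparse layout pays an index cost for each stored value, so it only wins when the fraction of non-zero entries stays below the break-even density.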

Memory Usage Comparison

For a simple comparison, consider a 1000x1000 matrix with only 1% non-zero elements:

import numpy as np
from npdict import NumpyNDArrayWrappedDict, SparseArrayWrappedDict
import sys

# Create dimension labels
dim1 = [f'item_{i}' for i in range(1000)]
dim2 = [f'category_{i}' for i in range(1000)]

# Create dense dictionary
dense_dict = NumpyNDArrayWrappedDict([dim1, dim2])

# Create sparse dictionary
sparse_dict = SparseArrayWrappedDict([dim1, dim2])

# Fill with 1% non-zero elements (10,000 elements)
for i in range(100):
    for j in range(100):
        dense_dict[f'item_{i}', f'category_{j}'] = 1.0
        sparse_dict[f'item_{i}', f'category_{j}'] = 1.0

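# Compare memory usage via nbytes, which counts the stored data buffers.
# sys.getsizeof is shallow for most objects and would under-report the sparse array.
# Assumption: to_coo() returns a COO array that exposes nbytes (values plus indices).
dense_size = dense_dict.to_numpy().nbytes
sparse_size = sparse_dict.to_coo().nbytes

print(f"Dense array size: {dense_size / 1024 / 1024:.2f} MB")
print(f"Sparse array size: {sparse_size / 1024 / 1024:.2f} MB")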

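In this scenario the dense array holds all 1,000,000 entries regardless of content (about 7.6 MB, assuming 8-byte floats), while the sparse array stores only the 10,000 assigned entries and their indices, so it will typically use significantly less memory.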
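One caveat when measuring memory in Python: `sys.getsizeof` reports only the shallow size of an object and does not follow references to the data it holds. The sketch below demonstrates this with a hypothetical `Holder` class, which stands in for any wrapper that keeps its payload in an internal container.

```python
import sys


class Holder:
    """Hypothetical wrapper that keeps its payload in an internal dict."""

    def __init__(self, n):
        self.entries = {(i, i): 1.0 for i in range(n)}


small = Holder(10)
large = Holder(10_000)

# Shallow size of the wrapper objects themselves.
shallow_small = sys.getsizeof(small)
shallow_large = sys.getsizeof(large)

# Size of the internal dicts (hash table only, still excluding keys and values).
payload_small = sys.getsizeof(small.entries)
payload_large = sys.getsizeof(large.entries)

print(shallow_small == shallow_large)
print(payload_large > payload_small)
```

Both wrappers report the same shallow size even though one holds a thousand times more entries, so a size attribute that counts the underlying buffers, where the array type provides one, is a more reliable basis for comparison.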