Caching System

The WispHub API implements an advanced in-memory caching system using Least Recently Used (LRU) eviction with Time-To-Live (TTL) support. This is the primary performance optimization technique that enables sub-5ms response times for frequently accessed data.

Why Caching?

Conversational bots (like WhatsApp bots) generate high-frequency, repetitive queries to the API:
  • Multiple users asking about their service status simultaneously
  • Repeated lookups of the same client data within short time windows
  • Frequent access to internet plan information during verification flows
Without caching, each request would require:
  1. Network roundtrip to WispHub Net (200-500ms latency)
  2. Database query on WispHub’s infrastructure
  3. Response serialization and network return
Total time: 500-1000ms per request. With caching, subsequent requests return from RAM in under 5ms.

async_lru Implementation

The API uses the async_lru library, which provides asynchronous LRU caching compatible with FastAPI’s async architecture.

Installation

requirements.txt
async-lru==2.2.0

Basic Usage Pattern

from async_lru import alru_cache

@alru_cache(maxsize=1, ttl=300)
async def get_clients() -> List[ClientResponse]:
    # Expensive operation (network call to WispHub Net)
    # Only executed when cache is empty or expired
    ...

Decorator Parameters

maxsize

Type: int
Purpose: Maximum number of cached results to store
Examples:
  • maxsize=1: Single cached value (e.g., the entire client list)
  • maxsize=32: Multiple cached values (e.g., individual plan details)
LRU Behavior: When the cache is full, the least recently used item is evicted to make room for new entries.

ttl

Type: int (seconds)
Purpose: Time-To-Live - how long cached data remains valid
Examples:
  • ttl=300: Cache expires after 5 minutes
  • ttl=900: Cache expires after 15 minutes
Behavior: After the TTL expires, the next request triggers a fresh fetch from WispHub Net.
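To make the two parameters concrete, here is a minimal, pure-stdlib sketch of the same semantics — keyed LRU eviction plus TTL expiry. This is an illustration of the behavior, not async_lru's actual implementation; all names here are hypothetical.

```python
import asyncio
import time
from collections import OrderedDict
from functools import wraps

def async_ttl_lru_cache(maxsize: int, ttl: float):
    """Illustrative stand-in for alru_cache(maxsize=..., ttl=...)."""
    def decorator(func):
        cache: OrderedDict = OrderedDict()  # key -> (stored_at, value)

        @wraps(func)
        async def wrapper(*args):
            now = time.monotonic()
            if args in cache and now - cache[args][0] < ttl:
                cache.move_to_end(args)      # hit: mark as most recently used
                return cache[args][1]
            value = await func(*args)        # miss or expired: recompute
            cache[args] = (now, value)
            cache.move_to_end(args)
            while len(cache) > maxsize:      # evict least recently used entries
                cache.popitem(last=False)
            return value
        return wrapper
    return decorator

calls = 0

@async_ttl_lru_cache(maxsize=2, ttl=0.05)
async def fetch(x):
    global calls
    calls += 1
    return x * 2

async def demo():
    await fetch(1)             # miss: underlying call runs
    await fetch(1)             # hit: served from cache
    hits_done = calls
    await asyncio.sleep(0.06)  # let the TTL lapse
    await fetch(1)             # expired: underlying call runs again
    return hits_done, calls

first, total = asyncio.run(demo())
print(first, total)  # 1 2
```

The second await returns without touching the wrapped function; only after the TTL window passes is the expensive call repeated.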

Client List Caching

The most critical cache is the global client list:
app/services/clients_service.py
from async_lru import alru_cache
from typing import List, Optional
import httpx

# settings, HEADERS, ClientResponse and parse_client are defined
# elsewhere in this service module.

@alru_cache(maxsize=1, ttl=300)
async def get_clients() -> List[ClientResponse]:
    """
    Loads ALL clients from WispHub following pagination.
    Results are cached for 5 minutes to avoid repeated load.
    """
    all_results: List[ClientResponse] = []
    next_url: Optional[str] = settings.CLIENTS_URL

    async with httpx.AsyncClient(timeout=30, follow_redirects=True) as client:
        while next_url:
            response = await client.get(next_url, headers=HEADERS)
            if response.status_code != 200:
                break
            try:
                data = response.json()
            except Exception:
                break
            results = data.get("results")
            if not isinstance(results, list):
                break
            all_results.extend(parse_client(c) for c in results)
            next_url = data.get("next")  # None when no more pages

    return all_results

Why maxsize=1?

The entire client list is treated as a single cached entity because:
  1. Search operations need access to all clients for filtering
  2. Memory efficiency: Storing once vs. storing individual clients
  3. Consistency: All clients are refreshed together, preventing stale partial data
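The points above can be sketched with a stub in place of the real paginated fetch (the data and client shape here are illustrative; fetch_clients_by_query is the search helper named in the flow diagram below):

```python
import asyncio
from typing import List, Optional

FETCHES = 0  # counts simulated trips to WispHub Net

_cached: Optional[List[dict]] = None  # stand-in for the single alru_cache entry

async def get_clients() -> List[dict]:
    """The whole client list is one cached value (maxsize=1)."""
    global _cached, FETCHES
    if _cached is None:
        FETCHES += 1
        # placeholder data standing in for the paginated WispHub fetch
        _cached = [{"name": "John Doe"}, {"name": "Jane Roe"}, {"name": "Bob"}]
    return _cached

async def fetch_clients_by_query(q: str) -> List[dict]:
    # Every search filters the one cached list in memory, so all
    # results come from the same snapshot and stay consistent.
    clients = await get_clients()
    return [c for c in clients if q.lower() in c["name"].lower()]

async def demo():
    a = await fetch_clients_by_query("john")
    b = await fetch_clients_by_query("roe")
    return a, b, FETCHES

a, b, fetches = asyncio.run(demo())
print(len(a), len(b), fetches)  # 1 1 1
```

Two different searches triggered only one upstream fetch: filtering happens in memory against the shared cached list.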

Why ttl=300 (5 minutes)?

Balancing freshness vs. performance:
  • Too short (under 1 min): Excessive load on WispHub Net
  • Too long (over 10 min): Risk of showing stale client data
  • 5 minutes: Sweet spot for conversational bot patterns
During peak hours with 100+ concurrent users, the 5-minute TTL reduces WispHub Net requests from ~6,000/hour to ~12/hour (99.8% reduction).
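The arithmetic behind those figures, assuming roughly one status query per user per minute (the per-user rate is an assumption chosen to match the quoted totals):

```python
users = 100                  # concurrent users at peak
requests_per_user_hour = 60  # ~1 status query per user per minute (assumed)
ttl_seconds = 300

without_cache = users * requests_per_user_hour  # every query hits WispHub Net
with_cache = 3600 // ttl_seconds                # one refresh per TTL window
reduction = 1 - with_cache / without_cache

print(without_cache, with_cache, f"{reduction:.1%}")  # 6000 12 99.8%
```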

Internet Plans Caching

Internet plans change infrequently and are heavily referenced:
app/services/internet_plans_service.py
from async_lru import alru_cache
from typing import List, Optional
import httpx

@alru_cache(maxsize=32, ttl=900)
async def list_internet_plans() -> Optional[List[InternetPlanListItem]]:
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.get(settings.PLANS_URL, headers=HEADERS)

    if response.status_code != 200:
        return None

    try:
        data = response.json()
    except Exception:
        return None

    results = data.get("results")
    if not isinstance(results, list):
        return None

    return [
        InternetPlanListItem(
            plan_id=plan.get("id"),
            name=plan.get("nombre"),
            type=plan.get("tipo"),
        )
        for plan in results
    ]

Why maxsize=32?

Most WispHub deployments have 5-20 active plans. Note that because list_internet_plans() takes no arguments, the full plan list occupies a single cache entry; maxsize=32 becomes relevant for argument-keyed lookups (e.g., a hypothetical get_plan(plan_id) helper using the same settings), where it provides:
  • Room for all current plans
  • Headroom for seasonal/promotional plans
  • Historical plan lookups without eviction

Why ttl=900 (15 minutes)?

Plan pricing and specifications change rarely:
  • New plans: Added monthly or quarterly
  • Price changes: Typically announced in advance
  • 15-minute staleness is acceptable for this data type

Cache Flow Visualization

┌─────────────────────────────────────────────┐
│  Request: GET /api/v1/clients/search?q=John │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
        ┌─────────────────┐
        │  fetch_clients  │
        │   _by_query()   │
        └────────┬────────┘
                 │
                 ▼
        ┌─────────────────┐
        │  get_clients()  │
        │   @alru_cache   │
        └────────┬────────┘
                 │
        ┌────────▼─────────┐
        │  Cache exists?   │
        │  TTL valid?      │
        └────────┬─────────┘
          YES ┌──┴──┐ NO
              │     │
      ┌───────▼─┐ ┌─▼────────────┐
      │ Return  │ │ Fetch from   │
      │ from    │ │ WispHub Net  │
      │ cache   │ │ (paginated)  │
      │ ~5ms    │ │ ~800ms       │
      └───┬─────┘ └─┬────────────┘
          │         │
          │         ▼
          │    ┌─────────────┐
          │    │ Store in    │
          │    │ cache       │
          │    └──────┬──────┘
          │           │
          └───────┬───┘
                  │
                  ▼
          ┌───────────────┐
          │ Filter by     │
          │ query string  │
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │ Return to     │
          │ client        │
          └───────────────┘

Cache Invalidation

The current implementation uses TTL-based invalidation only. There is no manual cache invalidation mechanism.

When Cache is Invalidated

  1. TTL Expiration: Automatic after configured seconds
  2. Server Restart: Cache is in-memory and is lost on restart
  3. LRU Eviction: When maxsize is exceeded (least recently used items)

Implications

If a client’s data is updated in WispHub Net:
  • Worst case delay: Up to 5 minutes (client list TTL)
  • Average case delay: ~2.5 minutes
  • Best case: Immediate (if cache already expired)
For critical operations requiring guaranteed fresh data, consider implementing a cache bypass flag or manual invalidation endpoint.
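One possible shape for such a bypass flag, sketched with a hand-rolled single-entry cache rather than async_lru (all names here are hypothetical; with async_lru itself, the decorated function's cache_clear() method can serve the same purpose behind an invalidation endpoint):

```python
import asyncio
import time
from typing import List, Optional, Tuple

TTL = 300
_cache: Optional[Tuple[float, List[str]]] = None  # (fetched_at, data)

async def _fetch_clients_remote() -> List[str]:
    # placeholder for the real paginated WispHub Net call
    return ["alice", "bob"]

async def get_clients(force_refresh: bool = False) -> List[str]:
    """Return the cached client list, refetching when asked or when the TTL lapses."""
    global _cache
    now = time.monotonic()
    if force_refresh or _cache is None or now - _cache[0] > TTL:
        _cache = (now, await _fetch_clients_remote())
    return _cache[1]

clients = asyncio.run(get_clients())
fresh = asyncio.run(get_clients(force_refresh=True))  # skips the cached entry
print(clients == fresh)  # True
```

Critical flows would pass force_refresh=True, accepting the ~800ms upstream round-trip in exchange for guaranteed freshness.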

Memory Considerations

Estimated Memory Usage

1. Client List Cache

  • Typical deployment: 500-2000 clients
  • Per client: ~500 bytes (serialized)
  • Total: ~1MB for 2000 clients
2. Internet Plans Cache

  • Typical deployment: 10-20 plans
  • Per plan: ~200 bytes
  • Total: ~4KB for 20 plans
3. Overall Memory

  • Cache data: ~1-2MB
  • Python runtime: ~50MB
  • FastAPI framework: ~30MB
  • Total process: ~100-150MB
This is minimal compared to the performance gains achieved.

Performance Metrics

Response Time Comparison

Operation            Without Cache   With Cache (Hit)   Improvement
List all clients     800ms           4ms                200x faster
Search clients       850ms           6ms                141x faster
List plans           300ms           3ms                100x faster
Get client by ID     250ms           5ms                50x faster

Cache Hit Ratio

During typical bot operations:
  • Client list queries: 98% hit ratio
  • Plan lookups: 95% hit ratio
  • Overall: 96% of requests served from cache

Monitoring Cache Performance

The async_lru library provides a cache_info() method for monitoring, but it is not exposed in the current API implementation.
To add cache monitoring, you could implement:
@app.get("/cache/stats")
async def cache_stats():
    # cache_info() returns a namedtuple; _asdict() keeps the field
    # names in the JSON response instead of serializing as a list
    return {
        "clients": get_clients.cache_info()._asdict(),
        "plans": list_internet_plans.cache_info()._asdict(),
    }
This returns:
  • hits: Number of cache hits
  • misses: Number of cache misses
  • maxsize: Configured maximum size
  • currsize: Current cache size

Best Practices

Choose TTL Wisely

Balance data freshness requirements against load reduction. Monitor your data change frequency.

Size Appropriately

Set maxsize based on actual data volume plus headroom. Monitor memory usage.

Handle Cache Misses

Always have fallback logic for when WispHub Net is unavailable during cache refresh.

Document TTL

Make cache TTL values configurable via environment variables for easy tuning.

Advanced: Future Enhancements

Potential improvements to the caching system:
  1. Redis-backed caching: Share cache across multiple API instances
  2. Webhook invalidation: WispHub Net pushes updates to invalidate specific cache entries
  3. Conditional requests: Use ETags to validate cache freshness with WispHub Net
  4. Tiered caching: Different TTLs for different client states (active vs. suspended)
  5. Cache warming: Proactively refresh cache before TTL expiration during off-peak hours