Caching System

The WispHub API implements an advanced in-memory caching system using Least Recently Used (LRU) eviction with Time-To-Live (TTL) support. This is the primary performance optimization technique that enables sub-5ms response times for frequently accessed data.

Why Caching?

Conversational bots (like WhatsApp bots) generate high-frequency, repetitive queries to the API:
  • Multiple users asking about their service status simultaneously
  • Repeated lookups of the same client data within short time windows
  • Frequent access to internet plan information during verification flows
Without caching, each request would require:
  1. Network roundtrip to WispHub Net (200-500ms latency)
  2. Database query on WispHub’s infrastructure
  3. Response serialization and network return
Total time: 500-1000ms per request. With caching, subsequent requests return from RAM in under 5ms.

async_lru Implementation

The API uses the async_lru library, which provides asynchronous LRU caching compatible with FastAPI’s async architecture.

Installation

requirements.txt
async-lru==2.2.0

Basic Usage Pattern

from async_lru import alru_cache

@alru_cache(maxsize=1, ttl=300)
async def get_clients() -> List[ClientResponse]:
    # Expensive operation (network call to WispHub Net)
    # Only executed when cache is empty or expired
    ...

Decorator Parameters

maxsize

Type: int
Purpose: Maximum number of cached results to store
Examples:
  • maxsize=1: Single cached value (e.g., the entire client list)
  • maxsize=32: Multiple cached values (e.g., individual plan details)
LRU Behavior: When the cache is full, the least recently used item is evicted to make room for new entries.

ttl

Type: int (seconds)
Purpose: Time-To-Live - how long cached data remains valid
Examples:
  • ttl=300: Cache expires after 5 minutes
  • ttl=900: Cache expires after 15 minutes
Behavior: After the TTL expires, the next request triggers a fresh fetch from WispHub Net.
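To make the two parameters concrete, here is a minimal, pure-stdlib sketch of the same semantics — keyed LRU eviction plus TTL expiry. This is an illustration of the behavior, not async_lru's actual implementation; all names here are hypothetical.

```python
import asyncio
import time
from collections import OrderedDict
from functools import wraps

def async_ttl_lru_cache(maxsize: int, ttl: float):
    """Illustrative stand-in for alru_cache(maxsize=..., ttl=...)."""
    def decorator(func):
        cache: OrderedDict = OrderedDict()  # key -> (stored_at, value)

        @wraps(func)
        async def wrapper(*args):
            now = time.monotonic()
            if args in cache and now - cache[args][0] < ttl:
                cache.move_to_end(args)      # hit: mark as most recently used
                return cache[args][1]
            value = await func(*args)        # miss or expired: recompute
            cache[args] = (now, value)
            cache.move_to_end(args)
            while len(cache) > maxsize:      # evict least recently used entries
                cache.popitem(last=False)
            return value
        return wrapper
    return decorator

calls = 0

@async_ttl_lru_cache(maxsize=2, ttl=0.05)
async def fetch(x):
    global calls
    calls += 1
    return x * 2

async def demo():
    await fetch(1)             # miss: underlying call runs
    await fetch(1)             # hit: served from cache
    hits_done = calls
    await asyncio.sleep(0.06)  # let the TTL lapse
    await fetch(1)             # expired: underlying call runs again
    return hits_done, calls

first, total = asyncio.run(demo())
print(first, total)  # 1 2
```

The second await returns without touching the wrapped function; only after the TTL window passes is the expensive call repeated.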

Client List Caching

The most critical cache is the global client list:
app/services/clients_service.py
from async_lru import alru_cache
from typing import List, Optional
import httpx

# settings, HEADERS, ClientResponse and parse_client are defined
# elsewhere in this service module.

@alru_cache(maxsize=1, ttl=300)
async def get_clients() -> List[ClientResponse]:
    """
    Loads ALL clients from WispHub following pagination.
    Results are cached for 5 minutes to avoid repeated load.
    """
    all_results: List[ClientResponse] = []
    next_url: Optional[str] = settings.CLIENTS_URL

    async with httpx.AsyncClient(timeout=30, follow_redirects=True) as client:
        while next_url:
            response = await client.get(next_url, headers=HEADERS)
            if response.status_code != 200:
                break
            try:
                data = response.json()
            except Exception:
                break
            results = data.get("results")
            if not isinstance(results, list):
                break
            all_results.extend(parse_client(c) for c in results)
            next_url = data.get("next")  # None when no more pages

    return all_results

Why maxsize=1?

The entire client list is treated as a single cached entity because:
  1. Search operations need access to all clients for filtering
  2. Memory efficiency: Storing once vs. storing individual clients
  3. Consistency: All clients are refreshed together, preventing stale partial data
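The points above can be sketched with a stub in place of the real paginated fetch (the data and client shape here are illustrative; fetch_clients_by_query is the search helper named in the flow diagram below):

```python
import asyncio
from typing import List, Optional

FETCHES = 0  # counts simulated trips to WispHub Net

_cached: Optional[List[dict]] = None  # stand-in for the single alru_cache entry

async def get_clients() -> List[dict]:
    """The whole client list is one cached value (maxsize=1)."""
    global _cached, FETCHES
    if _cached is None:
        FETCHES += 1
        # placeholder data standing in for the paginated WispHub fetch
        _cached = [{"name": "John Doe"}, {"name": "Jane Roe"}, {"name": "Bob"}]
    return _cached

async def fetch_clients_by_query(q: str) -> List[dict]:
    # Every search filters the one cached list in memory, so all
    # results come from the same snapshot and stay consistent.
    clients = await get_clients()
    return [c for c in clients if q.lower() in c["name"].lower()]

async def demo():
    a = await fetch_clients_by_query("john")
    b = await fetch_clients_by_query("roe")
    return a, b, FETCHES

a, b, fetches = asyncio.run(demo())
print(len(a), len(b), fetches)  # 1 1 1
```

Two different searches triggered only one upstream fetch: filtering happens in memory against the shared cached list.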

Why ttl=300 (5 minutes)?

Balancing freshness vs. performance:
  • Too short (under 1 min): Excessive load on WispHub Net
  • Too long (over 10 min): Risk of showing stale client data
  • 5 minutes: Sweet spot for conversational bot patterns
During peak hours with 100+ concurrent users, the 5-minute TTL reduces WispHub Net requests from ~6,000/hour to ~12/hour (99.8% reduction).
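The arithmetic behind those figures, assuming roughly one status query per user per minute (the per-user rate is an assumption chosen to match the quoted totals):

```python
users = 100                  # concurrent users at peak
requests_per_user_hour = 60  # ~1 status query per user per minute (assumed)
ttl_seconds = 300

without_cache = users * requests_per_user_hour  # every query hits WispHub Net
with_cache = 3600 // ttl_seconds                # one refresh per TTL window
reduction = 1 - with_cache / without_cache

print(without_cache, with_cache, f"{reduction:.1%}")  # 6000 12 99.8%
```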

Internet Plans Caching

Internet plans change infrequently and are heavily referenced:
app/services/internet_plans_service.py
from async_lru import alru_cache
from typing import List, Optional
import httpx

@alru_cache(maxsize=32, ttl=900)
async def list_internet_plans() -> Optional[List[InternetPlanListItem]]:
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.get(settings.PLANS_URL, headers=HEADERS)

    if response.status_code != 200:
        return None

    try:
        data = response.json()
    except Exception:
        return None

    results = data.get("results")
    if not isinstance(results, list):
        return None

    return [
        InternetPlanListItem(
            plan_id=plan.get("id"),
            name=plan.get("nombre"),
            type=plan.get("tipo"),
        )
        for plan in results
    ]

Why maxsize=32?

Most WispHub deployments have 5-20 active plans. Note that because list_internet_plans() takes no arguments, the full plan list occupies a single cache entry; maxsize=32 becomes relevant for argument-keyed lookups (e.g., a hypothetical get_plan(plan_id) helper using the same settings), where it provides:
  • Room for all current plans
  • Headroom for seasonal/promotional plans
  • Historical plan lookups without eviction

Why ttl=900 (15 minutes)?

Plan pricing and specifications change rarely:
  • New plans: Added monthly or quarterly
  • Price changes: Typically announced in advance
  • 15-minute staleness is acceptable for this data type

Cache Flow Visualization

┌─────────────────────────────────────────────┐
│  Request: GET /api/v1/clients/search?q=John │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
        ┌─────────────────┐
        │  fetch_clients  │
        │   _by_query()   │
        └────────┬────────┘
                 │
                 ▼
        ┌─────────────────┐
        │  get_clients()  │
        │   @alru_cache   │
        └────────┬────────┘
                 │
        ┌────────▼─────────┐
        │  Cache exists?   │
        │  TTL valid?      │
        └────────┬─────────┘
          YES ┌──┴──┐ NO
              │     │
      ┌───────▼─┐ ┌─▼────────────┐
      │ Return  │ │ Fetch from   │
      │ from    │ │ WispHub Net  │
      │ cache   │ │ (paginated)  │
      │ ~5ms    │ │ ~800ms       │
      └───┬─────┘ └─┬────────────┘
          │         │
          │         ▼
          │    ┌─────────────┐
          │    │ Store in    │
          │    │ cache       │
          │    └──────┬──────┘
          │           │
          └───────┬───┘
                  │
                  ▼
          ┌───────────────┐
          │ Filter by     │
          │ query string  │
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │ Return to     │
          │ client        │
          └───────────────┘

Cache Invalidation

The current implementation uses TTL-based invalidation only. There is no manual cache invalidation mechanism.

When Cache is Invalidated

  1. TTL Expiration: Automatic after configured seconds
  2. Server Restart: Cache is in-memory and is lost on restart
  3. LRU Eviction: When maxsize is exceeded (least recently used items)

Implications

If a client’s data is updated in WispHub Net:
  • Worst case delay: Up to 5 minutes (client list TTL)
  • Average case delay: ~2.5 minutes
  • Best case: Immediate (if cache already expired)
For critical operations requiring guaranteed fresh data, consider implementing a cache bypass flag or manual invalidation endpoint.
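One possible shape for such a bypass flag, sketched with a hand-rolled single-entry cache rather than async_lru (all names here are hypothetical; with async_lru itself, the decorated function's cache_clear() method can serve the same purpose behind an invalidation endpoint):

```python
import asyncio
import time
from typing import List, Optional, Tuple

TTL = 300
_cache: Optional[Tuple[float, List[str]]] = None  # (fetched_at, data)

async def _fetch_clients_remote() -> List[str]:
    # placeholder for the real paginated WispHub Net call
    return ["alice", "bob"]

async def get_clients(force_refresh: bool = False) -> List[str]:
    """Return the cached client list, refetching when asked or when the TTL lapses."""
    global _cache
    now = time.monotonic()
    if force_refresh or _cache is None or now - _cache[0] > TTL:
        _cache = (now, await _fetch_clients_remote())
    return _cache[1]

clients = asyncio.run(get_clients())
fresh = asyncio.run(get_clients(force_refresh=True))  # skips the cached entry
print(clients == fresh)  # True
```

Critical flows would pass force_refresh=True, accepting the ~800ms upstream round-trip in exchange for guaranteed freshness.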

Memory Considerations

Estimated Memory Usage

1. Client List Cache

  • Typical deployment: 500-2000 clients
  • Per client: ~500 bytes (serialized)
  • Total: ~1MB for 2000 clients
2. Internet Plans Cache

  • Typical deployment: 10-20 plans
  • Per plan: ~200 bytes
  • Total: ~4KB for 20 plans
3. Overall Memory

  • Cache data: ~1-2MB
  • Python runtime: ~50MB
  • FastAPI framework: ~30MB
  • Total process: ~100-150MB
This is minimal compared to the performance gains achieved.

Performance Metrics

Response Time Comparison

Operation            Without Cache   With Cache (Hit)   Improvement
List all clients     800ms           4ms                200x faster
Search clients       850ms           6ms                141x faster
List plans           300ms           3ms                100x faster
Get client by ID     250ms           5ms                50x faster

Cache Hit Ratio

During typical bot operations:
  • Client list queries: 98% hit ratio
  • Plan lookups: 95% hit ratio
  • Overall: 96% of requests served from cache

Monitoring Cache Performance

The async_lru library provides a cache_info() method for monitoring, but it is not exposed in the current API implementation.
To add cache monitoring, you could implement:
@app.get("/cache/stats")
async def cache_stats():
    # cache_info() returns a namedtuple; _asdict() keeps the field
    # names in the JSON response instead of serializing as a list
    return {
        "clients": get_clients.cache_info()._asdict(),
        "plans": list_internet_plans.cache_info()._asdict(),
    }
This returns:
  • hits: Number of cache hits
  • misses: Number of cache misses
  • maxsize: Configured maximum size
  • currsize: Current cache size

Best Practices

Choose TTL Wisely

Balance data freshness requirements against load reduction. Monitor your data change frequency.

Size Appropriately

Set maxsize based on actual data volume plus headroom. Monitor memory usage.

Handle Cache Misses

Always have fallback logic for when WispHub Net is unavailable during cache refresh.

Document TTL

Make cache TTL values configurable via environment variables for easy tuning.

Advanced: Future Enhancements

Potential improvements to the caching system:
  1. Redis-backed caching: Share cache across multiple API instances
  2. Webhook invalidation: WispHub Net pushes updates to invalidate specific cache entries
  3. Conditional requests: Use ETags to validate cache freshness with WispHub Net
  4. Tiered caching: Different TTLs for different client states (active vs. suspended)
  5. Cache warming: Proactively refresh cache before TTL expiration during off-peak hours