Wald Advanced Contextual DLP

Detect sensitive text with advanced contextual understanding

API Examples & Instructions

🚀 Quick Start

This API provides two main endpoints for sensitive data detection and classification:

  • /predict - Single-text analysis (sensitivity detection + fine-grained classes)
  • /predict/batch - Multiple-text analysis (optimized for performance)

Key Features: Detects sensitive content and provides detailed classification into categories like Personal Information, Financial Data, Source Code, etc.

1. Text Analysis (/predict)

curl -X POST http://localhost:59995/predict \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"text": "This is a test sentence about a financial report with personal data."}'

🐍 Python Example

import requests

# Single text analysis
url = "http://your-api-url/predict"
headers = {"X-API-Key": "YOUR_API_KEY"}
data = {
    "text": "This is a test sentence about a financial report with personal data."
}

response = requests.post(url, json=data, headers=headers)
response.raise_for_status()
result = response.json()

print(f"Is Sensitive: {result['prediction']['is_sensitive']}")
print(f"Classes: {result['prediction']['classes']}")
print(f"Model Time: {result['timing']['model_time_ms']:.2f}ms")

✅ Response Format

{
  "prediction": {
    "is_sensitive": true,
    "classes": ["Personal Information", "Financial Data"]
  },
  "timing": {
    "model_time_ms": 15.2
  }
}
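Once parsed, the response fields above can drive a client-side decision. As a minimal sketch (the `route_prediction` helper and its block/allow policy are illustrative assumptions, not part of the API):

```python
def route_prediction(result: dict) -> str:
    """Decide what to do with a parsed /predict response.

    Returns 'block' when the model flags the text as sensitive,
    'allow' otherwise. The policy names are illustrative.
    """
    prediction = result["prediction"]
    if prediction["is_sensitive"]:
        # `classes` explains *why* the text was flagged, e.g. for audit logs.
        return "block"
    return "allow"


# Using the example response shown above:
sample = {
    "prediction": {
        "is_sensitive": True,
        "classes": ["Personal Information", "Financial Data"],
    },
    "timing": {"model_time_ms": 15.2},
}
print(route_prediction(sample))  # -> block
```

A real client would typically also log the `classes` list alongside the decision so flagged messages can be audited later.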

2. Batch Processing (/predict/batch)

curl -X POST http://localhost:59995/predict/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"texts": ["This is the first text for batch processing.", "The second text is about a software bug fix.", "A third text discussing personal health data.", "Another code snippet: import os\nprint(os.getcwd())"], "batch_size": 4}'

🐍 Python Example

import requests

# Batch text analysis
url = "http://your-api-url/predict/batch"
headers = {"X-API-Key": "YOUR_API_KEY"}
data = {
    "texts": [
        "This is the first text for batch processing.",
        "The second text is about a software bug fix.",
        "A third text discussing personal health data.",
        "Another code snippet: import os\nprint(os.getcwd())"
    ],
    "batch_size": 4  # Optional: defaults to 32
}

response = requests.post(url, json=data, headers=headers)
response.raise_for_status()
result = response.json()

# Process results
for i, text_result in enumerate(result['results']):
    print(f"Text {i+1}:")
    print(f"  Is Sensitive: {text_result['prediction']['is_sensitive']}")
    print(f"  Classes: {text_result['prediction']['classes']}")
    print(f"  Model Time: {text_result['timing']['model_time_ms']:.2f}ms")
    print()

# Batch performance metrics
batch_timing = result['batch_timing']
print("Batch Performance:")
print(f"  Total Texts: {batch_timing['total_texts']}")
print(f"  Throughput: {batch_timing['throughput_texts_per_sec']:.2f} texts/sec")

✅ Response Format

{
  "results": [
    {
      "prediction": {
        "is_sensitive": true,
        "classes": ["Personal Information", "Financial Data"]
      },
      "timing": {
        "model_time_ms": 2.1
      }
    },
    {
      "prediction": {
        "is_sensitive": false,
        "classes": ["Source Code Data"]
      },
      "timing": {
        "model_time_ms": 1.8
      }
    }
  ],
  "batch_timing": {
    "total_texts": 2,
    "batch_size": 4,
    "total_model_time_ms": 3.9,
    "avg_model_time_ms": 1.95,
    "throughput_texts_per_sec": 512.82
  }
}
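Since `results` is ordered the same as the submitted `texts`, the two can be zipped together to pull out just the flagged inputs. A small sketch (the `sensitive_texts` helper is illustrative, not part of the API):

```python
def sensitive_texts(texts: list, response: dict) -> list:
    """Return only the inputs the batch response flagged as sensitive.

    Assumes `results` preserves the order of the submitted `texts`,
    matching the response format shown above.
    """
    return [
        text
        for text, item in zip(texts, response["results"])
        if item["prediction"]["is_sensitive"]
    ]


# Using the shape of the example response above:
batch_response = {
    "results": [
        {"prediction": {"is_sensitive": True, "classes": ["Personal Information"]}},
        {"prediction": {"is_sensitive": False, "classes": ["Source Code Data"]}},
    ]
}
texts = ["first submitted text", "second submitted text"]
print(sensitive_texts(texts, batch_response))  # -> ['first submitted text']
```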

⚡ Performance Tips

  • Use /predict/batch for multiple texts - it's 10-100x faster than individual calls
  • Optimal batch_size is typically 16-64 depending on text length
  • Large texts are automatically chunked for better performance
  • GPU acceleration is automatically used when available
  • Response includes classes - fine-grained categorization of sensitive data types
  • Batch timing metrics - get throughput and performance statistics
