Wald AdvancedDLP API

Linux VM (Nvidia L4 GPU) Performance Benchmarks

Machine	Prompt Length	QPS	Throughput (chars/sec)	Latency Mean (ms)	Latency P95 (ms)	Cost ($/B tokens)	Cost ($/B chars)

Is_Sensitive Model Evaluation Metrics

Precision: 98.12%
Recall: 93.73%
F1 Score: 95.87%
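
The F1 score is the harmonic mean of precision and recall, so the reported value can be checked from the other two figures. A minimal sketch (the function name f1_score is illustrative and not part of the API):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported Is_Sensitive metrics, converted from percentages to fractions.
precision = 0.9812
recall = 0.9373
print(f"F1: {f1_score(precision, recall):.2%}")  # prints "F1: 95.87%"
```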

Performance Metrics on Different End User Machine Types

Machine	Prompt Length	Throughput (chars/sec)	Prompts/sec	Latency Mean (ms)	Latency P95 (ms)	Peak Memory (MB)

API Examples & Instructions

🚀 Quick Start

This API exposes three endpoints for sensitive data detection and classification:

/predict - Text analysis (sensitive detection + fine-grained classes)
/predict/dummy - No inference; use to measure roundtrip network latency only
/predict/batch - Text analysis for multiple texts in one request

Key Features: Detects sensitive content and provides detailed classification into categories like Personal Information, Financial Data, Source Code, etc.

1. Text Analysis (/predict)

Generated curl command:

curl -X POST http://localhost:59995/predict \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"text": "This is a test sentence about a financial report with personal data."}'

🐍 Python Example

import requests

# Single text analysis
url = "http://your-api-url/predict"
headers = {"X-API-Key": "YOUR_API_KEY"}  # same header as the curl example
data = {
    "text": "This is a test sentence about a financial report with personal data."
}

response = requests.post(url, json=data, headers=headers, timeout=10)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()

print(f"Is Sensitive: {result['prediction']['is_sensitive']}")
print(f"Classes: {result['prediction']['classes']}")
print(f"Model Time: {result['timing']['model_time_ms']:.2f}ms")
print(f"Total Server Time: {result['timing']['total_server_time_ms']:.2f}ms")

✅ Response Format

{
  "prediction": {
    "is_sensitive": true,
    "classes": ["Personal Information", "Financial Data"]
  },
  "timing": {
    "model_time_ms": 15.2,
    "total_server_time_ms": 18.5
  }
}

model_time_ms – Time spent in model inference only (tokenization + forward pass), in milliseconds.
total_server_time_ms – Total time the server spent handling the request (from receipt to response), in milliseconds. Includes model_time_ms plus parsing, logging, and other overhead.
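
Because total_server_time_ms includes model_time_ms, the difference between the two approximates the non-inference overhead (parsing, logging, and so on). A small sketch using the sample response above; the helper name server_overhead_ms is illustrative only:

```python
def server_overhead_ms(response: dict) -> float:
    """Server time not spent in model inference, in milliseconds."""
    timing = response["timing"]
    return timing["total_server_time_ms"] - timing["model_time_ms"]


# Sample /predict response from the Response Format section.
sample = {
    "prediction": {
        "is_sensitive": True,
        "classes": ["Personal Information", "Financial Data"],
    },
    "timing": {"model_time_ms": 15.2, "total_server_time_ms": 18.5},
}

print(f"Server overhead: {server_overhead_ms(sample):.1f} ms")
```

This applies to /predict responses. For /predict/dummy, model_time_ms is a fixed placeholder, so the difference is not meaningful there.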

2. Roundtrip network latency (/predict/dummy)

/predict/dummy accepts the same request body as /predict but performs no model inference. The server returns a fixed dummy prediction and timing. Use it to measure roundtrip network latency without inference overhead: the client-side elapsed time is dominated by network RTT, and timing.total_server_time_ms reflects server handling time (minimal). Subtract server time from client elapsed time to approximate network latency, or compare client latency for /predict vs /predict/dummy to isolate inference cost.

Example curl:

curl -X POST http://localhost:59995/predict/dummy \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"text": "any text"}'

On the client, measure the time from sending the request to receiving the response; that is your roundtrip latency. The response includes timing.model_time_ms (a fixed dummy value) and timing.total_server_time_ms (the actual server handling time).

✅ Response (dummy data; timing has the same shape as /predict, and the prediction is returned under dummy_prediction)

{
  "text": "any text",
  "dummy_prediction": {
    "is_sensitive": true,
    "is_code": false,
    "classes": ["Address Data", "Personal Attribute Data"],
    "confidence": 0.985
  },
  "timing": {
    "model_time_ms": 15.5,
    "total_server_time_ms": 2.1
  }
}
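
The latency measurement described in this section could be scripted roughly as follows. This is a sketch under assumptions, not the service's own client: it uses only the Python standard library, the host and port come from the curl examples, and the helper names (post_json, measure_round_trip, compare_endpoints) are hypothetical.

```python
import json
import time
import urllib.request


def post_json(url, payload, api_key):
    """POST a JSON body with the X-API-Key header and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.load(reply)


def measure_round_trip(send):
    """Time one call to send() and split the elapsed time into server and network parts."""
    start = time.perf_counter()
    body = send()
    client_elapsed_ms = (time.perf_counter() - start) * 1000.0
    server_ms = body["timing"]["total_server_time_ms"]
    return {
        "client_elapsed_ms": client_elapsed_ms,
        "server_time_ms": server_ms,
        # Approximation only: client time minus the reported server handling time.
        "network_ms_estimate": max(client_elapsed_ms - server_ms, 0.0),
    }


def compare_endpoints(base_url, api_key, text="any text"):
    """Estimate network latency from /predict/dummy and inference cost from /predict."""
    payload = {"text": text}
    dummy = measure_round_trip(lambda: post_json(base_url + "/predict/dummy", payload, api_key))
    real = measure_round_trip(lambda: post_json(base_url + "/predict", payload, api_key))
    return {
        "network_ms_estimate": dummy["network_ms_estimate"],
        "inference_ms_estimate": real["client_elapsed_ms"] - dummy["client_elapsed_ms"],
    }
```

Calling compare_endpoints("http://localhost:59995", "YOUR_API_KEY") would issue one request to each endpoint. A single sample is noisy, so in practice it is usually worth repeating the measurement and looking at the median.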

3. Batch Processing (/predict/batch)

Generated curl command:

curl -X POST http://localhost:59995/predict/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"texts": ["This is the first text for batch processing.", "The second text is about a software bug fix.", "A third text discussing personal health data.", "Another code snippet: import os\nprint(os.getcwd())"], "batch_size": 4}'

🐍 Python Example

import requests

# Batch text analysis
url = "http://your-api-url/predict/batch"
headers = {"X-API-Key": "YOUR_API_KEY"}  # same header as the curl example
data = {
    "texts": [
        "This is the first text for batch processing.",
        "The second text is about a software bug fix.",
        "A third text discussing personal health data.",
        "Another code snippet: import os\nprint(os.getcwd())"
    ],
    "batch_size": 4  # Optional: defaults to 32
}

response = requests.post(url, json=data, headers=headers, timeout=30)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()

# Process results (one entry per input text)
for i, text_result in enumerate(result['results']):
    print(f"Text {i+1}:")
    print(f"  Is Sensitive: {text_result['prediction']['is_sensitive']}")
    print(f"  Classes: {text_result['prediction']['classes']}")
    print(f"  Model Time: {text_result['timing']['model_time_ms']:.2f}ms")
    print()

# Batch performance metrics
batch_timing = result['batch_timing']
print("Batch Performance:")
print(f"  Total Texts: {batch_timing['total_texts']}")
print(f"  Throughput: {batch_timing['throughput_texts_per_sec']:.2f} texts/sec")

✅ Response Format

{
  "results": [
    {
      "prediction": {
        "is_sensitive": true,
        "classes": ["Personal Information", "Financial Data"]
      },
      "timing": {
        "model_time_ms": 2.1
      }
    },
    {
      "prediction": {
        "is_sensitive": false,
        "classes": ["Source Code Data"]
      },
      "timing": {
        "model_time_ms": 1.8
      }
    }
  ],
  "batch_timing": {
    "total_texts": 2,
    "batch_size": 4,
    "total_model_time_ms": 3.9,
    "avg_model_time_ms": 1.95,
    "throughput_texts_per_sec": 512.82
  }
}
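
The batch_timing values in this sample are consistent with simple aggregates of the per-text model_time_ms values: the total is their sum, the average is the total divided by the number of texts, and the throughput is the number of texts divided by the total model time in seconds. That relationship is inferred from the sample numbers, not confirmed as the server's actual computation. A sketch that recomputes the fields on the client (summarize_batch is a hypothetical helper):

```python
def summarize_batch(results: list) -> dict:
    """Recompute batch timing aggregates from per-text model times (milliseconds)."""
    times_ms = [item["timing"]["model_time_ms"] for item in results]
    total_ms = sum(times_ms)
    count = len(times_ms)
    return {
        "total_texts": count,
        "total_model_time_ms": total_ms,
        "avg_model_time_ms": total_ms / count if count else 0.0,
        # Texts per second of model time; assumes throughput is based on model time only.
        "throughput_texts_per_sec": count / (total_ms / 1000.0) if total_ms > 0 else 0.0,
    }


# Per-text results from the sample response above (predictions omitted for brevity).
sample_results = [
    {"timing": {"model_time_ms": 2.1}},
    {"timing": {"model_time_ms": 1.8}},
]

summary = summarize_batch(sample_results)
print(f"total={summary['total_model_time_ms']:.1f} ms, "
      f"avg={summary['avg_model_time_ms']:.2f} ms, "
      f"throughput={summary['throughput_texts_per_sec']:.2f} texts/sec")
```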
