API Documentation

Complete reference for the Document Intelligence API

Overview

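The Document Intelligence API extracts text and structured fields from PDF and image documents. Each request selects a text extraction method and a field extraction mode, and can optionally enable a second-pass verification of the extracted fields.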

Base URL:

https://isin-ai.com

Authentication

All API requests require authentication using an API key sent in the request header.

API Key Header

X-API-Key: YOUR_API_KEY

Include this header in all requests to authenticate. Requests without a valid API key will receive a 401 Unauthorized response.

Security Note: Keep your API key secure and never share it publicly. If your key is compromised, contact support to regenerate it.
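
One way to keep the key out of source code is to read it from an environment variable when building the request headers. The sketch below assumes a variable named DOC_INTEL_API_KEY; that name and the build_auth_headers helper are illustrative and not part of the API.

```python
import os

# Hypothetical environment variable name; the API itself does not define one.
API_KEY_ENV_VAR = "DOC_INTEL_API_KEY"


def build_auth_headers(api_key=None):
    """Return the X-API-Key header, falling back to the environment variable."""
    key = api_key or os.environ.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(
            f"No API key provided; set {API_KEY_ENV_VAR} or pass api_key"
        )
    return {"X-API-Key": key}
```

The returned dictionary can be passed as the headers argument of requests.post in the examples that follow.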

POST /api/process

Process a single document with configurable extraction settings

Request Parameters

Parameter        Type         Required  Description
X-API-Key        Header       Yes       API authentication key
file             File         Yes       PDF, PNG, JPG, or JPEG file
text_extractor   String       Yes       high_quality, form, or llm
field_extractor  String       Yes       schema_free or schema_based
schema           JSON String  No        JSON schema (required if field_extractor=schema_based)
use_challenger   Boolean      No        Enable challenger verification (default: false)
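
The parameter rules above can be checked on the client before a file is uploaded. The following is a minimal sketch, not part of the API: build_form_data is a hypothetical helper that validates the allowed values and the schema requirement, then returns the form fields to send alongside the file.

```python
import json

TEXT_EXTRACTORS = {"high_quality", "form", "llm"}
FIELD_EXTRACTORS = {"schema_free", "schema_based"}


def build_form_data(text_extractor, field_extractor, schema=None, use_challenger=False):
    """Validate the request parameters and return the form fields as a dict."""
    if text_extractor not in TEXT_EXTRACTORS:
        raise ValueError(f"text_extractor must be one of {sorted(TEXT_EXTRACTORS)}")
    if field_extractor not in FIELD_EXTRACTORS:
        raise ValueError(f"field_extractor must be one of {sorted(FIELD_EXTRACTORS)}")
    if field_extractor == "schema_based" and schema is None:
        raise ValueError("schema is required when field_extractor is schema_based")

    data = {
        "text_extractor": text_extractor,
        "field_extractor": field_extractor,
        "use_challenger": use_challenger,
    }
    if schema is not None:
        # The schema parameter is sent as a JSON string
        data["schema"] = json.dumps(schema)
    return data
```

The result can be passed as the data argument of requests.post, as in the Python examples in this section.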

Python Examples

Example 1: High Quality OCR + Schema-Free (Auto-Discovery)

import requests

url = "https://isin-ai.com/api/process"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

with open("document.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "text_extractor": "high_quality",
        "field_extractor": "schema_free",
        "use_challenger": False
    }
    
    response = requests.post(url, files=files, data=data, headers=headers)
    response.raise_for_status()  # Stop on 4xx/5xx instead of failing on a missing key below
    result = response.json()
    
    print("Extracted Fields:", result["extracted_fields"])

Example 2: Schema-Based Extraction (Business Permit)

import requests
import json

url = "https://isin-ai.com/api/process"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

# Define schema with fields array
schema = {
    "fields": [
        {
            "name": "business_name",
            "description": "Legal name of the business"
        },
        {
            "name": "owner_name",
            "description": "Full name of the business owner or proprietor"
        },
        {
            "name": "registration_date",
            "description": "Date when the business was registered"
        },
        {
            "name": "expiration_date",
            "description": "Date when the business permit expires"
        },
        {
            "name": "permit_number",
            "description": "Unique permit identification number"
        },
        {
            "name": "business_address",
            "description": "Physical address of the business"
        }
    ]
}

with open("business_permit.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "text_extractor": "high_quality",
        "field_extractor": "schema_based",
        "schema": json.dumps(schema),
        "use_challenger": True
    }
    
    response = requests.post(url, files=files, data=data, headers=headers)
    response.raise_for_status()  # Stop on 4xx/5xx instead of failing on a missing key below
    result = response.json()
    
    print("Extracted Fields:", result["extracted_fields"])
    print("Verification:", result["challenger_results"])

Example 3: Form OCR

import requests

url = "https://isin-ai.com/api/process"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

with open("form.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "text_extractor": "form",
        "field_extractor": "schema_free",
        "use_challenger": False
    }
    
    response = requests.post(url, files=files, data=data, headers=headers)
    response.raise_for_status()  # Stop on 4xx/5xx instead of failing on a missing key below
    result = response.json()
    
    print("Status:", "Success" if result["success"] else "Failed")
    print("Fields Found:", result["extracted_fields"])

Response Format

{
  "success": true,
  "extracted_text": "Full extracted text content...",
  "extracted_fields": {
    "field_name": "value",
    "another_field": "value"
  },
  "challenger_results": {  // Only if use_challenger=true
    "overall_accuracy": 95.5,
    "summary": "High confidence extraction",
    "verification_results": {
      "field_name": {
        "value": "verified_value",
        "is_accurate": true,
        "confidence": 98.0
      }
    }
  },
  "config_used": {
    "text_extractor": "high_quality",
    "field_extractor": "schema_free",
    "use_challenger": false
  },
  "processing_time_seconds": 12.5
}
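
When use_challenger is enabled, the verification results can be used to pick out fields that deserve a manual check. The sketch below assumes confidence is reported on a 0 to 100 scale, as the sample values suggest; the 90.0 threshold and the fields_needing_review helper are illustrative choices, not part of the API.

```python
def fields_needing_review(result, min_confidence=90.0):
    """Return names of fields flagged inaccurate or below the confidence threshold."""
    challenger = result.get("challenger_results")
    if not challenger:
        # challenger_results is only present when use_challenger=true
        return []

    flagged = []
    for name, check in challenger.get("verification_results", {}).items():
        inaccurate = not check.get("is_accurate", False)
        low_confidence = check.get("confidence", 0.0) < min_confidence
        if inaccurate or low_confidence:
            flagged.append(name)
    return flagged
```

Calling fields_needing_review(response.json()) on a response without challenger results returns an empty list, so the helper is safe to use regardless of the use_challenger setting.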

POST /api/process-multiple

Process multiple documents and check consistency

Request Parameters

Parameter        Type         Required  Description
X-API-Key        Header       Yes       API authentication key
files            File[]       Yes       Multiple PDF, PNG, JPG, or JPEG files
text_extractor   String       Yes       high_quality, form, or llm
field_extractor  String       Yes       schema_free or schema_based
schema           JSON String  No        JSON schema for field extraction
use_challenger   Boolean      No        Enable challenger verification (default: false)

Python Example

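from contextlib import ExitStack

import requests

url = "https://isin-ai.com/api/process-multiple"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

paths = ["document1.pdf", "document2.pdf", "document3.pdf"]

data = {
    "text_extractor": "high_quality",
    "field_extractor": "schema_free",
    "use_challenger": False
}

# ExitStack closes every opened file once the request completes
with ExitStack() as stack:
    # Repeat the "files" field name once per uploaded document
    files = [("files", stack.enter_context(open(path, "rb"))) for path in paths]
    response = requests.post(url, files=files, data=data, headers=headers)

response.raise_for_status()  # Stop on 4xx/5xx instead of failing on a missing key below
result = response.json()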

print("Total Documents:", result["total_documents"])
print("Consistency Status:", result["consistency"]["status"])
print("Summary:", result["consistency"]["summary"])

# Check for inconsistencies
if result["consistency"]["inconsistencies"]:
    print("\nInconsistencies Found:")
    for issue in result["consistency"]["inconsistencies"]:
        print(f"  - {issue['severity'].upper()}: {issue['description']}")
else:
    print("\nNo inconsistencies found - all documents match!")

# Access individual document results
for doc in result["results"]:
    print(f"\nDocument: {doc['filename']}")
    print(f"Fields: {doc['extracted_fields']}")

Response Format

{
  "success": true,
  "results": [
    {
      "filename": "document1.pdf",
      "extracted_fields": {...},
      "challenger_results": {...}  // If use_challenger=true
    },
    {
      "filename": "document2.pdf",
      "extracted_fields": {...},
      "challenger_results": {...}
    }
  ],
  "consistency": {
    "status": "pass" | "fail" | "warning",
    "summary": "Overall consistency assessment",
    "inconsistencies": [
      {
        "type": "name" | "date" | "amount" | "business_name" | "other",
        "severity": "critical" | "high" | "medium" | "low",
        "description": "Detailed description",
        "affected_documents": ["document1.pdf", "document2.pdf"],
        "details": "Specific values: Document 1: X, Document 2: Y"
      }
    ]
  },
  "total_documents": 2,
  "config_used": {
    "text_extractor": "high_quality",
    "field_extractor": "schema_free",
    "use_challenger": false
  }
}
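
For reporting, it can help to list the most serious inconsistencies first. The sketch below orders the inconsistencies array by the severity values shown in the response format; inconsistencies_by_severity is a hypothetical helper, and placing unknown severities last is an assumption.

```python
# Severity values from the response format, most serious first
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def inconsistencies_by_severity(result):
    """Return the inconsistencies sorted from most to least severe."""
    issues = result.get("consistency", {}).get("inconsistencies", [])
    rank = {severity: index for index, severity in enumerate(SEVERITY_ORDER)}
    # Unknown severity values sort after "low"; sorted() keeps ties in original order
    return sorted(
        issues,
        key=lambda issue: rank.get(issue.get("severity"), len(SEVERITY_ORDER)),
    )
```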

Configuration Guide

Text Extraction Methods

  • High Quality OCR (high_quality): Best quality with layout preservation, optimal for complex documents
  • Form OCR (form): Optimized for structured form documents with fields and tables
  • LLM Vision (llm): AI-powered vision model for complex or handwritten documents

Field Extraction Modes

  • Schema-Free (schema_free): Automatically discovers all key-value pairs in the document
  • Schema-Based (schema_based): Extracts only specific fields defined in the schema
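
For schema-based extraction, Example 2 passes a schema object containing a fields array of name and description pairs, serialized as a JSON string. A small sketch of building that string from a plain mapping follows; build_schema is a hypothetical helper, and the schema shape is taken from that example rather than from a formal specification.

```python
import json


def build_schema(field_descriptions):
    """Build the schema JSON string from a {field name: description} mapping."""
    fields = [
        {"name": name, "description": description}
        for name, description in field_descriptions.items()
    ]
    return json.dumps({"fields": fields})


schema_json = build_schema({
    "permit_number": "Unique permit identification number",
    "expiration_date": "Date when the business permit expires",
})
```

The resulting schema_json string can be sent as the schema form field together with field_extractor set to schema_based.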

Challenger Verification

Set use_challenger=true to get:

  • Secondary LLM validation of extracted fields
  • Confidence scores for each field
  • Accuracy flags and a verification summary

Recommended for mission-critical applications.

Error Handling

All endpoints return standard HTTP status codes:

  • 200 OK: Request successful
  • 400 Bad Request: Invalid parameters or file format
  • 401 Unauthorized: Missing or invalid API key
  • 500 Internal Server Error: Processing error

Error responses include a descriptive message:

{
  "detail": "Error description"
}
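
A client can turn these responses into readable messages by checking the status code and reading the detail field when it is present. The sketch below works on a status code and raw body text so that it does not depend on any HTTP library; error_message is a hypothetical helper, and the fallback to the raw body for non-JSON errors is an assumption.

```python
import json


def error_message(status_code, body_text):
    """Return None for a 200 response, otherwise a readable error string."""
    if status_code == 200:
        return None
    try:
        detail = json.loads(body_text).get("detail")
    except (ValueError, TypeError, AttributeError):
        # Body is empty, not JSON, or not a JSON object
        detail = None
    return f"HTTP {status_code}: {detail or body_text or 'no error body'}"
```

With the requests library this could be called as error_message(response.status_code, response.text) before reading any fields from the response.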