API Documentation
Complete reference for the Document Intelligence API
Overview
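The Document Intelligence API extracts text and structured fields from PDF and image documents. Each request selects a text extraction method and a field extraction mode, and can optionally enable challenger verification of the extracted fields.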
Base URL:
https://isin-ai.com
Authentication
All API requests require authentication using an API key sent in the request header.
API Key Header
X-API-Key: YOUR_API_KEY
Include this header in all requests to authenticate. Requests without a valid API key will receive a 401 Unauthorized response.
Security Note: Keep your API key secure and never share it publicly. If your key is compromised, contact support to regenerate it.
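One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named DOCINTEL_API_KEY; that name and the helper function are hypothetical and chosen only for illustration. The API itself only requires the X-API-Key header.

```python
import os


def build_auth_headers(env_var: str = "DOCINTEL_API_KEY") -> dict:
    """Return the X-API-Key header, reading the key from the environment.

    The environment variable name is an assumption for this sketch,
    not something the API defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"X-API-Key": api_key}
```

The returned dictionary can be passed as the headers argument of requests.post in the examples that follow.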
POST /api/process
Process a single document with configurable extraction settings
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| X-API-Key | Header | Yes | API authentication key |
| file | File | Yes | PDF, PNG, JPG, or JPEG file |
| text_extractor | String | Yes | high_quality, form, or llm |
| field_extractor | String | Yes | schema_free or schema_based |
| schema | JSON String | No | JSON schema (required if field_extractor=schema_based) |
| use_challenger | Boolean | No | Enable challenger verification (default: false) |
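The rules in the table above can be checked on the client before a request is sent, which avoids a round trip that would end in a 400 response. The following is a minimal sketch; build_form_data is a hypothetical helper, not part of the API, and it passes use_challenger as a Python boolean in the same way as the examples in this document.

```python
import json

# Allowed values, taken from the parameter table
TEXT_EXTRACTORS = {"high_quality", "form", "llm"}
FIELD_EXTRACTORS = {"schema_free", "schema_based"}


def build_form_data(text_extractor, field_extractor, schema=None, use_challenger=False):
    """Validate the request parameters and return the form fields to send."""
    if text_extractor not in TEXT_EXTRACTORS:
        raise ValueError(f"text_extractor must be one of {sorted(TEXT_EXTRACTORS)}")
    if field_extractor not in FIELD_EXTRACTORS:
        raise ValueError(f"field_extractor must be one of {sorted(FIELD_EXTRACTORS)}")
    if field_extractor == "schema_based" and schema is None:
        raise ValueError("schema is required when field_extractor is schema_based")

    data = {
        "text_extractor": text_extractor,
        "field_extractor": field_extractor,
        "use_challenger": use_challenger,
    }
    if schema is not None:
        # The schema parameter is sent as a JSON string
        data["schema"] = json.dumps(schema)
    return data
```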
Python Examples
Example 1: High Quality OCR + Schema-Free (Auto-Discovery)
import requests

url = "https://isin-ai.com/api/process"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

# Keep the file open while the request is sent
with open("document.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "text_extractor": "high_quality",
        "field_extractor": "schema_free",
        "use_challenger": False
    }
    response = requests.post(url, files=files, data=data, headers=headers)

result = response.json()
print("Extracted Fields:", result["extracted_fields"])
Example 2: Schema-Based Extraction (Business Permit)
import requests
import json

url = "https://isin-ai.com/api/process"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

# Define schema with fields array
schema = {
    "fields": [
        {
            "name": "business_name",
            "description": "Legal name of the business"
        },
        {
            "name": "owner_name",
            "description": "Full name of the business owner or proprietor"
        },
        {
            "name": "registration_date",
            "description": "Date when the business was registered"
        },
        {
            "name": "expiration_date",
            "description": "Date when the business permit expires"
        },
        {
            "name": "permit_number",
            "description": "Unique permit identification number"
        },
        {
            "name": "business_address",
            "description": "Physical address of the business"
        }
    ]
}

# Keep the file open while the request is sent
with open("business_permit.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "text_extractor": "high_quality",
        "field_extractor": "schema_based",
        "schema": json.dumps(schema),  # The schema is sent as a JSON string
        "use_challenger": True
    }
    response = requests.post(url, files=files, data=data, headers=headers)

result = response.json()
print("Extracted Fields:", result["extracted_fields"])
print("Verification:", result["challenger_results"])
Example 3: Form OCR
import requests

url = "https://isin-ai.com/api/process"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

# Keep the file open while the request is sent
with open("form.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "text_extractor": "form",
        "field_extractor": "schema_free",
        "use_challenger": False
    }
    response = requests.post(url, files=files, data=data, headers=headers)

result = response.json()
print("Status:", "Success" if result["success"] else "Failed")
print("Fields Found:", result["extracted_fields"])
Response Format
{
  "success": true,
  "extracted_text": "Full extracted text content...",
  "extracted_fields": {
    "field_name": "value",
    "another_field": "value"
  },
  "challenger_results": {  // Only if use_challenger=true
    "overall_accuracy": 95.5,
    "summary": "High confidence extraction",
    "verification_results": {
      "field_name": {
        "value": "verified_value",
        "is_accurate": true,
        "confidence": 98.0
      }
    }
  },
  "config_used": {
    "text_extractor": "high_quality",
    "field_extractor": "schema_free",
    "use_challenger": false
  },
  "processing_time_seconds": 12.5
}
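When use_challenger is enabled, the verification_results object can be used to pick out fields that need manual review. The sketch below assumes that confidence is reported on a 0 to 100 scale, as the sample value 98.0 suggests. The function name and the 90.0 threshold are illustrative choices, not part of the API.

```python
def flag_uncertain_fields(result: dict, min_confidence: float = 90.0) -> list:
    """Return names of fields marked inaccurate or below the confidence threshold.

    Returns an empty list when the response has no challenger_results,
    which is the case when use_challenger was false.
    """
    challenger = result.get("challenger_results")
    if not challenger:
        return []

    flagged = []
    for name, check in challenger.get("verification_results", {}).items():
        inaccurate = not check.get("is_accurate", False)
        low_confidence = check.get("confidence", 0.0) < min_confidence
        if inaccurate or low_confidence:
            flagged.append(name)
    return flagged
```

Calling flag_uncertain_fields(response.json()) on a parsed response gives the list of field names to double-check.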
POST /api/process-multiple
Process multiple documents and check consistency
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| X-API-Key | Header | Yes | API authentication key |
| files | File[] | Yes | Multiple PDF, PNG, JPG, or JPEG files |
| text_extractor | String | Yes | high_quality, form, or llm |
| field_extractor | String | Yes | schema_free or schema_based |
| schema | JSON String | No | JSON schema for field extraction (required if field_extractor=schema_based) |
| use_challenger | Boolean | No | Enable challenger verification (default: false) |
Python Example
import requests

url = "https://isin-ai.com/api/process-multiple"
api_key = "YOUR_API_KEY"  # Replace with your actual API key
headers = {"X-API-Key": api_key}

# Repeat the "files" field once per document
paths = ["document1.pdf", "document2.pdf", "document3.pdf"]
files = [("files", open(path, "rb")) for path in paths]
data = {
    "text_extractor": "high_quality",
    "field_extractor": "schema_free",
    "use_challenger": False
}

try:
    response = requests.post(url, files=files, data=data, headers=headers)
finally:
    # Close the file handles once the upload is done
    for _, f in files:
        f.close()

result = response.json()
print("Total Documents:", result["total_documents"])
print("Consistency Status:", result["consistency"]["status"])
print("Summary:", result["consistency"]["summary"])

# Check for inconsistencies
if result["consistency"]["inconsistencies"]:
    print("\nInconsistencies Found:")
    for issue in result["consistency"]["inconsistencies"]:
        print(f"  - {issue['severity'].upper()}: {issue['description']}")
else:
    print("\nNo inconsistencies found - all documents match!")

# Access individual document results
for doc in result["results"]:
    print(f"\nDocument: {doc['filename']}")
    print(f"Fields: {doc['extracted_fields']}")
Response Format
{
  "success": true,
  "results": [
    {
      "filename": "document1.pdf",
      "extracted_fields": {...},
      "challenger_results": {...}  // If use_challenger=true
    },
    {
      "filename": "document2.pdf",
      "extracted_fields": {...},
      "challenger_results": {...}
    }
  ],
  "consistency": {
    "status": "pass" | "fail" | "warning",
    "summary": "Overall consistency assessment",
    "inconsistencies": [
      {
        "type": "name" | "date" | "amount" | "business_name" | "other",
        "severity": "critical" | "high" | "medium" | "low",
        "description": "Detailed description",
        "affected_documents": ["document1.pdf", "document2.pdf"],
        "details": "Specific values: Document 1: X, Document 2: Y"
      }
    ]
  },
  "total_documents": 2,
  "config_used": {
    "text_extractor": "high_quality",
    "field_extractor": "schema_free",
    "use_challenger": false
  }
}
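Because each inconsistency carries one of four severity values, a client can order the list so that the most serious issues are reviewed first. This is a minimal sketch; the function name is hypothetical, and placing unrecognized severity values last is an assumption made here for robustness.

```python
# Severity values from the response format, most serious first
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def sort_inconsistencies(result: dict) -> list:
    """Return the inconsistencies ordered from most to least severe."""
    rank = {severity: position for position, severity in enumerate(SEVERITY_ORDER)}
    issues = result.get("consistency", {}).get("inconsistencies", [])
    # Unknown severities sort after "low" instead of raising an error
    return sorted(issues, key=lambda issue: rank.get(issue.get("severity"), len(rank)))
```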
Configuration Guide
Text Extraction Methods
- High Quality OCR (high_quality): Best quality with layout preservation, optimal for complex documents
- Form OCR (form): Optimized for structured form documents with fields and tables
- LLM Vision (llm): AI-powered vision model for complex or handwritten documents
Field Extraction Modes
- Schema-Free (schema_free): Automatically discovers all key-value pairs in the document
- Schema-Based (schema_based): Extracts only specific fields defined in the schema
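For schema-based extraction, the schema string can be generated from a plain mapping of field names to descriptions. The sketch below follows the shape used in Example 2 (a fields array of name and description objects). The make_schema helper is hypothetical and not part of the API.

```python
import json


def make_schema(field_descriptions: dict) -> str:
    """Build the JSON string for the schema parameter from name -> description pairs."""
    fields = [
        {"name": name, "description": description}
        for name, description in field_descriptions.items()
    ]
    return json.dumps({"fields": fields})


# Usage: pass the result as the "schema" form field
schema_param = make_schema({
    "permit_number": "Unique permit identification number",
    "expiration_date": "Date when the business permit expires",
})
```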
Challenger Verification
Set use_challenger=true to get:
- Secondary LLM validation of extracted fields
- Confidence scores for each field
- Accuracy flags and a verification summary
Challenger verification is recommended for mission-critical applications.
Error Handling
All endpoints return standard HTTP status codes:
- 200 OK: Request successful
- 400 Bad Request: Invalid parameters or file format
- 401 Unauthorized: Missing or invalid API key
- 500 Internal Server Error: Processing error
Error responses include a descriptive message:
{
"detail": "Error description"
}
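A client can combine the status code with the detail message to produce a readable error. The following sketch uses a hypothetical describe_error helper; the hint texts paraphrase the status code descriptions in this document, and the fallback wording for other codes is an assumption.

```python
# Short hints for the status codes this API documents
STATUS_HINTS = {
    400: "Invalid parameters or file format",
    401: "Missing or invalid API key",
    500: "Processing error",
}


def describe_error(status_code: int, body) -> str:
    """Format an error line from an HTTP status code and the parsed JSON body."""
    # Error responses carry their message under the "detail" key
    detail = body.get("detail") if isinstance(body, dict) else None
    hint = STATUS_HINTS.get(status_code, "Unexpected status")
    return f"HTTP {status_code} ({hint}): {detail or 'no detail provided'}"
```

With the requests library, this could be called as describe_error(response.status_code, response.json()) whenever response.ok is false.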