Sarah Guthals, PhD
Missing evidence is one of the biggest blockers in production AI workflows.
It's not enough to say what a document claims; you need to show where in the source that claim came from. Whether you're auditing bank statements, verifying medical referral forms, or investigating fraud, traceability is a hard requirement.
That's why we've introduced a new parameter in Tensorlake's StructuredExtractionOptions:
Code:
StructuredExtractionOptions(
    schema_name="ExampleSchema",
    json_schema=ExampleSchema,
    provide_citations=True
)
When provide_citations=True, every extracted field includes:
- Page number
- Bounding box (bbox) coordinates
This means structured outputs are no longer just machine-readable; they're auditable, verifiable, and traceable back to the source document.
Traceable Context Means Trustworthy RAG
In many workflows, "close enough" isn't good enough. Teams need confidence that extracted values align with the document's ground truth. Let's look at where this matters most:
- Banking & Finance: Auditors need to understand exactly which account, statement, or transaction produced a reported number. If an account balance doesn't reconcile, citations let you trace back to the precise page and bounding box where the discrepancy originates. No more guesswork when backtracking totals.
- Fraud Detection: When anomalies appear in reported values, bounding-box citations provide the evidence trail. Investigators can quickly verify whether a suspicious number came from an altered document, a duplicated entry, or a genuine filing.
- Healthcare & Forms Processing: At UCLA, teams processing medical referral forms wanted faster verification of ground truth. With citations, a structured field (like "referral date" or "doctor's signature") can point directly to the page span and bounding box where it was found, cutting human review time dramatically.
In short:
Citations turn structured extraction into a compliance-grade tool.
Implement Citations with One Line of Code
Let's take a simple example: extracting transaction summaries from a bank statement.
Code:
from typing import List

from pydantic import BaseModel, Field
from tensorlake.documentai import DocumentAI, StructuredExtractionOptions


class Transaction(BaseModel):
    date: str = Field(description="Transaction date")
    description: str = Field(description="Transaction description")
    amount: float = Field(description="Transaction amount")


class BankStatement(BaseModel):
    transactions: List[Transaction]


doc_ai = DocumentAI()

structured_extraction_options = [
    StructuredExtractionOptions(
        schema_name="BankStatement",
        json_schema=BankStatement,
        provide_citations=True,  # <-- new parameter
    )
]

# Parse the document and wait for the structured result.
result = doc_ai.parse_and_wait(
    file="https://tlake.link/documents/bank-statement",
    structured_extraction_options=structured_extraction_options,
)

print(result.structured_data[0].data)
The returned JSON now looks like this:
Code:
"transactions": [
  {
    "Date": "08/24",
    "Date_citation": [
      {
        "page_number": 1,
        "x1": 59,
        "x2": 135,
        "y1": 448,
        "y2": 482
      }
    ],
    "amount": "50.00",
    "amount_citation": [
      {
        "page_number": 1,
        "x1": 515,
        "x2": 585,
        "y1": 447,
        "y2": 482
      }
    ],
    "descriptions": "ATM CASH DEPOSIT, ***** 30073995581 AUT 082220 ATM CASH DEPOSIT 550 LONG BEACH BLVD LONG BEACH * NY",
    "descriptions_citation": [
      {
        "page_number": 1,
        "x1": 135,
        "x2": 515,
        "y1": 447,
        "y2": 482
      }
    ]
  }
]
Each field is now annotated with a citation: the page number and bounding-box coordinates.
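To make the pairing concrete, here is a minimal sketch of how you might walk one extracted record and attach each value to its citation entries. It assumes only what the sample output shows: citations live under a sibling key named `<field>_citation` and carry `page_number`, `x1`, `x2`, `y1`, and `y2`. The helper name `iter_cited_fields` is ours for illustration and is not part of the Tensorlake SDK.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Suffix observed in the sample output; treated here as an assumption.
CITATION_SUFFIX = "_citation"


def iter_cited_fields(
    record: Dict[str, Any],
) -> Iterator[Tuple[str, Any, List[Dict[str, int]]]]:
    """Yield (field_name, value, citations) for each field of one record."""
    for key, value in record.items():
        if key.endswith(CITATION_SUFFIX):
            continue  # citation entries are attached to their field below
        yield key, value, record.get(key + CITATION_SUFFIX, [])


# One transaction, trimmed from the sample output.
transaction = {
    "Date": "08/24",
    "Date_citation": [
        {"page_number": 1, "x1": 59, "x2": 135, "y1": 448, "y2": 482}
    ],
    "amount": "50.00",
    "amount_citation": [
        {"page_number": 1, "x1": 515, "x2": 585, "y1": 447, "y2": 482}
    ],
}

for name, value, citations in iter_cited_fields(transaction):
    for c in citations:
        print(
            f"{name}={value!r} -> page {c['page_number']}, "
            f"bbox=({c['x1']}, {c['y1']}, {c['x2']}, {c['y2']})"
        )
```

Fields without a matching citation key come back with an empty list, so the same loop works whether or not citations were requested.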
If you use our Tensorlake Cloud Playground, you can also see the bounding boxes drawn and labeled for each extracted value.
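If you want a similar overlay in your own viewer, the main step is mapping a citation's box into the coordinates of the rendered page. The sketch below is illustrative only: it assumes the bbox values share units with a known page width and height, and that both the page and the viewer put the origin at the top-left corner. Neither assumption is confirmed by this post, so check the docs for the actual coordinate space. `scale_bbox` and the page and viewer sizes are hypothetical.

```python
from typing import Dict, Tuple


def scale_bbox(
    citation: Dict[str, int],
    page_size: Tuple[float, float],
    view_size: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Map a citation bbox to viewer space as (left, top, width, height)."""
    page_w, page_h = page_size
    view_w, view_h = view_size
    sx, sy = view_w / page_w, view_h / page_h  # per-axis scale factors
    left = citation["x1"] * sx
    top = citation["y1"] * sy
    width = (citation["x2"] - citation["x1"]) * sx
    height = (citation["y2"] - citation["y1"]) * sy
    return left, top, width, height


# Citation taken from the sample output; the sizes are made up.
date_citation = {"page_number": 1, "x1": 59, "x2": 135, "y1": 448, "y2": 482}
rect = scale_bbox(date_citation, page_size=(612, 792), view_size=(1224, 1584))
print(rect)  # (118.0, 896.0, 152.0, 68.0)
```

Returning left, top, width, and height (rather than two corners) matches what most drawing and CSS positioning APIs expect for a highlight rectangle.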

From Data to Evidence
"In insurance, structured outputs power our workflows, but people still verify. With field-level citations, reviewers can jump from a data row straight to the exact COI or endorsement language. That's the difference between 'parsed' and provable."
-- Jesse McClure, CTO and Co-Founder, Sublynk
Citations aren't just a nice-to-have; our customers across industries know that they unlock new workflows:
- Audit-ready outputs: Every number is backed by ground-truth evidence.
- Automated review: Flag discrepancies automatically and point reviewers directly to the source.
- Explainability in RAG/Agents: Don't just return answers; return the highlighted document snippets.
- UI Enhancements: Build document viewers that highlight the exact fields extracted.
The benefit is twofold: engineers can build more reliable systems and stakeholders (auditors, compliance teams, regulators) get confidence and transparency.
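As a rough illustration of the automated-review idea, the sketch below checks whether extracted amounts add up to a reported total and, if they don't, returns the page and bounding box of every contributing amount so a reviewer can go straight to the source. It is a sketch under stated assumptions, not Tensorlake functionality: `flag_total_mismatch` is a hypothetical helper, the record shape follows the sample output, and the second transaction (including its coordinates) is invented for the example.

```python
from decimal import Decimal
from typing import Any, Dict, List


def flag_total_mismatch(
    transactions: List[Dict[str, Any]], reported_total: str
) -> List[Dict[str, Any]]:
    """Return review items if extracted amounts don't sum to reported_total."""
    # Decimal avoids float rounding surprises when adding currency values.
    extracted_total = sum(
        (Decimal(str(t["amount"])) for t in transactions), Decimal("0")
    )
    if extracted_total == Decimal(reported_total):
        return []  # totals reconcile, nothing to review

    items = []
    for t in transactions:
        for c in t.get("amount_citation", []):
            items.append(
                {
                    "amount": t["amount"],
                    "page": c["page_number"],
                    "bbox": (c["x1"], c["y1"], c["x2"], c["y2"]),
                }
            )
    return items


transactions = [
    {  # from the sample output
        "amount": "50.00",
        "amount_citation": [
            {"page_number": 1, "x1": 515, "x2": 585, "y1": 447, "y2": 482}
        ],
    },
    {  # hypothetical second row, invented for this example
        "amount": "25.50",
        "amount_citation": [
            {"page_number": 1, "x1": 515, "x2": 585, "y1": 482, "y2": 516}
        ],
    },
]

for item in flag_total_mismatch(transactions, reported_total="100.00"):
    print(f"Check {item['amount']} on page {item['page']} at {item['bbox']}")
```

A real pipeline would likely narrow the list to the most suspicious rows; returning every cited amount keeps the sketch simple while still giving reviewers a direct path to the evidence.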
Try Structured Extraction Citations Now
You can try provide_citations=True today in both the Tensorlake Playground and the API/Python SDK.
- Docs: Structured Extraction
- Example Notebook: Parse Bank Statements
If you have any questions or feedback, we'd love to hear from you! Join our Slack and let us know how you're using citations.
Traceability Built In
With the new provide_citations parameter, structured extraction becomes not only machine-readable but also evidence-backed. Every field can now point back to its exact source location in the document, making Tensorlake the foundation for audit-ready, compliance-grade, and fraud-resistant AI workflows.
Start using it today. In production AI, traceability isn't optional.