From PDF to typed fields. With confidence you can audit.
Vendor invoices, marketplace settlement files, packing slips, and manifests get parsed into typed columns with per-field confidence scores. Every extraction lands in ai_document_extractions with the source StorageObject attached.
- Vendor invoices, settlement PDFs, packing slips, and manifests parse into typed JSON with per-field confidence.
- Source PDF is preserved as a StorageObject in S3 with SSE-KMS; the extraction row links back to it forever.
- Below-threshold fields go to a review queue instead of silently auto-applying.
What you get.
Structured, not stringly-typed
Each field comes out with the right type: Decimal(14,4) for money, ISO 8601 dates for dates, and GSTIN as a constrained string. No more 'amount: ₹1,234.50 ' garbage to clean up downstream.
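To illustrate what "the right type" means, here is a minimal Python sketch that turns a raw amount string into a Decimal at scale 4 (matching Decimal(14,4)) and a date string into ISO 8601. The helper names and the DD/MM/YYYY input format are assumptions for illustration. This is not Robnu's extractor.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical helpers: a normalization sketch, not Robnu's actual code.
FOUR_PLACES = Decimal("0.0001")  # scale 4, as in Decimal(14,4)

def normalize_amount(raw: str) -> Decimal:
    """Strip the rupee sign, thousands separators and whitespace, then quantize."""
    cleaned = raw.replace("₹", "").replace(",", "").strip()
    value = Decimal(cleaned).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
    # Decimal(14,4) leaves room for 10 digits before the decimal point.
    if abs(value) >= Decimal(10) ** 10:
        raise ValueError(f"amount out of range for Decimal(14,4): {raw!r}")
    return value

def normalize_date(raw: str) -> str:
    """Parse a DD/MM/YYYY date (assumed input format) into ISO 8601."""
    return datetime.strptime(raw.strip(), "%d/%m/%Y").date().isoformat()

print(normalize_amount("₹1,234.50 "))  # 1234.5000
print(normalize_date("05/03/2024"))    # 2024-03-05
```

Quantizing at parse time means two invoices that print the same amount differently ("1,234.5" vs "1234.50") store the same value.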
Confidence per field
The model returns a confidence score per field, not per document. Below your threshold, the field falls into a review queue. Above it, the field flows into reconciliation, payouts, or finance directly.
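A minimal Python sketch of per-field routing. The field names, threshold values, and the ExtractedField shape are all hypothetical, not Robnu's configuration. Each field is compared against its own threshold; anything below it is held for review instead of being applied.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical per-field thresholds; real defaults and field names may differ.
DEFAULT_THRESHOLD = 0.90
FIELD_THRESHOLDS = {"total": 0.98, "tax": 0.97, "gstin": 0.99, "due_date": 0.90}

@dataclass
class ExtractedField:
    name: str
    value: object
    confidence: float  # model-reported confidence in [0, 1]

def route(fields: list[ExtractedField]) -> tuple[dict, list[ExtractedField]]:
    """Split fields into auto-applied values and a review queue."""
    accepted: dict = {}
    review: list[ExtractedField] = []
    for f in fields:
        threshold = FIELD_THRESHOLDS.get(f.name, DEFAULT_THRESHOLD)
        if f.confidence >= threshold:
            accepted[f.name] = f.value
        else:
            # Below threshold: hold for a human, never auto-apply.
            review.append(f)
    return accepted, review

accepted, review = route([
    ExtractedField("total", Decimal("1234.5000"), 0.99),
    ExtractedField("gstin", "27AAPFU0939F1ZV", 0.95),
    ExtractedField("due_date", "2024-03-05", 0.93),
])
print(sorted(accepted))          # ['due_date', 'total']
print([f.name for f in review])  # ['gstin']
```

Because thresholds are looked up per field, a shaky GSTIN can be held back while a high-confidence total on the same document still flows through.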
Audit trail by default
The original PDF stays in your S3 bucket under the seller prefix. The extraction row carries the storage_key, the byte_size, the sha256, and the model version. You can re-run the extraction with a newer model and diff the outputs.
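The audit trail works because byte_size and sha256 pin the exact source bytes. Here is a minimal Python sketch of the two checks that enables. It assumes the PDF bytes have already been fetched from S3 (not shown) and models extraction outputs as plain dicts. Both of those shapes are assumptions.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_source(pdf_bytes: bytes, stored_sha256: str, stored_byte_size: int) -> bool:
    """Confirm the re-fetched object is byte-identical to what was extracted."""
    return len(pdf_bytes) == stored_byte_size and sha256_hex(pdf_bytes) == stored_sha256

def diff_extractions(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that changed."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k)) for k in sorted(keys) if old.get(k) != new.get(k)}

# Hypothetical row and model outputs, for illustration only.
pdf = b"%PDF-1.7 example"
row = {"sha256": sha256_hex(pdf), "byte_size": len(pdf)}
assert verify_source(pdf, row["sha256"], row["byte_size"])

v1 = {"total": "1234.5000", "due_date": "2024-03-05"}
v2 = {"total": "1234.5000", "due_date": "2024-03-15"}
print(diff_extractions(v1, v2))  # {'due_date': ('2024-03-05', '2024-03-15')}
```

Checking the hash before re-extracting ensures that any difference in the diff comes from the model, not from a changed source file.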
From inbox PDF to a row in your finance table.
Drop a vendor invoice or a settlement PDF into Robnu and the extractor reads it into the same shape your finance table expects. Vendor invoices feed accounts-payable reconciliation; settlement PDFs feed the payment-reconciliation engine; packing slips and manifests feed the document pipeline that's already part of Process.
The same surface handles GSTIN sanity checks, PO matching against your purchase orders, and total/subtotal/tax cross-validation. Anything that doesn't add up short-circuits to review instead of polluting your books.
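A minimal Python sketch of the GSTIN and arithmetic checks. Several parts are assumptions rather than Robnu's rules. The regex covers only the standard 15-character format with 'Z' in the 14th position. The check character uses the commonly documented mod-36 scheme. The 0.01 tolerance and the invoice dict shape are hypothetical. PO matching is not shown.

```python
import re
from decimal import Decimal

GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def gstin_ok(gstin: str) -> bool:
    """Format check plus the commonly documented mod-36 check character."""
    if not GSTIN_RE.fullmatch(gstin):
        return False
    total = 0
    for i, ch in enumerate(gstin[:14]):
        product = CHARSET.index(ch) * (2 if i % 2 else 1)  # weights 1, 2, 1, 2, ...
        total += product // 36 + product % 36
    return CHARSET[(36 - total % 36) % 36] == gstin[14]

def totals_ok(line_amounts, subtotal, tax, total, tolerance=Decimal("0.01")) -> bool:
    """Line items must sum to subtotal, and subtotal + tax must equal total."""
    return (abs(sum(line_amounts, Decimal(0)) - subtotal) <= tolerance
            and abs(subtotal + tax - total) <= tolerance)

def checks_failed(invoice: dict) -> list[str]:
    """Any entry in the returned list short-circuits the document to review."""
    failures = []
    if not gstin_ok(invoice["gstin"]):
        failures.append("gstin")
    if not totals_ok(invoice["line_amounts"], invoice["subtotal"],
                     invoice["tax"], invoice["total"]):
        failures.append("totals")
    return failures

invoice = {
    "gstin": "27AAPFU0939F1ZV",
    "line_amounts": [Decimal("1000.00"), Decimal("46.19")],
    "subtotal": Decimal("1046.19"),
    "tax": Decimal("188.31"),
    "total": Decimal("1234.50"),
}
print(checks_failed(invoice))  # []
```

Using Decimal throughout keeps the cross-check exact. Float arithmetic could report a spurious mismatch on totals that actually agree.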
Document types in the A1 surface.
- Vendor invoices — including line items, GSTIN, totals, payment terms.
- Marketplace settlement files (AJIO XLSX + PDF; Meesho, Amazon, and Flipkart on the roadmap).
- Packing slips — outbound, with SKU, quantity, AWB, address ref.
- Customer + vendor invoices (the same documents the Process pipeline already produces).
- Manifest PDFs — for cross-reference against the carrier handover sheet.
- Return AWB labels — auto-stamped into return_awbs on receive scan.
Practical answers.
Where is the source PDF stored?
It's stored as a StorageObject in your S3 bucket under documents/{sellerId}/.../, encrypted with SSE-KMS using your seller's KMS alias. The extraction row carries the storage_key and sha256, so you can always re-fetch and re-extract.
Can I set the confidence threshold per field?
Yes. Per-tenant configuration lets you set thresholds for total, tax, GSTIN, due_date, and other fields independently. Below its threshold, a field goes to the review queue; above it, the field flows downstream. Defaults are tuned for invoice-style documents.
Which model does the extractor use?
RobnuAI uses the same gateway as the copilot. It is provider-agnostic, so the model can be swapped without a code change. The default is a hosted frontier model; we're investigating a smaller, deterministic open-weight model for high-volume settlement files.
Does it handle documents in Devanagari?
Vendor names and addresses in Devanagari pass through unchanged. Structured fields (totals, taxes, GSTIN, dates) are normalized to canonical form. Full multi-language support is on the roadmap, but it is not the A1 default.
What about scanned or image-only PDFs?
The extractor handles scanned and image-based PDFs. Per-field confidence reflects OCR quality, so low-quality scans land in review more often. We never silently substitute a guess.
Try it inside your own dashboard.
Free during early access. No card. Forever free under 25 orders/day.
