Taktile Labs

AI is coming to financial services. Let’s make sure we can trust it.

Taktile Labs is an applied research group focused on making AI agents reliable, explainable, and production-ready for regulated financial institutions. We publish benchmarks, build evaluation frameworks, and conduct research grounded in realistic banking data.

Explore our research →

View FinSpread-Bench →

Bridging the gap between frontier AI and regulated deployment.

Financial services will spend $97B on AI by 2027. Yet the distance between what general-purpose AI can do and what regulated institutions can reliably deploy remains vast. Taktile Labs exists to close this gap.

Focus on AI adoption within critical decision-making.

Every day, financial institutions run on a current of critical decisions, from determining if a transaction is fraudulent to calculating the amount of risk to accept when underwriting a loan. At Taktile Labs, we research what it takes to deploy AI in high-stakes decision-making, where errors aren’t an option.

Powered by high-quality, realistic data.

We work closely with development partners and industry experts to create high-quality evaluation data sets, which we use to assess AI performance in financial services-specific contexts.

What we’re working on.

We’re pursuing five research tracks focused on the most important requirements for trustworthy AI in regulated financial institutions.

Evaluations & Benchmarking

In collaboration with our development partners, we use real-world data to build trusted benchmarks for model performance in core financial services use cases. Every benchmark is designed around the KPIs business teams care about most: accuracy, cost per decision, and latency.

View FinSpread-Bench →

Human-Agent Design Patterns

Not every decision should be fully automated, and not every decision requires a human to intervene. We pursue balanced research that helps teams unlock the benefits of AI in complex decision-making while preserving the value of human judgment and engagement.

Governance, Risk & Compliance

Agentic systems based on LLMs are stochastic and hard to inspect, challenging assumptions behind SR 11-7 and traditional model risk management. We work with institutions and regulators to clarify what responsible adoption looks like in practice.

Foundation Models for Financial Data

Foundation models are trained on public text, but financial decisioning runs on data with rich sequential and relational structure. We explore transformer-based architectures purpose-built for common data structures in financial services.

Hybrid Decision Architectures

The most effective AI systems will be built using hybrid architectures. We help financial institutions navigate AI hype with clarity and choose the right tool for each task while balancing cost, risk, and performance.

Explore our Research Agenda →

Discover our benchmark for financial spreading.

FinSpread-Bench evaluates how well agentic AI systems can extract, calculate, and reason across financial documents in realistic decisioning scenarios. Built with anonymized results from our development partners.

Model               Extraction          Tooling               Field match rate
GPT-5.2             Gemini 3.1 Pro      All tools             96.5%
GPT-5.2             Gemini 2.5 Pro      All tools             96.5%
GPT-5.2             Gemini 2.5 Flash    All tools             96.2%
Gemini 3.1 Pro      Gemini 3.1 Pro      All tools             95.9%
GPT-5.2             Gemini 3 Flash      All tools             95.2%
Claude Opus 4.6     Gemini 3.1 Pro      All tools             94.3%
GPT-5               Gemini 3.1 Pro      All tools             94.1%
GPT-5.2             Gemini 3.1 Pro      No calculator tool    94.0%
Gemini 2.5 Pro      Gemini 3.1 Pro      All tools             83.0%
Claude Sonnet 4.5   Gemini 3.1 Pro      All tools             54.6%
Claude Haiku 4.5    Gemini 3.1 Pro      All tools             34.1%

Human baseline (89%)

Read the methodology →

Meet our team.

Taktile Labs is powered by consistent collaboration between a dedicated internal research team and an external Research Council and Advisory Board.

Advisory Board

Research Council

Research Team

View full team & bios →

Browse our published work.

Benchmarks, technical reports, and research.

Benchmark · February 2026

FinSpread-Bench: Evaluating Agentic AI for Financial Spreading

Nico Klees, Maximilian Eber, PhD

The first public benchmark for agentic financial document processing. Evaluates extraction accuracy, cross-document reasoning, calculation correctness, and structured output quality across seven frontier models.

Paper · Coming Q1 2026

AI in AML: Understanding the New Model Risk Mandate for Banks and Fintechs

Dustin Eaton, Maximilian Eber, PhD

Why AML teams must now apply model risk management standards to AI systems. Published in ACAMS Today, this paper explores how regulators are extending MRM frameworks to AI deployed in compliance functions, and what institutions need to do to prepare.