
Tidalwave AI Tops General Models in Mortgage Underwriting Accuracy Benchmark

A joint study from mortgage technology platform Tidalwave and researchers at Columbia University found that Tidalwave’s mortgage-trained AI agent produced more accurate answers to loan underwriting questions than a general-purpose large language model.

The benchmark compared Tidalwave’s SOLO agent with Claude 4.5 from Anthropic on 90 questions commonly asked by loan officers during the mortgage origination process. These included whether payroll matches a stated employer, whether bank statements show buy-now-pay-later payments, and whether deposits may come from foreign sources, among other questions.

Overall, Tidalwave’s SOLO recorded 84% accuracy compared with 71% for Claude 4.5, according to the study.

The biggest performance gap was in yes-or-no compliance checks — the questions used to flag issues such as payroll mismatches, undisclosed debts and suspicious transactions. Tidalwave’s SOLO scored 95% accuracy in that category, compared with 42% for the baseline model.

Transaction identification results were closer, with SOLO scoring 83% compared with 80% for Claude 4.5.

On account verification questions, however, Claude 4.5 outperformed SOLO, scoring 86% compared with 67% for Tidalwave’s system.

Diane Yu, co-founder and CEO of Tidalwave, said in an interview with HousingWire that SOLO’s lower score on account verification is intentional, because SOLO strips out personally identifiable information (PII) before processing requests. She contrasted this with generic large language models (LLMs), which she said make use of the PII sent to them, a practice she described as a violation of customer privacy in mortgage lending.

Tidalwave also attributed the performance gap to differences in how the systems interpret mortgage data. General-purpose models analyze loan files as plain text, while SOLO is integrated with underwriting systems used by Fannie Mae and Freddie Mac and trained on structured mortgage datasets, including Uniform Residential Loan Application (URLA) files and bank transaction records.

Loan officers increasingly use AI tools to review lengthy loan files and manage origination timelines that average more than 40 days, the company said. Lenders often lose money on each loan originated, creating pressure to automate parts of the process.

“Forty-two percent on compliance questions should worry every lender relying on off-the-shelf AI right now,” Yu said. “When I was building technology at Better.com, I watched general-purpose tools fail on mortgage data over and over. They’d miss a payroll mismatch or hallucinate a deposit source, and a human had to catch it every time.

“That’s why we built Tidalwave’s SOLO differently, and that’s why we tested it with Columbia University, not internally. If you’re going to tell lenders your AI is accurate, you should be willing to prove it publicly.”

The benchmark was conducted in the second half of 2025 through a collaboration between Tidalwave’s engineering team and researchers at Columbia. The study evaluated 90 questions across 10 synthetic borrower scenarios, each including a full loan application file and up to two months of bank-statement transaction data.

Yu said the findings represent the first iteration of the benchmark and that the researchers will continue to test SOLO against the latest public LLMs.

Yu said that the 90 questions used in the benchmark were developed internally by Tidalwave’s in-house mortgage experts. The team developed the questions based on common usage patterns for Tidalwave’s system, including edge cases such as foreign transactions, mismatches between bank statements and loan applications, and deposits from lesser-known vendors.

Results were measured using the F1 score, which combines precision and recall into a single metric, according to the technical report.

“We partnered with Tidalwave on this benchmark to reflect the actual decision points loan officers face during origination, not abstract NLP tasks,” said Zhou Yu, an associate professor at Columbia University.

“By using realistic borrower scenarios, synthetic but structured data, and F1 scoring on both retrieval and yes/no checks, we can see where systems truly help loan officers and where they quietly fail. We hope this becomes a template for evaluating AI in other high-stakes, regulated workflows as well.”
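The technical report’s exact scoring code is not described here, so the Python sketch below only illustrates the standard F1 definition (the harmonic mean of precision and recall) applied to the two question types Zhou Yu mentions. Treating a “yes” (issue flagged) answer as the positive class, and the function names and sample data, are assumptions for illustration, not details from the study.

```python
# Illustrative sketch only: hypothetical function names and data, not the study's code.

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_f1(predicted: list[bool], expected: list[bool]) -> float:
    """F1 for yes/no checks, assuming "yes" (issue flagged) is the positive class."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must be the same length")
    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if p and e)       # correctly flagged issues
    fp = sum(1 for p, e in pairs if p and not e)   # false alarms
    fn = sum(1 for p, e in pairs if e and not p)   # missed issues
    return f1_from_counts(tp, fp, fn)


def retrieval_f1(retrieved: set[str], relevant: set[str]) -> float:
    """F1 for retrieval, e.g. the set of transaction IDs a system identifies."""
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    return f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical yes/no answers: one hit, one false alarm, one miss.
    print(binary_f1([True, True, False, False], [True, False, True, False]))  # 0.5
    # Hypothetical transaction IDs: 2 of 3 retrieved are relevant, 2 of 4 relevant found.
    print(round(retrieval_f1({"t1", "t2", "t3"}, {"t2", "t3", "t4", "t5"}), 3))  # 0.571
```

Because F1 penalizes both false alarms and missed issues, a system that answers “no” to every compliance check can score poorly even if most loans are clean, which is one reason F1 is often preferred over raw accuracy for flag-style questions.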

Tidalwave’s SOLO platform is used by loan officers at NEXA Lending through Bevri.ai, an independent AI solutions provider. The company also integrates with Plaid, Argyle, Truv and ICE Mortgage Technology to automate income, employment and asset verification.
