Sunday, November 30, 2025

Fact checking technology update

A paper submitted to arXiv, FlashCheck: Exploration of Efficient Evidence Retrieval for Fast Fact-Checking, considers the obvious idea of using AI (artificial intelligence) to try to counter some of the power of demagoguery and its effectiveness in deceiving and manipulating people.

The FlashCheck concept addresses a critical bottleneck in automated fact-checking: the high computational cost and time lag in retrieving evidence from massive knowledge bases like Wikipedia. Current systems rely on "dense retrieval" (vector search), which is accurate but resource-heavy, making real-time verification difficult. 
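
To make the bottleneck concrete, here is a minimal sketch of brute-force dense retrieval, assuming the claim and the corpus passages have already been embedded by a bi-encoder (the random vectors below are only stand-ins for real embeddings, and the names are illustrative, not the paper's code):

```python
import numpy as np

# Minimal dense-retrieval sketch. 'passage_vecs' stands in for precomputed
# embeddings of Wikipedia passages; 'claim_vec' stands in for the embedding
# of the claim to verify. In a real system both come from a trained bi-encoder.
rng = np.random.default_rng(0)
passage_vecs = rng.standard_normal((20_000, 768)).astype(np.float32)  # full-precision index
claim_vec = rng.standard_normal(768).astype(np.float32)               # query embedding

# Score every passage by inner product with the claim and keep the top 5.
# This exhaustive scan over a float32 index is what makes dense retrieval
# accurate but memory- and compute-heavy at Wikipedia scale.
scores = passage_vecs @ claim_vec
top_k = np.argsort(-scores)[:5]
print(top_k, scores[top_k])
```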

To address the problem, the researchers propose a two-pronged optimization strategy: Corpus Pruning, which indexes only factual statements instead of full text, and Index Compression using joint product quantization (JPQ)[1], a data compression technique. To move closer to real-time fact checking, the researchers used JPQ to shrink the Wikipedia index by ~93%, from 9.70 GB to ~673 MB. That cut retrieval latency, yielding up to a 10-fold speedup on CPUs and a 20-fold speedup on GPUs compared to the reference systems. The researchers used their pipeline to fact-check the 2024 US Presidential Debate in real time, achieving a 3.4-fold speed increase over existing fact-checking methods. Despite the aggressive compression, the performance loss was negligible.
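
For readers curious how that kind of compression works, here is a toy sketch of plain product quantization, the family of techniques JPQ builds on; the corpus size, subspace count, and codebook size are illustrative choices, not the paper's actual configuration:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Toy product-quantization sketch: split each 768-dim vector into M subvectors
# and replace each subvector with the ID of its nearest centroid in a small
# per-subspace codebook, so the index stores bytes instead of floats.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((5_000, 768)).astype(np.float32)

M, K = 8, 256                        # 8 subspaces, 256 codewords each (1 byte per subspace)
sub_dim = vectors.shape[1] // M      # 96 dimensions per subspace
codebooks, codes = [], []
for m in range(M):
    sub = vectors[:, m * sub_dim:(m + 1) * sub_dim]
    centroids, labels = kmeans2(sub, K, minit="++", seed=0)
    codebooks.append(centroids)                # K x sub_dim floats per subspace
    codes.append(labels.astype(np.uint8))      # one byte per vector per subspace
codes = np.stack(codes, axis=1)                # compressed index: 5,000 x 8 bytes

ratio = vectors.nbytes / codes.nbytes          # codebooks add a small constant overhead
print(f"index is ~{ratio:.0f}x smaller than float32")
```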

As time passes, fast fact checking will very likely continue to improve. A big question is whether it will make much difference. At present, a significant minority of the American public has been conditioned to treat MAGA's blatant lies and flawed reasoning as acceptable and reasonable. As long as that remains the case, it is unclear what effect improved fact checking might have.

Footnote for wonks:
1. JPQ employs (1) joint optimization, which aligns the query encoder and the compressed index codewords using a shared ranking-oriented loss function so the two are trained to work together, and (2) hard negative sampling during training, where the model retrieves "hard negatives" (incorrect but similar passages) directly from the quantized index, teaching it to distinguish subtle differences even in compressed form. The result was a massive compression effect, reducing the data size by ~93% while retrieving faster and without the accuracy drop usually associated with compressing data.
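
A rough illustration of that ranking signal, with hypothetical embeddings standing in for a real encoder and index (this is a sketch of a generic contrastive ranking loss with hard negatives, not the authors' exact training objective):

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical sketch of a ranking-oriented loss with hard negatives: the claim
# is scored against one relevant passage and several similar-but-wrong passages
# drawn from the compressed index, and a softmax cross-entropy loss rewards
# ranking the relevant passage first. Names and shapes are illustrative only.
rng = np.random.default_rng(0)
query = rng.standard_normal(768)
positive = rng.standard_normal(768)              # embedding of the true evidence passage
hard_negatives = rng.standard_normal((7, 768))   # look-alike passages retrieved from the PQ index

candidates = np.vstack([positive, hard_negatives])  # index 0 is the relevant passage
scores = candidates @ query                         # inner-product relevance scores
loss = -(scores[0] - logsumexp(scores))             # cross-entropy with the positive as the label
print(f"ranking loss: {loss:.3f}")
```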
