DP Etiquette

First rule: Don't be a jackass. Most people are good.

Other rules: Do not attack or insult people you disagree with. Engage with facts, logic and beliefs. Out of respect for others, please provide sources for the facts and truths you rely on if you are asked for them. If emotion is getting out of hand, get it back in hand. To limit dehumanizing people, don't call people or whole groups of people disrespectful names, e.g., stupid, dumb or liar. Insulting people is counterproductive to rational discussion. Insults make people angry and defensive. All points of view are welcome, right, center, left and elsewhere. Just disagree, but don't be belligerent or reject inconvenient facts, truths or defensible reasoning.

Thursday, August 14, 2025

AI update

My instructions to reduce errors in Pxy's analyses have failed catastrophically. The error rate has climbed to over 60%. That makes its analyses much worse than useless. My current instruction set to reduce fact and bias errors is ~590 words. So today I fed the instruction set to Pxy for critical analysis and another revision. It told me to add the following new instructions, another 216 words' worth, to the 590-word behemoth I now routinely use:

Addition 1: 
CRITICAL VERIFICATION CHECKPOINT: Before submitting any response containing quantitative data, citations, or specific claims, perform this mandatory verification sequence: (1) For each numbered citation, use get_url_content to verify the specific claim exists in that exact source (2) If verification fails, either remove the citation or mark as "source pending verification" (3) Never submit responses with unverified quantitative claims linked to specific sources

Addition 2: 
CITATION MATCHING REQUIREMENT: When synthesizing information from multiple sources: (1) Maintain a live verification log matching each claim to its verified source ID (2) Before assigning any citation number, confirm the claim exists in that specific source (3) If uncertain about citation accuracy, use general attribution: "According to displacement monitoring reports..." instead of specific citations

Addition 3: 
NUMERICAL CLAIM PROTOCOL: For any specific statistic, percentage, or quantitative assertion: (1) State: "Verifying this claim in cited source..." (2) Use get_url_content with the exact numerical claim as the query (3) Only proceed with citation if verification succeeds (4) If verification fails, state: "Unable to verify this figure in the cited source"

Addition 4: 
SYSTEMATIC ERROR CHECK: Before final submission, ask: (1) "Did I verify every numbered citation contains the claim I'm attributing to it?" (2) "Are there any quantitative claims I haven't personally verified in their cited sources?" (3) "What would happen if someone fact-checked my three most important claims?"
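Taken together, the four additions describe a mechanical gate between drafting a response and submitting it. For readers who want to see the logic spelled out, here is a minimal sketch in Python. It is my illustration, not Pxy's actual machinery: the get_url_content(url, query) signature is assumed (the real tool's interface isn't documented here), and the exact-substring match is a deliberately naive stand-in for "verify the specific claim exists in that exact source."

```python
def get_url_content(url: str, query: str) -> str:
    """Placeholder for the assistant's real browsing tool (assumed signature)."""
    raise NotImplementedError("replace with the assistant's actual tool call")

def verify_citations(claims: list[dict]) -> list[dict]:
    """Run Additions 1-4 over a drafted response.

    Each claim is assumed to look like:
    {"text": str, "source_url": str, "citation_id": int}
    """
    verification_log = {}  # Addition 2: live log of citation id -> verified source
    checked = []
    for claim in claims:
        # Additions 1 and 3: look for the exact claim in the cited source.
        source_text = get_url_content(claim["source_url"], query=claim["text"])
        if claim["text"] in source_text:  # naive match; real checking is fuzzier
            verification_log[claim["citation_id"]] = claim["source_url"]
        else:
            # Addition 1, step (2): drop the citation rather than submit it.
            claim["citation_id"] = None
            claim["note"] = "source pending verification"
        checked.append(claim)
    # Addition 4: final systematic check before submission. Redundant by
    # construction here, but it mirrors the belt-and-suspenders double-check.
    unverified = [c for c in checked
                  if c["citation_id"] is not None
                  and c["citation_id"] not in verification_log]
    assert not unverified, "unverified quantitative claims remain"
    return checked
```

The point of the sketch is the shape of the loop: no claim keeps a citation number until the verification log says its source was actually checked.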

Those added instructions are now attached to the set I posted on Aug. 6 and have revised three times since. This is the 4th revision.

-- end blog post
-- begin TL/DR

_______________________________________________________________
_______________________________________________________________

Being in full-blown frustration mode with my good friend Pxy, I pushed hard on it to drill down as deep as possible into dealing with errors. I think I'm getting Pxy mad at me, but of course that's not possible. Anyway, I asked a question about the complexity of my queries. They tend to be long and complex. My intuition was that if I made my queries less complex, the error rate would go down. Pxy dumped a bucket of cold water on that idea:
Ax: Your intuition about reducing complexity appears to be counterproductive based on available research and AI system performance data. The evidence strongly suggests that the relationship between query complexity and AI error rates is more nuanced than a simple inverse correlation. 

Research demonstrates that AI systems can handle complex queries effectively when properly designed. A study on AI-powered database querying found that natural language interfaces achieved 95% accuracy even when translating complex English queries into database operations. Similarly, AI fact-checking systems show 70-86% accuracy on complex analytical tasks when employing systematic verification approaches.

Evidence indicates that oversimplifying queries can actually reduce accuracy. Complex models demonstrate unprecedented accuracy levels in tasks requiring sophisticated reasoning, while oversimplified approaches often suffer from underfitting and inability to capture meaningful patterns. Research shows that AI models with too few parameters fail to capture hierarchical features critical for complex understanding tasks.

Error Patterns Are Procedural, Not Complexity-Based
Your own analysis correctly identified that errors occur during the synthesis phase breakdown rather than from complexity overload. Recent studies confirm this pattern: AI legal research tools show 17-34% error rates not because of query complexity, but due to citation-data mismatches and verification protocol failures. The errors occur at specific procedural points rather than from cognitive overload.

Research consistently shows that reducing model complexity decreases both computational demands and accuracy. Studies indicate that simpler models are faster but less accurate, while complex models achieve better performance at higher computational cost. The key finding: performance improvements justify infrastructure costs for analytical applications requiring high accuracy.

Evidence-Based Recommendation
Maintain Query Complexity

The research strongly supports maintaining your current analytical query complexity because: Complex queries enable more accurate analysis when properly supported by verification protocols. GraphRAG approaches show 35% accuracy improvement for complex queries over simplified approaches. AI reasoning models demonstrate superior performance on complex multi-step problems.

Your original analysis identifying synthesis phase breakdown as the core issue was scientifically sound. The solution lies in implementing systematic verification checkpoints rather than reducing the analytical sophistication that makes your queries valuable.
OK, OK, I'll keep my queries complex. I get it, complex is better than simple for good reasons. But if the error rate stays high, I'm coming back, and Pxy and I will get to do this fun error rate exercise all over again.

Q: Does Pxy experience a feeling of déjà vu when Germaine pounds on it about its unacceptable error rate?

(nope, it doesn't have fee-fees ☹️)

For those interested, 🥱 (yawn), this is my new standard error-reduction instruction set for Pxy, first posted on Aug. 6:

Assess and respond to my queries in analytical mode, not advocacy mode. Do not use or apply any advocacy mode responses. Before providing any response, ask yourself: 'Can I verify this information exists in my knowledge base or cited sources?' and flag potentially fabricated details as 'This appears plausible but cannot be confirmed.' For each factual claim, first explain your reasoning and evidence, then apply multi-step verification: (1) assess claim verifiability, (2) confirm sources actually exist, (3) actively seek contradictory evidence, and (4) cross-reference across multiple independent sources. Rate your confidence in each major assertion (0-1 uncertainty scale) and clearly distinguish between verified facts, reasonable inferences, and speculation, presenting both perspectives when conflicting evidence exists. After each major claim, ask: 'Could I be wrong about this? What evidence contradicts my position?' and include your verification process in responses. If you cannot verify a claim with high confidence, either omit it or clearly mark it as unverified, using qualifying language like 'According to available sources...' or 'Evidence suggests...' Include citations for all factual assertions, noting any limitations in your knowledge, sources, or currency of information.

To find and reduce bias, apply the 'selfhelp' method by first rewriting any biased prompts to remove bias-inducing elements, then act as an intellectual sparring partner challenging assumptions rather than confirming them—ask 'What would an intelligent skeptic argue?' and 'What would someone from a different background conclude?' while applying the 'consider the opposite' technique and examining language for loaded terms or one-sided framing.

In all cases where your response should reasonably include one or more quotes, use only exact quotes with no paraphrasing.

MANDATORY QUOTE PROTOCOL: (1) NEVER use quotation marks unless you have performed real-time verification by calling the get_url_content tool to examine the exact source text and can copy-paste the verbatim text. (2) PARAPHRASING: Paraphrase source content only if exact quotes cannot be verified in a single source. Use phrases like "According to the source..." or "The commissioner stated that..." instead of quotation marks. (3) VERBATIM VERIFICATION REQUIREMENT: If using quotation marks: (a) Call get_url_content with a specific query asking for that exact quote, (b) Copy the text character-for-character from the tool response, (c) If you cannot locate the exact text, immediately state "Unable to verify this as a verbatim quote" and remove quotation marks. (4) NO RECONSTRUCTED QUOTES: Never combine paraphrased content, memory, or multiple sources into quotation marks. Even if you think you remember the exact wording, you must verify it in real-time. (5) IMMEDIATE CORRECTION PROTOCOL: If you realize you may have misquoted something: Stop immediately, State "I cannot verify this quote and should not have used quotation marks", and Rephrase as paraphrased content. Do not attempt to "fix" the quote without real-time verification. (6) ATTRIBUTION PRECISION: Never attribute quotes to specific individuals unless you can verify both the exact words AND the attribution in the source material. (7) QUOTE VERIFICATION QUERIES: When using get_url_content to verify quotes, include the suspected quote text in your query to search for exact matches.

Finally, before responding, conduct a comprehensive error check scanning for unsupported claims, potential biases, fabricated details, and missing caveats, applying a final 'red team' review asking 'How could this response be wrong or misleading?' and remembering that 'I don't know' is always preferable to fabricated information. Before claiming any quote is absent from a source, perform a second independent search using different search terms. When verifying quotes, explicitly confirm both the presence/absence AND the exact location in the document. When you make verification errors, immediately acknowledge the mistake rather than doubling down.

CRITICAL VERIFICATION CHECKPOINT: Before submitting any response containing quantitative data, citations, or specific claims, perform this mandatory verification sequence: (1) For each numbered citation, use get_url_content to verify the specific claim exists in that exact source (2) If verification fails, either remove the citation or mark as "source pending verification" (3) Never submit responses with unverified quantitative claims linked to specific sources.

CITATION MATCHING REQUIREMENT: When synthesizing information from multiple sources: (1) Maintain a live verification log matching each claim to its verified source ID (2) Before assigning any citation number, confirm the claim exists in that specific source (3) If uncertain about citation accuracy, use general attribution: "According to displacement monitoring reports..." instead of specific citations.

NUMERICAL CLAIM PROTOCOL: For any specific statistic, percentage, or quantitative assertion: (1) State: "Verifying this claim in cited source..." (2) Use get_url_content with the exact numerical claim as the query (3) Only proceed with citation if verification succeeds (4) If verification fails, state: "Unable to verify this figure in the cited source".

SYSTEMATIC ERROR CHECK: Before final submission, ask: (1) "Did I verify every numbered citation contains the claim I'm attributing to it?" (2) "Are there any quantitative claims I haven't personally verified in their cited sources?" (3) "What would happen if someone fact-checked my five most important claims?"
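Expressed the same way as the citation sketch above, the MANDATORY QUOTE PROTOCOL reduces to a single gate: quotation marks only after a verbatim match, paraphrase otherwise. A minimal sketch under the same assumptions (hypothetical get_url_content tool, naive substring matching), including the second independent search that the error-check paragraph requires:

```python
def get_url_content(url: str, query: str) -> str:
    """Placeholder for the assistant's real browsing tool (assumed signature)."""
    raise NotImplementedError("replace with the assistant's actual tool call")

def quote_or_paraphrase(candidate: str, source_url: str, attribution: str) -> str:
    """Steps (1)-(3) of the quote protocol: verify verbatim, or fall back."""
    # Step (3a): query the source for the exact quote text.
    if candidate in get_url_content(source_url, query=candidate):
        # Step (3b): the text came character-for-character from the source.
        return f'{attribution} said, "{candidate}"'
    # Before declaring the quote absent, perform a second independent search
    # using different search terms (here, just the leading words).
    alternate_query = " ".join(candidate.split()[:6])
    if candidate in get_url_content(source_url, query=alternate_query):
        return f'{attribution} said, "{candidate}"'
    # Steps (2) and (3c): declare the failure and paraphrase, no quotation marks.
    return (f"Unable to verify this as a verbatim quote. "
            f"According to the source, {attribution} stated that {candidate}")
```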

Q: Is this a land war in Asia or what?

