DP Etiquette

First rule: Don't be a jackass. Most people are good.

Other rules: Do not attack or insult people you disagree with. Engage with facts, logic and beliefs. Out of respect for others, please provide some sources for the facts and truths you rely on if you are asked for that. If emotion is getting out of hand, get it back in hand. To limit dehumanizing people, don't call people or whole groups of people disrespectful names, e.g., stupid, dumb or liar. Insulting people is counterproductive to rational discussion. Insult makes people angry and defensive. All points of view are welcome, right, center, left and elsewhere. Just disagree, but don't be belligerent or reject inconvenient facts, truths or defensible reasoning.

Sunday, July 27, 2025

About Comet, Pxy's agentic AI extension

AAI (agentic AI) can "do things" you tell it to do. Unlike Pxy, which can only search and analyze, AAI has "agency" and can actively find things by doing more than searching webpages, e.g., it can open and analyze the pages that a webpage links to. An example explains it better than I can.

I asked Pxy to compare itself to Comet, which is the AAI addition to what Pxy corp. is selling: 
Q: In terms of searching and analysis, can Comet access and analyze more information, than Perplexity, e.g., by opening more links? For example, would Comet be better able to answer this query than Pxy?: "Scan this blog, https://dispol.blogspot.com/, for posts that link to Pxy searches and briefly describe the context for the link, and also state what % of blog posts there are by the author Germaine compared to other authors like PD and Susan.". I'm interested in a general capability assessment.

A: Comet, Perplexity's AI-powered browser, provides significantly more comprehensive access to information than traditional Perplexity searches. The key advantage lies in its Personal Search feature, which leverages browsing history and AI-driven web navigation to retrieve and analyze information on your behalf.

This includes:
  • Direct tab and page analysis: Comet can analyze content across multiple open tabs simultaneously and maintain context between them
  • Authorization-based access: Unlike regular web searches, Comet can access pages that require login credentials or personal authorization, such as social networks, email accounts, and private dashboards
  • Automated navigation: The browser can navigate through multi-page websites, follow internal links, and perform complex browsing workflows autonomously

Traditional Perplexity Limitations:

Regular Perplexity searches are limited to publicly accessible web content and cannot maintain persistent browsing sessions or access authenticated content. While Perplexity excels at synthesizing information from multiple sources, it cannot perform the kind of deep, multi-page website analysis that your example query requires.

Answering Your Specific Query Example

For your example query about analyzing the blog at https://dispol.blogspot.com for Perplexity links and author percentages, Comet would indeed be significantly more capable.

Comet's Advantages:
  • Can systematically browse through blog archives and individual posts
  • Maintain context about link patterns and author information across multiple pages
  • Access and analyze the full blog structure, including navigation between different time periods and categories
  • Perform comprehensive content analysis across the entire site

Current Perplexity Limitations:

Based on the search results, regular Perplexity can only analyze the immediate content available on a single page load. For the sample blog provided, this would mean limited analysis of only the most recent posts visible on the homepage, missing the comprehensive site-wide analysis your query requires.

Comet opens up a vast new way to search and analyze far more data than Pxy can, and Pxy can already do far more than I can.

I'm a bit scared of Comet because it is programmed to learn "me" and eventually be able to shift me into the intention economy from the current state of internet affairs called the attention economy. However, since I get Comet for free at least until my current Pxy pro subscription ends, it needs to be tested out. Presently, Comet costs $200/month for new subscribers. I got on a wait list to get access to Comet for free, at least for now.

My 1st Comet search, "Scan this blog, https://dispol.blogspot.com/, for posts that link to Perplexity searches and briefly describe the context for the link, and also state what % of blog posts there are by the author Germaine compared to other authors like PD and Susan.", gave an impressive response. It was able to count all the posts for July 15-26 (a time period Comet picked), find where in each post the author is named, list which posts I had linked to Pxy searches, and state the context for why each link was there. Pxy simply cannot do that.

So I asked Comet a bigger question: Repeat the Author Distribution Analysis for the entire blog since the first post in 2015 until now. It did it, but wrong, finding no posts by Susan or PD. I posed multiple rounds of queries. Comet kept failing, so I refocused my queries on why it failed. It failed because my query exceeded Comet's data and bandwidth limits, e.g., "Comet does not “spider” every file on a site. Each browse_web call is capped at ±30 seconds and ±300 kB of returned HTML."[1]

This AAI business is kind of scary. But like it or not, it is now here. I'll need some time to figure out how to maximize the power of AI with agency. I presume that (1) Comet will make the same kind of mistakes that Pxy is prone to making and for the same reasons, and (2) the same error reduction instructions I feed to Pxy will work about the same with Comet. 

As far as Comet learning about me, I started to look for what and how much of me it will pick up and use, starting with this query: "Why do you use British spellings, e.g., favour and optimisation, instead of favor and optimization? It is annoying."

It responded: "You're absolutely right, and I apologize for the inconsistency. I should be using American spellings consistently, especially when discussing an American political blog and communicating with what appears to be an American audience. .... I'll make sure to stick to American spellings going forward in our conversation. Thanks for pointing this out - consistency in language conventions does matter for clear communication."

Time will tell how AAI plays out.


Footnote:
1. There is a learning curve here. This further describes Comet's limits:
    • A single search_web call can issue at most 3 keyword queries and returns up to 10 results per query.

    • The conversation as a whole is throttled to roughly 100 external requests. For large blogs (5 000+ posts) that is far short of an exhaustive crawl.

    • If the initial batches already hit the time/volume guardrails, the agent will normally stop rather than loop through every month of every year.
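To put rough numbers on those limits, here is a minimal arithmetic sketch. It assumes the blog's first post was in January 2015 (the post above only says "2015"), that one request opens one monthly archive page, and it takes the "roughly 100 external requests" and "5 000+ posts" figures at face value as Comet reported them. None of this is a statement about how Comet actually schedules its requests.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Count calendar months from start to end, including both end months."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Assumed values, see the note above the code.
FIRST_MONTH = date(2015, 1, 1)   # assumption: first post in January 2015
LAST_MONTH = date(2025, 7, 1)    # the month this post was written
REQUEST_BUDGET = 100             # "roughly 100 external requests" per conversation
POST_COUNT = 5000                # "5 000+ posts" for a large blog

archive_pages = months_between(FIRST_MONTH, LAST_MONTH)

print(archive_pages)                   # monthly archive pages to open: 127
print(archive_pages > REQUEST_BUDGET)  # True: one request per month already overshoots
print(POST_COUNT / REQUEST_BUDGET)     # 50.0 posts would have to be covered per request
```

Even at one request per month, a blog running since 2015 needs more archive pages than the budget allows, which fits with Comet giving up partway through.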

  1. Heuristics that favour frequent authors

    • Comet looks first for the author label that Blogger places under the title (“Posted by …”) on archive pages. If an author appears once, but only on the individual-post page (and that page was never opened), the name is missed.

    • When duplicate author strings are seen, the agent stops adding new pages once no new names appear—an optimisation that again hurts single-appearance writers.
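The early-stop shortcut described above can be imitated with a toy simulation. This is only a sketch of the idea, not Comet's code: the function name `collect_authors`, its parameter, and the page data are all made up for illustration.

```python
def collect_authors(pages, stop_when_no_new_names=True):
    """Walk pages in order and gather the author names seen.

    pages is a list of lists: the author names visible on each page.
    With stop_when_no_new_names=True the walk ends at the first page
    that adds no new name, which imitates the shortcut described above.
    """
    seen = set()
    for names in pages:
        new_names = set(names) - seen
        if stop_when_no_new_names and seen and not new_names:
            break  # nothing new on this page, so stop opening pages
        seen |= new_names
    return seen


# Hypothetical archive pages, newest first.
pages = [
    ["Germaine", "Germaine", "PD"],
    ["Germaine", "PD"],      # no new names here, so the shortcut stops
    ["Germaine", "Susan"],   # never opened when the shortcut is on
]

print(sorted(collect_authors(pages)))
# ['Germaine', 'PD']
print(sorted(collect_authors(pages, stop_when_no_new_names=False)))
# ['Germaine', 'PD', 'Susan']
```

In this toy data, the rare author sits on a page that is never opened, so the count comes back wrong without any error message.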

  2. Practical constraints

    • Large HTML pages are clipped at 300 kB, so very long pages may be truncated before the “Posted by” line.

    • JavaScript-rendered content is not executed; if the author attribution is injected client-side it will be invisible to the scraper.

    • Rate-limit etiquette: the tool inserts a polite delay and will not open hundreds of pages from the same host in one burst, both to avoid being blocked and to stay within the 120-second per-turn budget.

How to catch the single-post author next time

  • Manually enumerate monthly archive URLs (/2019/08/, /2019/07/, …) and let the tool open each page.

  • Scrape the blog’s Atom/RSS feed – it lists every post in chronological order without pagination limits.

  • If the platform supports it, append ?max-results=5000 to the blog’s home URL; Blogger returns a full XML dump that can be searched offline.

  • Combine author-name queries with distinctive phrases from the suspected post (“Posted by Susan at 8:18 AM”).
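The feed suggestion above could be tried outside of Comet with a few lines of Python. This is a minimal sketch, assuming the blog's feed is in standard Atom format with an author name on each entry. The function name `author_share` and the sample feed are made up for illustration, and downloading the real feed is left out, so the code only parses an inline sample.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom XML namespace


def author_share(atom_xml: str) -> dict:
    """Return each author's share of feed entries, as a percentage."""
    root = ET.fromstring(atom_xml)
    counts = Counter()
    for entry in root.iter(f"{ATOM}entry"):
        # Each Atom entry may carry <author><name>...</name></author>.
        name = entry.findtext(f"{ATOM}author/{ATOM}name", default="(unknown)")
        counts[name.strip()] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Made-up sample feed; a real feed would hold one entry per blog post.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Post A</title><author><name>Germaine</name></author></entry>
  <entry><title>Post B</title><author><name>PD</name></author></entry>
  <entry><title>Post C</title><author><name>Germaine</name></author></entry>
  <entry><title>Post D</title><author><name>Susan</name></author></entry>
</feed>"""

print(author_share(SAMPLE_FEED))
# {'Germaine': 50.0, 'PD': 25.0, 'Susan': 25.0}
```

Because every entry is counted directly from the feed, an author with a single post cannot be skipped by a page-level shortcut. Whether the real feed returns every post in one response is something I would have to check rather than assume.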

There is no hard ceiling on “how many pages” Comet can read, but the mix of timeouts, per-call quotas and politeness rules means that, in practice, you must choose between broad coverage and deep coverage; you rarely get both in a single pass.


I don't understand a lot of that. Learning curve.
