Avoid Accidental Spamminess
Be careful when building automation of any kind into your blogs or websites: it will eventually come back to bite you. For example, today I got an email from someone concerned that her prize keywords were being violated on a page of mine. It was bizarre, because the keyword was unrelated to the content of the page. But I use WP From / Where to automatically gather keywords from search engine visitors, so a single visit on that keyword was enough for it to stick. I added a filter to WP From / Where to exclude this word in the future, but I also tried to reduce the overall “spamminess” of the current design.
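To make the filtering idea concrete, here is a minimal sketch of pulling the search phrase out of a referrer URL and dropping it when it contains an excluded word. This is not WP From / Where's actual code. The function name, the blocklist contents, and the assumption that the phrase arrives in a `q` or `p` query parameter are all mine, for illustration only.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical blocklist; the words actually excluded are not given in this post.
BLOCKED_TERMS = {"examplebrand"}


def search_keywords(referrer, blocked=BLOCKED_TERMS):
    """Return the lowercased search phrase from a referrer URL.

    Returns None when there is no phrase or when it contains a blocked word.
    """
    query = parse_qs(urlparse(referrer).query)
    # Assumption: the engine passes the phrase as "q" (or "p" as a fallback).
    phrase = (query.get("q") or query.get("p") or [""])[0].strip().lower()
    if not phrase:
        return None
    # Drop the whole phrase if any single word is on the blocklist.
    if any(word in blocked for word in phrase.split()):
        return None
    return phrase
```

Filtering at the point of collection means a blocked word never reaches the keyword list in the first place, rather than having to be cleaned out later.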
First, the “related posts” at the bottom of each page are now limited to 5 results from Google Blog Search. MSN, whose results were spurious at best, has been dropped entirely. This gives the page a cleaner, sleeker look.
Second, the sidebar links to Google queries on the search terms used to reach the site have been cut on some sites from 30 to 15 or 10. I have no idea why the number was so large. I think it was just to fill out the sidebar, because the content ran longer when the MSN related posts were included. This is much better.
Third, the meta keywords tags that my emailer was so concerned about are now generated from the post tags, so that they are intrinsic to the content rather than drawn from the keywords visitors used to reach that post.
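As a rough illustration of that third change, the sketch below builds a meta keywords tag from a post's own tags. It is a stand-in, not the code running on these blogs. The function name and the limit of 10 keywords are assumptions made for the example.

```python
from html import escape


def meta_keywords_tag(post_tags, limit=10):
    """Build a meta keywords tag from a post's tags (hypothetical helper)."""
    seen = set()
    keywords = []
    for tag in post_tags:
        cleaned = tag.strip().lower()
        # Skip empty tags and duplicates that differ only by case or spacing.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            keywords.append(cleaned)
    content = ", ".join(keywords[:limit])
    # Escape the value so a stray quote in a tag cannot break the attribute.
    return '<meta name="keywords" content="{}" />'.format(escape(content, quote=True))
```

Because the only input is the post's tag list, nothing a visitor types into a search engine can end up in the tag.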
This is all just a stopgap measure. My next design for the blog network is radically different, and should minimize or at least localize all external material to a well-defined area of the site. There won’t be any mixing of my content and other content, so impressions of spamminess should be reduced. Rather, the idea is to enrich my content and link it together in such a way that the site adds value within itself, without the need for bringing in more value from the outside.
Sitemaps Now Supported by Google, Microsoft, and Yahoo
Did you ever think that Yahoo Search would adopt Google’s sitemaps service? Apparently the protocol is now its own standard. This means you can ping the big three at the following URLs:
- MSN: http://search.live.com/ping?sitemap=sitemap_url
- Yahoo: https://siteexplorer.search.
- Google: http://www.google.com/webmasters/sitemaps/ping?sitemap=sitemap_url
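Here is a small sketch of how the ping URLs can be assembled for a given sitemap. It only builds the URLs and does not send any requests. The function name and the example sitemap address are made up for illustration, and Yahoo is left out because its URL is incomplete in the list above. Percent-encoding the sitemap address is my own precaution, not something the list spells out.

```python
from urllib.parse import quote

# Endpoints taken from the list above; Yahoo is omitted because its URL is incomplete.
PING_ENDPOINTS = {
    "MSN": "http://search.live.com/ping?sitemap=",
    "Google": "http://www.google.com/webmasters/sitemaps/ping?sitemap=",
}


def ping_urls(sitemap_url, endpoints=PING_ENDPOINTS):
    """Return a dict mapping each engine name to its full ping URL."""
    # Percent-encode the sitemap address so it survives as a query value.
    encoded = quote(sitemap_url, safe="")
    return {name: base + encoded for name, base in endpoints.items()}


if __name__ == "__main__":
    for engine, url in ping_urls("http://example.com/sitemap.xml").items():
        print(engine, url)
```

Fetching each of those URLs after the sitemap changes would be the actual ping; that step is left out here so the sketch has no side effects.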

What does this mean?
First, it’s interesting that users get to tell a search engine what to crawl, how to index it, and what’s been updated. Working that out used to be the sole responsibility of the search crawler. Now it’s part of a webmaster’s configuration. Second, it’s a unified format that all the search engines can read. Perhaps there will now be better indexing from #2 Yahoo and #3 MSN as they try to catch up with Google. Whatever happens, more traffic is good for us bloggers.
The Google Sandbox Effect Quantified
The Google Sandbox Effect, which is the phenomenon of a new and rising site suddenly being cut out of Google’s indices, is not a myth. It’s real, and can happen to your site.

If you look at the following graph, you’ll notice strong upward traffic trends until a certain arbitrary threshold was reached and the decision to sandbox my site was made. Thereafter, traffic from Google dropped off as the sandbox command propagated to all their search servers.
There seem to be three criteria for sandboxing at play here:
1) Your site must be above a certain traffic threshold, so that tiny sites are not flagged. I’m guessing that it’s about 10,000 hits over 7 days. If your site is not sustaining this kind of traffic, the sandbox algorithm won’t even consider it.
2) Your site must have a huge change in the number of documents on site. This site was a script which generated 1.17 million dynamic pages before it achieved popularity. I converted it to a blog with a few thousand pages, a reduction by a factor of roughly 585. I suspect that this penalty applies to both increases and decreases in the number of hosted pages.
3) Your site must be rising in popularity at a sufficiently high slope. There is no reason to penalize a site which grows slowly over years, because it cannot be a high-risk startup spam site.
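The three guesses above can be written down as a simple check. To be clear, this is only my speculation turned into code, not anything Google has published. The only number that comes from this post is the 10,000 hits over 7 days; the page-change and growth thresholds, and all the names, are placeholders invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    hits_last_7_days: int
    pages_before: int
    pages_after: int
    weekly_growth_rate: float  # 0.5 means traffic grew 50% week over week


def page_change_factor(before, after):
    """How many times larger the bigger page count is than the smaller one."""
    low, high = sorted((before, after))
    if high == 0:
        return 1.0  # no pages before or after: nothing changed
    return float("inf") if low == 0 else high / low


def might_be_sandboxed(stats, min_hits=10_000, min_change_factor=10.0, min_growth=0.25):
    """Apply the three guessed criteria; every threshold here is an assumption."""
    busy = stats.hits_last_7_days >= min_hits  # criterion 1: enough traffic
    reshaped = page_change_factor(stats.pages_before, stats.pages_after) >= min_change_factor  # criterion 2
    rising = stats.weekly_growth_rate >= min_growth  # criterion 3: steep growth
    return busy and reshaped and rising
```

Taking the ratio of the larger page count to the smaller one treats growth and shrinkage the same way, which matches the suspicion in point 2. Going from 1.17 million pages to about 2,000 (my reading of “a few thousand”) gives a factor of 585.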
If you want more good Google Sandbox analysis, there’s no better place to look than the SEOmoz article 2005 Analysis of Google’s Sandbox. Even though it says “2005” it’s not at all dated.