JENS MALMGREN I create, that is my hobby.

Not indexed but still missing? Google hypocrisy.

Having a blog means that you also have to maintain it. Here is the current status of the maintenance, and a couple of interesting conclusions about what happens behind the scenes at Google.

If you have followed this blog you know that it started on Blogger on 10th of June 2011. On 28th of August I moved the blog to another platform. The main reason for this is that nudity is not allowed on Blogger. The domain name stayed the same but the server and the blog system were new. This changed the internal structure, so all blog posts got new links.

A couple of years back I would have been done at this point, but I have learned a lot since then: if you are unfriendly to Google then you are also unfriendly to your readers. So to fix this situation I started to create redirects. They work so that if you type in an old link you are redirected to the new page. The most interesting question I had was “when is Google finished reading and indexing my site?” Well, it is hard to get a definite answer, but it looks like they have found all errors now. So sometime between the 1st and the 10th of October 2011 they had indexed the entire site. That is 44 days, or one month and 13 days. For 208 posts that is a little less than 5 pages per day. I expect that more “important” websites get a higher indexing rate.
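The redirects described above boil down to a lookup from old link to new link, answered with a permanent redirect (HTTP status 301) so that both readers and crawlers end up on the new page. Here is a minimal sketch of that idea in Python. The paths in `REDIRECTS` and the function name `redirect_for` are made up for illustration; they are not the real links or the real code of this blog.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from old Blogger-style paths to new paths.
# The real old and new links of the blog are not listed in this post.
REDIRECTS = {
    "/2011/06/first-post.html": "/post/first-post",
    "/2011/08/moving-day.html": "/post/moving-day",
}


def redirect_for(url_or_path):
    """Return (301, new_path) if the request is an old link, else None.

    Only the path part is looked up, so a full URL or a path with a
    query string still matches its entry in the table.
    """
    path = urlsplit(url_or_path).path
    new_path = REDIRECTS.get(path)
    if new_path is None:
        return None  # not an old link: let the blog system handle it
    return 301, new_path


if __name__ == "__main__":
    print(redirect_for("/2011/06/first-post.html"))
    print(redirect_for("/post/first-post"))
```

A web server would send the returned path in a `Location` header together with the 301 status. Keeping the table as plain data makes it easy to check that every old link has a new home.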

A really important lesson here was that the robots.txt file that was active on Blogger was not available on the new platform. Everywhere you read about robots.txt you are told it is a feature to make robots avoid pages. So I was more than surprised when Google started to report excluded pages as missing after 28th of August. How can you report something as missing that you have not indexed? So you have indexed the pages after all? This can be tested: I make pages that are excluded by robots.txt, with links to pages that are not excluded, and I make sure that the non-excluded pages are not linked or mentioned anywhere else. If the non-excluded pages show up in Google search results, then this proves that I am correct: all content is indexed, even if it has been excluded by robots.txt. Voilà!
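Before setting up such a test it helps to verify which pages a robots.txt file really excludes. The sketch below uses Python's standard `urllib.robotparser` for that. The rules in `ROBOTS_TXT` and the example.com URLs are assumptions for illustration, not the actual file that was active on Blogger, and `is_excluded` is a helper name invented here.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration only; the actual file
# that was active on Blogger is not quoted in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_excluded(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one under the excluded prefix, one ordinary post.
    print(is_excluded("http://www.example.com/search/label/art"))
    print(is_excluded("http://www.example.com/2011/10/some-post.html"))
```

For the experiment, the "bait" pages should come out as excluded and the pages they link to as not excluded.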

Another issue was the alien keywords. These are words that originate from the blog system, intended to provide nice and clever features, but unintentionally they also add words that have nothing to do with the subject of the blog. On 10 September 2011 Google had found that the word ‘feed’ was used 13471 times on the blog, according to Google Webmaster Tools. On that day I removed this word from the blog system. It was the text of an alt attribute on an icon where you can subscribe to categories on the blog. There were 36 of these categories on every blog post page, so this multiplies rapidly.
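To see how fast this multiplies: 36 icons on each of the 208 posts is already 36 × 208 = 7488 occurrences from the post pages alone. Such alien keywords can be hunted down by counting a word in the alt attributes of a page. The sketch below does this with Python's standard `html.parser`; the sample markup and the names `AltWordCounter` and `count_alt_word` are invented for illustration and are not taken from the blog system.

```python
from html.parser import HTMLParser


class AltWordCounter(HTMLParser):
    """Count how often a word occurs in alt attributes of a page."""

    def __init__(self, word):
        super().__init__()
        self.word = word.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "alt" and value:
                self.count += value.lower().split().count(self.word)


def count_alt_word(html, word):
    counter = AltWordCounter(word)
    counter.feed(html)  # HTMLParser.feed parses the markup
    counter.close()
    return counter.count


if __name__ == "__main__":
    # Made-up page with 36 category icons, each with alt text "Feed".
    page = '<a href="#"><img src="rss.png" alt="Feed"></a>' * 36
    per_page = count_alt_word(page, "feed")
    print(per_page, per_page * 208)
```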

The most interesting thing here was that a couple of days later the same figure was 15000. This means that Google holds pages in a kind of pipeline after they have been harvested, that is, picked up by Googlebot. Today, on October 16, the word is reported as found 652 times. That is 36 days later. I understand that every time I write about this issue I add a couple of these words, but I expect the real count of the word to be something like 20. When will Webmaster Tools report 20?

On October 3rd I removed the word ‘permalink’, which had been found 1620 times. Now it is found 798 times.

So that is what I have to say about Google. Now some more art!

I was born in 1967 in Stockholm, Sweden. I grew up in the small village of Vågdalen in northern Sweden. In 1989 I moved to Umeå to study Computer Science at the University of Umeå. In 1995 I moved to the Netherlands, where I live in Almere, not far from Amsterdam.

Here on this site I let you see my creations.
