Flash vs HTML in the Indexing Battle

By victoriak

We recently had a lively discussion here at Hothouse that was sparked by the mention of Flash website indexing. HTML has traditionally claimed superiority when it comes to Google’s ranking, but now Flash is fighting back and is walking the walk in a number of cases. Here’s (roughly) how the discussion flowed:

“I was just chatting with one of our colleagues and it turned out that they were unaware that Google includes Flash content as part of its searches.
Google does in fact search through Flash content, and does so in much the same way it searches through straight HTML files. Google has been doing this for almost a year and a half now. You can read more about it here.”

“Google can index Flash content, but the content does not rank well because you can’t apply the same tagging as you can with HTML … Think about how many times you’ve seen a Flash element ranking in the top 10….”

“While that’s a fair point, I’m not sure it’s the full story. For instance, Google seems to handle searching PDFs (another Adobe product) just fine, and PDFs contain none of the mentioned HTML mark-up.
I think what’s more relevant than whether it is HTML is the amount of text contained in the object. Flash tends to be used for jobs that have rich visual and aural content and lots of interactivity, and a lot of search engine algorithms are very dependent on text (Google’s PageRank being an obvious exception).
I suggest it would be very hard to do a Google search where none of the top 10 results contains any Flash. Just for a test I typed ‘Coca Cola’ into Google and it returned these results. I clicked on the first one and was very impressed from a Flash dev point of view.”

“The domain name is very strong in this search phrase, which would push the coca-cola website to the top. Try ‘soft drink’ and see what you get.”

“With PDF being an ISO standard file format, it’s much easier for Google et al. to give their apps an understanding of the format. The 1.7 PDF spec defines various markup tags. For instance, ‘headings’ are supported (H1 to H6), as are the PDF equivalent of HTML’s <p> and others. Combined with the fact that PDFs are still primarily a text document presentation format, this makes it vastly easier to index and integrate PDF content into meaningful results for a given search.”

“PDFs were not always indexable by Google. They certainly are now, including scanned text. I would suggest Flash content is going/has gone down exactly the same path as PDFs. It is in Google’s interest (and any other search engine’s) to make sites with Flash content as searchable as any others.”

“According to that previous blog post link, they are working on expanding the engine’s abilities to perceive context and deep linking, among other things. Recently they announced that they would support embedded text in FLVs, for example, which was not available a year ago. This link has some suggestions for 2009 that look to be still valid.”

“Ultimately, Google’s going to get better at indexing it, and Adobe’s going to make it easier to index.
The issue is using it when appropriate for the UX. There are so many Flash and HTML nightmares out there that should be blocked from any form of indexing at all.”

“Adobe, Google and Yahoo are actively working on making the algorithms for Flash indexing more powerful and flexible.

However, knowing when to stop and following the recommended SEO best practices are currently the best ways to optimize the UX for Flash-enabled sites.”