In this article I’ll discuss what we can achieve with HTML5 strictly from an engineering point of view. I’ll try to address the question “How can a search engine take advantage of HTML5 tags?”
Let’s go through the tags one by one, starting with Mark.
At first, there seems to be no reason to use the Mark tag at all. We can achieve the same visual result with a span tag, but only for the user. For a search engine it is important to identify which words define the page. A web page contains hundreds or thousands of words, and most of them are common function words such as pronouns or conjunctions, which generally do not define the page.
Search engines currently tend to look at the Title tag and the URL string first, for the obvious reason that they are much more concise, clean and illustrative. They also look at heading tags such as h1, along with repeated words, which tend to be more descriptive.
This being the case, the importance of the Mark tag is obvious. If I have a word that defines the page, I can put it into a Mark tag and style it with CSS. Now both the user and the search engine can clearly see that it describes the page.
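To make this concrete, here is a minimal sketch of how a crawler could pull the marked words out of a page. It uses Python's standard `html.parser` module; the class name `MarkCollector`, the function `extract_marked_terms` and the sample markup are my own illustrative assumptions, not how any real search engine is known to work.

```python
from html.parser import HTMLParser


class MarkCollector(HTMLParser):
    """Collects the text found inside <mark> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <mark> elements we are currently inside
        self.terms = []   # text collected from <mark> elements

    def handle_starttag(self, tag, attrs):
        if tag == "mark":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "mark" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside a <mark> element.
        if self.depth > 0 and data.strip():
            self.terms.append(data.strip())


def extract_marked_terms(html):
    collector = MarkCollector()
    collector.feed(html)
    collector.close()
    return collector.terms


# Made-up page fragment, used only for illustration.
SAMPLE_HTML = """
<p>Our <mark>espresso machines</mark> are built for small cafes,
and every model ships with a <span>two-year warranty</span>.</p>
"""

print(extract_marked_terms(SAMPLE_HTML))
```

The span text is ignored because, to a machine, a span carries no meaning of its own, while the Mark text comes out as a candidate keyword.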
All websites have navigation mechanisms: a top menu, a side menu, or dynamic navigation that shows the navigation tree. The problem for search engines is that a given web page contains many links, and it is hard to tell which of them are the navigation links.
There are methods to address this problem. For example, if a given code block is repeated on many pages, contains a link to the root page, and links back to the pages it appears on, then that block is most likely the navigation section.
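A rough sketch of that heuristic might look like the following. Everything here is an assumption made for illustration: the function name `guess_navigation_block`, the idea of representing each link block as a tuple of hrefs, and the 80% threshold for "repeated on many pages".

```python
from collections import Counter


def guess_navigation_block(pages, root="/", min_share=0.8):
    """Guess which link block is the navigation.

    `pages` maps a page URL to the list of link blocks found on that page;
    each block is a tuple of hrefs. Returns the best candidate or None.
    """
    if not pages:
        return None

    # Count on how many pages each distinct block appears.
    counts = Counter()
    for blocks in pages.values():
        for block in set(blocks):
            counts[block] += 1

    best = None
    for block, seen_on in counts.items():
        repeated = seen_on / len(pages) >= min_share
        has_root = root in block
        # "Links back to itself": the block links to a page it appears on.
        links_to_host = any(
            url in block for url, blocks in pages.items() if block in blocks
        )
        if repeated and has_root and links_to_host:
            if best is None or seen_on > counts[best]:
                best = block
    return best


# Made-up crawl result for a three-page site.
SAMPLE_PAGES = {
    "/": [("/", "/products", "/contact"), ("/blog/launch",)],
    "/products": [("/", "/products", "/contact"), ("/products/a", "/products/b")],
    "/contact": [("/", "/products", "/contact")],
}

print(guess_navigation_block(SAMPLE_PAGES))
```

Even in this toy form the crawler has to see many pages before it can guess, and the guess can still be wrong.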
Why bother with such heuristics? HTML5 has the Nav element for representing the navigation section of the page explicitly. When you search in Google, you sometimes see a site’s common pages listed beneath its main result.
These are the pages from the site’s navigation links. But clearly this won’t appear for every searched URL, and Google doesn’t know the navigation structure of every website. We can help: with Nav we can say “these are navigation links”.
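With an explicit Nav element the crawler no longer has to guess. The sketch below, again built on Python's standard `html.parser`, separates links inside Nav from all other links; `NavLinkCollector`, `split_links` and the sample markup are hypothetical names and data of my own.

```python
from html.parser import HTMLParser


class NavLinkCollector(HTMLParser):
    """Separates links inside <nav> from all other links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0
        self.nav_links = []
        self.other_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is None:
                return
            if self.nav_depth > 0:
                self.nav_links.append(href)
            else:
                self.other_links.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def split_links(html):
    collector = NavLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.nav_links, collector.other_links


# Made-up page fragment, used only for illustration.
SAMPLE_HTML = """
<nav>
  <a href="/">Home</a>
  <a href="/products">Products</a>
  <a href="/contact">Contact</a>
</nav>
<p>Read our <a href="/blog/launch">launch post</a>.</p>
"""

nav_links, other_links = split_links(SAMPLE_HTML)
print(nav_links)
print(other_links)
```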
This doesn’t only help us get our navigation links shown on Google. Websites, especially big and popular ones, have huge numbers of pages. What do you think would happen if you put all of the content into the same page? Technically there really isn’t much difference: the content sits on your hard drive, is transferred over the network, and a web crawler crawls all of it, page by page or at once, anyway.
The answer is categorizing. Search engines categorize your pages, your website, other pages, and other websites. You have probably already noticed that Google doesn’t show only pages that match your keywords exactly, but others too. It matches not just the words but also the presumed intent behind them. If you search for “chemist shop” you will also get “pharmacy shop” results. Search engines also know abbreviations, related words and so on.
So with the Nav tag, we can categorize our website in such a way that search engines understand what we are targeting. This is important because, for the most part, we are targeting multiple areas on the website, not just one.
A web page contains many elements other than its main content, the part meant for human consumption. These include the header, the footer, promotion boxes and so on, so the page is fragmented. We want the user to see everything, but we don’t want search engines to include everything when assessing the page. We may have an “Our website has been redesigned” promotion box on the right, but that doesn’t define the page. We want to point out that this page was created for particular content, and that this content is the body we want people and search engines to read. We can achieve this with the Article element. We can place our content inside one Article element, or inside several if we want to categorize a single page. This way we can explicitly say “This is the content this page was created for”.
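Here is a small sketch of what “assess only the Article” could mean in practice, assuming a crawler that simply ignores any text outside Article elements. The names `ArticleText` and `extract_article_text`, and the sample page, are made up for illustration.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Keeps only the text that sits inside <article> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and line breaks
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_article_text(html):
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up page, used only for illustration.
SAMPLE_HTML = """
<div class="promo">Our website has been redesigned!</div>
<article>
  <h1>Choosing an espresso machine</h1>
  <p>Boiler size matters more than pump pressure.</p>
</article>
<footer>Example Ltd.</footer>
"""

print(extract_article_text(SAMPLE_HTML))
```

The promotion box and the footer text never reach the list, so they cannot influence how the page is assessed.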
As mentioned earlier, what would happen if we put all of our content into a single page? If search engines fully supported the Section tag, we could actually do it. We can divide the page into sections with Section elements, nest each Section element within the Article element, and give each section its own unique heading tag.
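As a sketch of the idea, the code below splits one page into (heading, text) pairs, one per Section, so each could be treated almost like a page of its own. It assumes sections are not nested inside each other; `SectionSplitter`, `split_sections` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionSplitter(HTMLParser):
    """Splits a page into (heading, text) pairs, one per <section>.

    Simplifying assumption: sections are not nested inside each other.
    """

    def __init__(self):
        super().__init__()
        self.sections = []
        self.current = None      # the section being read, or None
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.current = {"heading": [], "text": []}
        elif tag in HEADINGS and self.current is not None:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "section" and self.current is not None:
            heading = " ".join(self.current["heading"])
            text = " ".join(self.current["text"])
            self.sections.append((heading, text))
            self.current = None
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self.current is None:
            return
        key = "heading" if self.in_heading else "text"
        self.current[key].append(text)


def split_sections(html):
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return parser.sections


# Made-up single-page article, used only for illustration.
SAMPLE_HTML = """
<article>
  <section><h2>Grinders</h2><p>Burr grinders give an even grind.</p></section>
  <section><h2>Machines</h2><p>Boiler size matters.</p></section>
</article>
"""

for heading, text in split_sections(SAMPLE_HTML):
    print(heading, "->", text)
```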
What about a Header element? We already have one, don’t we: the h1 element? For those who don’t know h1, it is the tag we generally put the visible title of the page into, as opposed to the page title in the Title tag. Its restriction is that it only accepts inline text content. While h1 works well for that, we can think of the <header> element as its next iteration. We can place more text blocks and nested elements in a <header> element; we can even nest h1 and h2 in it. Our headers don’t need to be just one line of text anymore.
While a header generally shows content related to the current page, a footer contains more generic information related to the whole website. This can include company information such as address, phone number, support email and policies. It is always good practice to group that information and label it as generic information by placing it in the Footer element.
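To illustrate how a crawler could treat these two regions differently, the sketch below groups a page's text by whether it sits in a Header, a Footer, or elsewhere. The class `RegionText`, the function `group_by_region` and the sample page are assumptions of mine, not a description of a real crawler.

```python
from html.parser import HTMLParser


class RegionText(HTMLParser):
    """Groups page text by the <header> or <footer> it belongs to (hypothetical helper)."""

    REGIONS = {"header", "footer"}

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open header/footer elements
        self.text = {"header": [], "footer": [], "other": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.REGIONS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.REGIONS and self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            region = self.stack[-1] if self.stack else "other"
            self.text[region].append(text)


def group_by_region(html):
    parser = RegionText()
    parser.feed(html)
    parser.close()
    return parser.text


# Made-up page, used only for illustration.
SAMPLE_HTML = """
<header>
  <h1>Espresso Guide</h1>
  <h2>Machines, grinders and beans</h2>
  <p>Updated weekly.</p>
</header>
<p>Boiler size matters.</p>
<footer>
  <p>Example Ltd., 1 Example Street</p>
  <p>support@example.com</p>
</footer>
"""

regions = group_by_region(SAMPLE_HTML)
print(regions["header"])
print(regions["footer"])
```

Note that the header here holds an h1, an h2 and a paragraph together, while the footer text can be recognized as site-wide information and weighted accordingly.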
Sometimes the date of the page is of primary importance. The rise of Twitter has made real-time search more important, so time and date are obviously key issues.
How can a search engine determine the creation time of a web page? It is not the time the search engine crawled it; the page was obviously created before that. If the web page contains a date like 11/12/2012 it looks easy, but that date may belong to one of the comments. Another problem is that date formats differ between cultures: 11/12/2012 can mean 11 December or November 12.
The Time element helps here by indicating the exact time the web page was created. The creator of the web page can explicitly define the date of the page in Time tags, in a machine-readable form.
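As a sketch, a crawler could read the `datetime` attribute of a Time element instead of guessing from the visible text. The code below assumes, purely for illustration, that the first Time element with a parsable value is the publication date; `TimeCollector` and `first_published` are hypothetical names.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeCollector(HTMLParser):
    """Collects the datetime attribute of every <time> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.values.append(value)


def first_published(html):
    """Return the first parsable <time datetime="..."> value, or None."""
    collector = TimeCollector()
    collector.feed(html)
    collector.close()
    for value in collector.values:
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            continue  # not an ISO 8601 date that Python can parse; try the next one
    return None


# Made-up fragment: the visible text is ambiguous, the attribute is not.
SAMPLE_HTML = """
<p>Published <time datetime="2012-12-11T09:30:00">11/12/2012</time></p>
"""

print(first_published(SAMPLE_HTML))
```

The visible text can stay in whatever format the local audience expects, because the machine reads the attribute.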
We have already discussed how to define the “real” content of a page. What about indicating related content? You have probably experienced visiting a web page after a search and noticing that your search text is in a little side box rather than in the content itself.
Most pages have a “Similar content” section. By making use of the Aside tag we can place related content so that the search engine understands it is not the main content. Misleading matches don’t help your SEO, so you can fence off your side content with the Aside tag.
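The side-box problem can be sketched like this: given a search term, check whether it matches the main content or only the Aside. The names `AsideAwareText` and `locate_term` and the sample page are hypothetical, and a real engine would of course rank rather than just report the location.

```python
from html.parser import HTMLParser


class AsideAwareText(HTMLParser):
    """Keeps text inside <aside> apart from the rest of the page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.aside_depth = 0
        self.main_text = []
        self.aside_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "aside":
            self.aside_depth += 1

    def handle_endtag(self, tag):
        if tag == "aside" and self.aside_depth > 0:
            self.aside_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            target = self.aside_text if self.aside_depth > 0 else self.main_text
            target.append(text)


def locate_term(html, term):
    """Report whether `term` occurs in the main content, the aside, or both."""
    parser = AsideAwareText()
    parser.feed(html)
    parser.close()
    term = term.lower()
    found = []
    if term in " ".join(parser.main_text).lower():
        found.append("main")
    if term in " ".join(parser.aside_text).lower():
        found.append("aside")
    return found


# Made-up page, used only for illustration.
SAMPLE_HTML = """
<article><p>Boiler size matters more than pump pressure.</p></article>
<aside><h2>Similar content</h2><a href="/grinders">Best burr grinders</a></aside>
"""

print(locate_term(SAMPLE_HTML, "burr grinders"))
print(locate_term(SAMPLE_HTML, "boiler"))
```

A page whose only match is in the Aside could then be ranked lower for that query than a page that matches in its main content.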
HTML5 has a lot of great implications for SEO. Share your experiences with us below or contact us.