Heard of the semantic web? Using microformats, everyone can contribute to the richness of the web. John Allsopp explains how.

Think back to when you first began using the web. It might be a couple of years ago, probably five, ten or even more. How did you use the web back then? More likely than not, you searched using Google, or Lycos, or Alta Vista (yeah, I’m old, I remember when Alta Vista came along and made web search actually work), or if you are really old like me, you’ll have used a directory - back in the day when human editors could look at all the sites in the world, and manually classify them into directories like Yahoo!, LookSmart, or the Open Directory Project. That approach didn’t scale past a few million sites.

But our usage patterns then and now are largely the same. Search engines indexed pages based on page content, and other measures of relevance; users searched using keywords, and looked perhaps at the first 5, 10, or at a pinch 20 results. And this is largely how we use the web today. We seek out atomic pieces of information - like pages, and we read them.

But the pages on the web aren’t simply discrete islands of information. Often they contain very similar information - from mundane things like contact details for individuals and organizations, to reviews of films, restaurants and books, to opinions about what’s worth reading online. Yet aggregating all those contact details to create a distributed Yellow Pages-like service, or those opinions about restaurants, books, films and gadgets to create decentralized versions of IMDB reviews, Amazon book reviews, or Zagat, is very hard, because there are no standardized formats on the web for contact details, or reviews, or many other kinds of commonly published information. The wisdom of the web, the collective opinions and attitudes expressed there, is largely inaccessible, because software isn’t smart enough to recognize it.

Say you want to sell your car. How do you go about doing that? As with IMDB for film reviews, or Amazon for book reviews, you find a centralized place to post a classified listing - or maybe several. Each requires your effort and time to fill in details, and each possibly requires some form of payment. Each is a walled garden.

But what if we could somehow post this listing to our blog, and then easily let services which care about classifieds listings know that there is a new or updated classified at our site? The missing piece that would enable this is a standard format (after all, HTML doesn’t have a <classified> element).

I’m sure you can easily imagine all kinds of situations like these, where centralized solutions are required, simply because there is no standardized, semantic way of marking up the data required for a distributed or devolved system to work.

So data stays locked up in proprietary databases, despite it effectively being public data. Yellow pages-style directories are jealously guarded by their compilers even though it’s our details which are being kept. Amazon owns your reviews, and reserves the right to edit or delete them. And all this cool data that could be being mashed up in amazing ways, is largely inaccessible.

The open source effect

The last decade or so has seen a growing awareness of the importance of open source in the software world, and of open document formats (standards like HTML and CSS and PNG), but we are only just beginning to understand the importance and power of open data. Plenty of open data, published under Creative Commons and other liberal licenses, is available. But we can publish all the data we like: without a way of making it accessible to software via meaningful formats, it’s of little practical value.

But where will these formats, for reviews, for classifieds listings, for citations, and as yet undreamt of uses come from?

  • Do we wait for the W3C to give us XHTML2, complete with all the semantics we might ever need? It seems a far-fetched hope. A foolish one in reality.
  • Do we all invent new XML languages for our specific needs - like reviewML, classifiedML? That seems overkill for discrete chunks of information that lots of people are already publishing as it is.
  • Do we need all of today’s web developers, who let’s face it often don’t even do HTML all that well, to become XML developers? That’s hardly realistic.

But if we think for a moment about how the web has developed so successfully - it’s been incremental, evolutionary, not revolutionary - we might find a way forward that doesn’t break our tools and browsers and force developers to learn a whole new set of skills and concepts. And this is the approach that microformats take to developing simple, open web data formats built on existing data formats and existing developer practices to enable and encourage decentralized development, content and services.

So let’s just take a quick look at what microformats are and aren’t, then we’ll look at some very significant organizations already using them. Then we’ll take a look at just how easy it is to use them.

What microformats are

microformats are

  • simple
  • HTML based
  • data formats
  • based on existing standards
  • based on current developer practice

to bring richer semantics to today’s web, and enable decentralized services

  • without breaking browsers
  • without breaking tools
  • without requiring massive changes to developer knowledge

What microformats aren’t

Microformats aren’t a whole new language. They aren’t a whole new way of developing for the web. They don’t try to solve all the web’s problems. And they certainly aren’t hard to use. If you use valid HTML or XHTML, and have used the class and id attributes, then you have all the know-how you need to use microformats right now. And if you don’t, a few minutes looking at some of these articles about the proper use of these attributes will get you up to speed.

OK, but who is actually using microformats?

Digital Web magazine recently published an article of mine which has a roundup of what big and small publishers, like Yahoo! and Cork’d, are doing with microformats, what support for microformats is available in tools like Dreamweaver, WordPress, Movable Type, and Drupal, and what aggregators like Technorati and Edgeio are doing with microformatted content on the web. So rather than rehash that here, take a look at the state of the art in mid 2006. You’ll most likely be surprised at just how widely used they are.

So how do we use them? It’s hard, right?

If you can use HTML, microformats are simple. I mentioned that there are tools and plugins for some of the web’s most popular content development and design tools, but let’s take a look at how to hand code a really commonly used, and quite sophisticated, microformat: hCard.

One of the most common pieces of information on the web is contact information for people and companies.

Here is how we might typically mark up a company’s contact details, taken from the original markup for our conference company, Web Directions:


<div id="company">
	<p>Web Directions Conference Pty Ltd</p>
	<div id="address">
		<p>8/54 Mitchell St</p>
		<p>Bondi NSW 2026</p>
		<p>Australia</p>
	</div>
	<div id="phone">
		<p>Phone/Fax: 61 2 9365 5007</p>
		<p>Email: <a href="mailto:info@...">info@...</a></p>
	</div>
</div>

But, we can add a little microformatty goodness, and make this much more machine readable.

We use the hCard microformat - which for the geeks in the audience is just vCard, a format we all use, probably without knowing it, for address books, expressed in HTML.

Here is how easy it is. First, we want to say that this whole construct is an hCard, so we use a class value of vcard on our root element:


<div id="company" class="vcard">
	<p>Web Directions Conference Pty Ltd</p>
	<div id="address">
		<p>8/54 Mitchell St</p>
		<p>Bondi NSW 2026</p>
		<p>Australia</p>
	</div>
	<div id="phone">
		<p>Phone/Fax: 61 2 9365 5007</p>
		<p>Email: <a href="mailto:info@…">info@…</a></p>
	</div>
</div>

Now, we’ll designate the name of the organization for this hCard:


<div id="company" class="vcard">
	<p class="fn org">Web Directions Conference Pty Ltd</p>
	<div id="address">
		<p>8/54 Mitchell St</p>
		<p>Bondi NSW 2026</p>
		<p>Australia</p>
	</div>
	<div id="phone">
		<p>Phone/Fax: 61 2 9365 5007</p>
		<p>Email: <a href="mailto:info@…">info@…</a></p>
	</div>
</div>

Here fn is vCard’s way of saying the display (‘formatted’) name, and org the organization. vCard comes from the prehistoric times of the early 1990s, and back then we were really concerned with wasting bytes, so almost everything gets abbreviated.
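
To see why a value like fn org works, remember that the class attribute holds a space-separated list of names, so one element can carry several properties at once. Here is a minimal Python sketch of how a parser might test for a property. The helper names class_tokens and has_class are made up for illustration and don’t come from any microformats tool.

```python
def class_tokens(value):
    """Split an HTML class attribute value into its individual class names."""
    # str.split() with no argument splits on any run of whitespace.
    return value.split()


def has_class(value, name):
    """Return True if `name` is one of the class names in `value`."""
    # Compare whole tokens, so "org" does not match "organisation".
    return name in class_tokens(value)


# The paragraph marked up as class="fn org" carries both properties.
print(has_class("fn org", "fn"))
print(has_class("fn org", "org"))
```

Matching on whole tokens rather than substrings is what lets your own styling classes sit alongside the hCard ones without confusing a parser.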

Next we have the address bit, so we mark that up as follows:


<div id="company" class="vcard">
	<p class="fn org">Web Directions Conference Pty Ltd</p>
	<div id="address" class="adr">
		<p class="street-address">8/54 Mitchell St</p>
		<span class="locality">Bondi</span> <span class="region">NSW</span> <span class="postal-code">2026</span>
		<p class="country-name">Australia</p>
	</div>
	<div id="phone">
		<p>Phone/Fax: 61 2 9365 5007</p>
		<p>Email: <a href="mailto:info@…">info@…</a></p>
	</div>
</div>

Here we’ve had to add a little bit of HTML, the span elements, to create more semantic containers for our chunks of information - the various components of the address, such as region and postal code. All these class names, by the way, are taken directly from the field names in the original vCard specification. So, the semantics defined by the fields of vCard become the semantics of hCard. We are ‘reusing existing formats and schemas’ rather than reinventing the wheel, a key principle of the microformats way.
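
Because the class names are vCard’s own field names, going from hCard back to vCard is mostly a matter of relabelling. The Python sketch below shows the idea for the name and address. It assumes the values have already been pulled out of the page into a plain dict, the function name to_vcard_lines is made up for illustration, and it skips details a real converter needs (escaping commas and semicolons, and the other properties a complete vCard expects).

```python
# The seven components of a vCard ADR value, in order. The first two
# (post office box and extended address) are not used in the example.
ADR_ORDER = [
    "post-office-box",
    "extended-address",
    "street-address",
    "locality",
    "region",
    "postal-code",
    "country-name",
]


def to_vcard_lines(card):
    """Turn a dict keyed by hCard class names into vCard-style lines."""
    lines = []
    if "fn" in card:
        lines.append("FN:" + card["fn"])
    if "org" in card:
        lines.append("ORG:" + card["org"])
    # ADR components are separated by semicolons; missing ones stay empty.
    adr = [card.get(name, "") for name in ADR_ORDER]
    if any(adr):
        lines.append("ADR:" + ";".join(adr))
    return lines


card = {
    "fn": "Web Directions Conference Pty Ltd",
    "org": "Web Directions Conference Pty Ltd",
    "street-address": "8/54 Mitchell St",
    "locality": "Bondi",
    "region": "NSW",
    "postal-code": "2026",
    "country-name": "Australia",
}
for line in to_vcard_lines(card):
    print(line)
```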

Next we have the telephone and email parts:


<div id="company" class="vcard">
	<p class="fn org">Web Directions Conference Pty Ltd</p>
	<div id="address" class="adr">
		<p class="street-address">8/54 Mitchell St</p>
		<span class="locality">Bondi</span> <span class="region">NSW</span> <span class="postal-code">2026</span>
		<p class="country-name">Australia</p>
	</div>
	<div id="phone">
		<p>Phone/Fax: <span class="tel">+61 2 9365 5007</span></p>
		<p>Email: <span class="email"><a href="…">info@…</a></span></p>
	</div>
</div>

And we are more or less done. We could now go and remove our own earlier ‘pseudo semantic’ markup (the ids), like this:


<div class="vcard">
	<p class="fn org">Web Directions Conference Pty Ltd</p>
	<div class="adr">
		<p class="street-address">8/54 Mitchell St</p>
		<span class="locality">Bondi</span> <span class="region">NSW</span> <span class="postal-code">2026</span>
		<p class="country-name">Australia</p>
	</div>
	<div>
		<p>Phone/Fax: <span class="tel">+61 2 9365 5007</span></p>
		<p>Email: <span class="email"><a href="...">...</a></span></p>
	</div>
</div>

And now we have a completely standardized hCard for the contact information for Web Directions.
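
To get a feel for what ‘machine readable’ buys us, here is a rough Python sketch that pulls the fields back out of markup like this using the standard library’s html.parser module. It is a simplification under stated assumptions, not a conforming hCard parser: it expects every element to be properly closed, it merges everything into one card, and the names PROPERTIES, HCardSketch, extract_hcard and SAMPLE are invented for this example. The email line is left out of the sample because its address is elided in the article.

```python
from html.parser import HTMLParser

# hCard property class names used in the article's example.
PROPERTIES = {"fn", "org", "street-address", "locality", "region",
              "postal-code", "country-name", "tel", "email"}


class HCardSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_elements = []  # class-name set of each currently open element
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.open_elements.append(classes)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the last open element.
        if self.open_elements:
            self.open_elements.pop()

    def handle_data(self, data):
        # Only collect text that sits somewhere inside a class="vcard" element.
        if not any("vcard" in classes for classes in self.open_elements):
            return
        for classes in self.open_elements:
            for name in classes & PROPERTIES:
                self.card[name] = self.card.get(name, "") + data


def extract_hcard(html):
    """Return a dict of property name -> text found inside a vcard."""
    parser = HCardSketch()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the source indentation.
    return {name: " ".join(text.split()) for name, text in parser.card.items()}


SAMPLE = """
<div class="vcard">
  <p class="fn org">Web Directions Conference Pty Ltd</p>
  <div class="adr">
    <p class="street-address">8/54 Mitchell St</p>
    <span class="locality">Bondi</span> <span class="region">NSW</span>
    <span class="postal-code">2026</span>
    <p class="country-name">Australia</p>
  </div>
  <p>Phone/Fax: <span class="tel">+61 2 9365 5007</span></p>
</div>
"""

for name, value in sorted(extract_hcard(SAMPLE).items()):
    print(name, "=", value)
```

Notice that the script never needs to know how the page is laid out, or whether the address lives in paragraphs or spans. The class names alone tell it which piece of text is which.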

And, as we mentioned, if you are even lazier there are tools for Dreamweaver, various blogging systems and CMSs like WordPress and Textpattern, and the online hCard creator to do all the heavy lifting for you.

Some other common, and widely used, microformats that are no harder to use than this, and which might well fit your existing development needs, are:

  • hCalendar - for all kinds of events - used by upcoming.org and eventful as their publishing formats
  • hReview - for reviews of all kinds of stuff - used by Yahoo! Local and Yahoo! Tech, and the really cool Cork’d among others
  • rel-tag - for tagging your pages - widely used by bloggers, aggregated by Technorati, with over 100 million tagged pages indexed
  • votelinks - for voting on stuff - see the cool experimental site http://folkse.de
  • XFN - the original microformat, for describing your relationships with people - friends, colleagues, whom you’ve met, and so on
  • hResume - for marking up your resume
  • hListing - for classified listings - used by Edgeio, a real innovator in decentralized services
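
Some of these are even simpler than hCard. With rel-tag, for instance, a link is marked with rel="tag" and the tag itself is the last segment of the link’s URL path. As a hedged sketch of how an aggregator might collect tags from a page, here is a small Python example. The class name RelTagSketch and the example.com URLs are made up for illustration, and real-world edge cases are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagSketch(HTMLParser):
    """Collect tag names from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel, like class, is a space-separated list of values.
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "tag" in rels and href:
            # The tag is the last segment of the URL path, not the link text.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


# Hypothetical page fragment; the URLs are placeholders.
PAGE = """
<p>Filed under
  <a href="http://example.com/tag/microformats" rel="tag">microformats</a> and
  <a href="http://example.com/tag/semantic%20web" rel="tag">the semantic web</a>.
  <a href="http://example.com/about">About</a>
</p>
"""

parser = RelTagSketch()
parser.feed(PAGE)
print(parser.tags)
```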

So what are you waiting for? Get out there and start marking up your content with microformats. You’ll be adding to the semantic richness and usefulness of the web, you’ll be bringing a new sophistication and consistency to your markup, and you’ll be helping the next great evolutionary step in the web - a genuinely semantic web of data, and not just pages.

24 Responses to “Add microformats magic to your site”

  1. Kendall says

    John.
    Great article. Thanks. One question: do the search engines currently look for these microformats in their ranking algorithms?

    Not that that would be the only reason to use them, but I’m sure that would light the fire under some people.

  2. JP says

    I really like the concept of microformats and your explanation is very informative.

    The vCard example - to make this example more semantic, wouldn’t it be beneficial to use an <address> somewhere in there? That is what the <address> tag is for, right?

  3. Greeneo says

    Is there a reason why so many of them start with “h” ? hReview, hCalendar, etc.

    While this is neat, it seems like it could really be a pain for hand coding; having to put all those little spans and classes everywhere. The worst is the date/time format. Not changing existing standards sounds nice, but can we please just put some hyphens in there? I have no idea when “20050428” is, but “2005-04-28” suddenly becomes so much more readable.

  4. trovster says

    Just briefly looked over the example. Has the ID in the last example been removed for a reason?

    Does NSW mean New South Wales (guess)? If so, wouldn’t you be better off doing <abbr title="New South Wales" class="region">NSW</abbr>?

    For the email, the class email should go on the anchor itself, therefore there is no need for the extraneous span element in this case.

    Finally, in the example the telephone markup is incorrect. TEL requires two other values, type (cell, work, etc) and value (the actual number). There are clearer and correct examples on the Microformats wiki.

    @Greeneo: Microformats are designed to be human readable first, then machine readable. So, you can use hyphens in either, but you could also do <abbr title="20060825">25th August</abbr>. However date examples aren’t shown in this article.

    @JP: Yeh, the examples on the wiki show better semantics with the <address> element, and uses spans (which can be styled as block) which is better than using paragraphs in my opinion.

  5. Brian says

    Greeneo, many of the microformats start with ‘h’ for two reasons.
    1) adding an arbitrary character will prevent collisions with any CSS style that might also use the name ‘review’ or ‘calendar’

    2) more importantly, the letter ‘h’ was chosen because it is a review in HTML, so it becomes ‘hReview’

  6. Ben Ward says

    Another excellent introduction to µF John, I like the step-by-step transformation of old to new code.

    Kendall – I’ve no awareness of anything as mainstream as Google indexing based on µf just yet (although I’d put money on the µf loving monkeys at Yahoo! making the first move when it happens), but Technorati has a Microformats based search engine at http://kitchen.technorati.com/search/

    JP – Use of the ADDRESS element is defined as being for the page author’s or owner’s information, not any arbitrary address in a page. This means that in some cases use of ADDRESS is appropriate with hCards or parts of hCards, but you also have to keep in mind that ADDRESS may only contain inline content, not block-level elements, which can throw quirky DOM rearrangement in web browsers if you try to use it as the container for the entire hCard. On my hCard on my blog, I have address elements around my inline email addresses and IM addresses, but that’s pretty much all it’s good for.

    Greeneo – The ‘h’ stands for HTML and is used to fit the convention set by the vCard and iCalendar standards they’re based on. The convention has largely stuck for other Microformats too.

    Although the date standard used doesn’t allow hyphens, the relevant Microformats specs say that hyphens may be included by authors and must then be stripped by processors. So you can hyphen separate your dates as you like, and a search engine/browser feature should strip them out again to maintain vcard compatibility.

  7. trovster says

    You might want to check out Tails, a Firefox extension which helps you find Microformats on pages. Drew McLellan has tools for parsing Microformats.

  8. David says

    Unfortunately, that hCard isn’t quite correct–a tel requires one or more types and a value, the email class should be applied directly to the link, and the region should probably be marked as an abbreviation (NSW stands for New South Wales, right?). See http://pastebin.com/775749

    Also, the intermixing of div/p/span could probably be improved (why are street-address and country-name in paragraphs, while locality/region are in span?).

    Greeneo: I think the h refers to HTML. Also, you may want to read the thread on date-time formats at http://microformats.org/discuss/mail/microformats-discuss/2006-August/005211.html

  9. Voilà tout! » Méli-Mélo says

    […] Add microformats magic to your site […]

  10. Nice intro to incorporating microformats at Like It Matters says

    […] John Allsopp has a nice intro piece on using microformats over at Vitamin:  Add Microformats Magic to Your Site. And, as we mentioned, if you are even lazier there are tools for DreamWeaver, various blogging system and CMS like Wordpress and Textpattern, and the online hCard creator to do all the heavy lifting for you. Some other common, and widely used microformats that are no harder to use than this, and which might well fit your existing development needs are […]

  11. Ramon says

    Great article!!

  12. http://crabapple.cc says

    Add microformats magic to your site…

    For many of you, this’ll go right over your head….

  13. Brainspill » Are There Real Benefits To microFormats? says

    […] I keep up on developments in web design and semantic data structuring, so when I saw the rss version of http://www.thinkvitamin.com/features/design/how-to-use-microformats, I went and read the whole thing outside of my feed reader. I was hoping the article would clear up a few bothersome questions I’ve got about micro formats, but alas, disappointment. […]

  14. Marc’s Voice » Blog Archive » Later on in the day links says

    […] How to use microformats.  [via Kevin Lawver]  I’m just not sure if the way microcontent happens is for folks to hack tags into their pages. Seems pretty archaic and nerdy to me. But god bless them anyway.  We love microformats. […]

  15. Labnotes » Microformats helper for Ruby on Rails says

    […] 1. Start here, it’s a great introduction to microformats. Or head over to microformats.org. […]

  16. John Allsopp says

    @Kendall

    “Do the search engines currently look for these microformats in their ranking algorithms?”

    In a general sense, no, but in a couple of specific instances, yes.

    Google uses the “rel=’nofollow’” microformat to not give any ranking to links which include the use of that microformat.

    Technorati has a search engine specifically for microformatted content

    http://kitchen.technorati.com/search/

    Edgeio searches for hListing microformatted classifieds listing

    Given that Yahoo! is publishing extensively using hCard, hCalendar and hReview at Local, Tech, and Upcoming, I suspect Yahoo! might be interested in more general searching for microformatted content, but that is complete speculation on my part.

    Great question

    j

  17. Will Emerson says

    Wait a minute! I thought one of the main goals of CSS was to separate data from presentation. Isn’t this sneaking data back into presentation?

  18. John Allsopp says

    JP

    “I really like the concept of microformats and your explanation is very informative.
    The vCard example - to make this example more semantic, wouldn’t it be beneficial to use an <address> somewhere in there? That is what the <address> tag is for, right?”

    Thanks for the nice words.

    Ah, address is a tricky one. Address is not generally for addresses, strangely enough, but according to the HTML spec for

    supply[ing] contact information for a document or a major part of a document such as a form

    So, for my contact details as an author of a page, yes, that’s appropriate, but generally, for any contact details, it’s not.

    This is one that has bitten almost all of us when we first think about semantic contact details.

    thanks again

    john

  19. John Allsopp says

    @Greeneo

    Is there a reason why so many of them start with “h” ? hReview, hCalendar, etc.

    I believe this naming convention comes from hCard, which is an HTML version of vCard.
    Hence hCalendar (an HTML version of iCalendar, the IETF open calendaring format, used by iCal among others), and then by analogy new microformats like hReview (which is not based on a specific existing format or schema).

    While this is neat, it seems like it could really be a pain for hand coding; having to put all those little spans and classes everywhere. The worst is the date/time format. Not changing existing standards sounds nice, but can we please just put some hyphens in there? I have no idea when “20050428” is, but “2005-04-28” suddenly becomes so much more readable.

    hCard and hCalendar creator are your friends.

    My article on Monday in Digital Web looks at these and other tools and plugins for doing the hard work!

    thanks again

    j

  20. John Allsopp says

    @Trovster

    Just briefly looked over the example. Has the ID in the last example been removed for a reason?

    The sense of removing these was that the original “pseudo semantic” markup of classes and ids was no longer necessary.

    Does NSW mean New South Wales (guess)? If so, wouldn’t you be better off doing <abbr title="New South Wales" class="region">NSW</abbr>?

    That’s a good thought, but NSW is the standard postal name for the state, like PA or NY in the US, so that’s probably overkill.

    For the email, the class email should go on the anchor itself, therefore there is no need for the extraneous span element in this case.

    That’s correct. The email line would be better as

    <p>Email: <a href="..." class="email">...</a></p>

    Finally, in the example the telephone markup is incorrect. TEL requires two other values, type (cell, work, etc) and value (the actual number). There are clearer and correct examples on the Microformats wiki

    Actually, these are optional, not required, values. It’s true that Outlook has trouble with converted hCards if these values are left off, but I didn’t want to overly complicate the example.

    thanks

    john

  21. Chris D says

    Nicely written. Very informative. I can see them everywhere now!

  22. Think Vitamin article by me at microformatique - a blog about microformats and “data at the edges” says

    […] Quick note on a Think Vitamin article just published on microformats. It’s a pretty detailed primer, including a worked example of converting reasonably complex, real world contact details for a company into an hCard. […]

  23. David Madden says

    So that’s what they are, now I get it.

    Great, thanks.

  24. John Allsopp says

    Wait a minute! I thought one of the main goals of CSS was to separate data from presentation. Isn’t this sneaking data back into presentation?

    This is a quite commonly expressed objection to the way microformats use class, but it’s based on a misunderstanding of the way the class attribute in HTML was designed. Yes, class is very commonly, and appropriately, used by web designers in conjunction with CSS to style pages, and in truth, it is often overused for that, but despite this, class, according to the HTML specification, “has several roles in HTML”, including “for general purpose processing by user agents”.

    Microformats utilize this second aspect of the class (and id) attribute, and do so legitimately. It is not an abuse of the class or id attribute to use it to add semantic context to a document. Nor is the use of class in and of itself presentational - in fact, it is an important mechanism for separating presentation from structured content.
