TrellixTech.com
SMBmeta Specification Version 0.9 Proposal

SMBmeta specification

Version 0.9 draft 3/27/2003 1:39:35 PM

 

Introduction

 

SMBmeta is a small business directory entry format. It describes a language -- a set of tags -- intended to be found in a file at the top level of an Internet domain.

 

The name SMBmeta stands for "Small and Medium-sized Business metadata". Metadata is computer jargon that means literally "data about data". Here, the idea is that SMBmeta entries describe the business or businesses behind the website on which the SMBmeta file appears.

 

SMBmeta is a dialect of XML. All SMBmeta files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) website. SMBmeta is associated with the XML Namespace "http://www.smbmeta.org/namespace/v0.9". (Namespaces are a feature of XML that lets you name tags without worrying that two different groups will use the same tag name for different things. You don't need to understand namespaces to use SMBmeta.)

 

Details

 

An SMBmeta file is stored in the top level of a domain with the filename "smbmeta.xml". For example:

 

   http://www.concordeggplant.com/smbmeta.xml
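
As a minimal sketch of how a reader of this data might locate the file, the following Python function builds the expected URL for a domain. The function name smbmeta_url and the trimming and lower-casing of the input are assumptions made for illustration; only the "smbmeta.xml" file name and its top-level placement come from this specification.

```python
def smbmeta_url(domain: str) -> str:
    """Return the expected location of the SMBmeta file for a domain."""
    # The file name and its placement at the top level come from the
    # specification; trimming and lower-casing are conveniences assumed here.
    return "http://" + domain.strip().lower() + "/smbmeta.xml"


print(smbmeta_url("www.concordeggplant.com"))
```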

 

The file is in XML format. It consists of one <smbmeta> element. That element contains one <business> element describing the business. In addition to some required and some optional elements within the <business> element, there must be one or more <location> elements.

 

Here is a sample SMBmeta file with the required and most of the optional components:

 

<?xml version="1.0" encoding="UTF-8"?>

<smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">

   <business domain="concordeggplant.com">

      <name>Concord Eggplant Restaurant</name>

      <description>Innovative vegetarian food for young and old.</description>

      <type naics="722110">Vegetarian restaurant</type>

      <type naics="722320">Party platters and catering for small events</type>

      <websiteDomain>www.concordeggplant.com</websiteDomain>

      <websiteCapabilities type="schedule" />

      <lastUpdated>Sat, 07 Sep 2002 19:30:01 EST</lastUpdated>

      <refreshDays>30</refreshDays>

      <link type="home" href="http://www.concordeggplant.com">Home page</link>

      <link type="contact" href="http://www.concordeggplant.com/contact.htm">Contact page</link>

      <generator>by hand</generator>

      <docs>http://www.smbmeta.org/docs</docs>

      <location country="us" postalCode="01742">

         <about>Our one and only location</about>

         <address href="http://www.concordeggplant.com/directions.html">300 Baker Avenue</address>

         <serviceRange area="local" />

         <languageSpoken language="en-us">English</languageSpoken>

         <hours day="all" open="1130" close="2130" timezone="local" />

         <parking type="on-street">Lots of metered on-street parking</parking>

         <publicTransportation type="train" blocksAway="3">5 minute walk from the commuter rail</publicTransportation>

      </location>

   </business>

</smbmeta>
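
A file like the sample above could be read with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree module on a shortened copy of the sample. The helper name summarize, the "smb" prefix, and the choice of fields are illustrative assumptions, not part of SMBmeta.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the specification; the "smb" prefix is a local choice.
NS = {"smb": "http://www.smbmeta.org/namespace/v0.9"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">
   <business domain="concordeggplant.com">
      <name>Concord Eggplant Restaurant</name>
      <type naics="722110">Vegetarian restaurant</type>
      <websiteDomain>www.concordeggplant.com</websiteDomain>
      <location country="us" postalCode="01742" />
   </business>
</smbmeta>"""


def summarize(xml_text):
    """Pull a few fields out of an SMBmeta document (hypothetical helper)."""
    # Encode first so the parser honors the encoding named in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    business = root.find("smb:business", NS)
    return {
        "domain": business.get("domain"),
        "name": business.findtext("smb:name", namespaces=NS),
        "naics": [t.get("naics") for t in business.findall("smb:type", NS)],
        "locations": len(business.findall("smb:location", NS)),
    }


print(summarize(SAMPLE))
```

Because every element is in the SMBmeta namespace, lookups by bare tag name (for example "business") would find nothing; the prefix map is what makes the queries match.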

 

Links to additional sample files can be found on the Samples Page. These are samples of the latest version of this specification as well as older versions that may still exist or be useful for historical purposes.

 

The style of XML chosen here is as follows: Attribute values are used, in most cases, when a value is chosen from a predefined set, and for "href" URL references. Tag text values are used for more free-format information. In many cases, there is free-format text available for a description of the more constrained attribute values. It is up to applications using this data to decide whether or not, and for which purpose, to use such descriptive text. In this version, all descriptive text must be plain text, without tagged markup. Special characters "<" and "&" must be escaped with entity values (i.e., "&lt;" and "&amp;") in text and in attributes. (This differs from HTML, where "hrefs" can contain those characters). Free-format text may be truncated by aggregation programs. It is highly likely that many aggregators will restrict most free-format text to 255 bytes or less (in UTF-8, some characters take up more than one byte).
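
The escaping and truncation rules above can be illustrated with a short Python sketch. The escape function is from the standard xml.sax.saxutils module; truncate_utf8 is a hypothetical helper showing one way an aggregator might cut text to 255 bytes without splitting a multi-byte UTF-8 character. Actual aggregators may truncate differently.

```python
from xml.sax.saxutils import escape


def truncate_utf8(text, max_bytes=255):
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete multi-byte sequence left by the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# "&" and "<" are replaced with entity references before writing the file.
print(escape("Berman & Sons, Inc."))

# 200 two-byte characters are 400 bytes, so the text is shortened.
print(len(truncate_utf8("é" * 200)))
```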

 

SMBmeta files have the following components:

 

Element: <smbmeta>

 

Usage:

   <smbmeta version="v-num" xmlns="smbmeta-ns" debug="db-value">...</smbmeta>

 

Description:

Each SMBmeta file has one, and only one, <smbmeta> element. The value of version number v-num must be "0.9" ("proposed initial formal release"). This element must be declared to be in the SMBmeta namespace. You can use an xmlns attribute to do that by providing the smbmeta-ns namespace value of "http://www.smbmeta.org/namespace/v0.9". This element must have one <business> sub-element.

 

The debug attribute is optional. If present, the attribute value must be non-empty. The presence of this attribute indicates that the data in the file is being used for debugging purposes and should be ignored by search engines and other aggregation software. This lets developers test out new tools without worrying that their test data will show up in search results. Most users will never need the debug attribute.

 

Examples:

   <smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">
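
The requirements on the <smbmeta> element can be checked mechanically. The following Python sketch is one possible check, not a reference validator; the function name check_root and the wording of the messages are assumptions.

```python
import xml.etree.ElementTree as ET

SMBMETA_NS = "http://www.smbmeta.org/namespace/v0.9"


def check_root(xml_text):
    """Return (problems, is_debug) for the root element of an SMBmeta file."""
    root = ET.fromstring(xml_text)
    problems = []
    # ElementTree reports namespaced tags as "{namespace-uri}local-name".
    if root.tag != "{%s}smbmeta" % SMBMETA_NS:
        problems.append("root element is not <smbmeta> in the SMBmeta namespace")
    if root.get("version") != "0.9":
        problems.append('version must be "0.9"')
    if len(root.findall("{%s}business" % SMBMETA_NS)) != 1:
        problems.append("exactly one <business> element is required")
    debug = root.get("debug")
    if debug == "":
        problems.append("debug attribute, if present, must be non-empty")
    # Any debug attribute marks the file as test data to be ignored.
    return problems, debug is not None


GOOD = ('<smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">'
        '<business domain="example.com"/></smbmeta>')
print(check_root(GOOD))
```

An aggregator following this sketch would skip any file for which is_debug is true, regardless of whether the rest of the file is valid.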

 

Element: <business>

 

Usage:

   <business domain="d-name">...</business>

 

Description:

Each <smbmeta> element (and therefore, each SMBmeta file) must have one, and only one, <business> element. If you want to describe more than one business, then you need to have a domain for each business. (This restriction is to discourage spamming the system. There is another data format derived from this specification, the SMBmeta Proxy Format, that allows multiple <business> elements.) Note that you may have more than one domain pointing to the same web site, leading to multiple domains for the same business. You may also have web sites that aren't at top level by having just an SMBmeta file at the top level of the domain with <link> elements that point elsewhere. Finally, a single business can be in multiple classifications through the use of multiple <type> elements.

 

The domain attribute is required and consists of the domain name value d-name of the top-level domain associated with this business. Aggregators can use the domain name as a key to assure that each domain has only one active SMBmeta file (case should be ignored). Sub-domain names, such as "www.concordeggplant.com", are not used for this attribute even if necessary to get to the web site -- that is handled by the <websiteDomain> element.

 

For most TLDs (Top Level Domains), such as .com, .org, and .net, there would be only two domain name components as part of the domain attribute value, the company's part and the TLD (e.g., concordeggplant and .com). With many Country Code TLDs, such as .uk, you will need an additional domain name part, so that you would have "concordeggplant.co.uk", or "concordeggplant.com.br". For others, like .tv, you do not need the extra part.

 

Examples:

   <business domain="concordeggplant.com">

 

Element: <name>

 

Usage:

   <name>name-of-business</name>

 

Description:

The <name> element is used to provide the name of the business. It is how people refer to the business. This can be a corporate name, a DBA name, or a nickname. The <description> element is used for more specific information. Do not include tag-lines, like "Home of the mega veggie roast", in the <name> element. Just use the business name. It is assumed that these names will not be unique. While they may be tradenames and trademarks, they may just be common names. They should not, though, be purposely misleading (which purposeful misuse of another's trademark would be). They should be the names used on correspondence, when answering the phone, on signage and advertising, etc.

 

This element is a sub-element of the <business> element. There may be only one, and it is a required element.

 

Examples:

   <name>Concord Eggplant Restaurant</name>

   <name>Joe Smith, dba Floristoria</name>

   <name>Berman &amp; Sons, Inc.</name>

   <name>Kate Hobson, carpenter</name>

   <name>Kate Hobson</name>

 

Element: <description>

 

Usage:

   <description language="l-code">business-description</description>

 

Description:

The <description> element is used to provide more descriptive information about the business than the <name> element. There may be more than one <description> element, though each must have a different language attribute value. The language attribute specifies the language of the description as value l-code. The allowable values for l-code are listed below in the Special Values section. If a <description> element has no language attribute, then it must be the only <description> element. The <description> element is optional.

 

The length of the business-description text value should be anywhere from a simple phrase to a short paragraph. Listings of information gleaned from SMBmeta files may appear in fixed format reports, so try to write the text so that it is tolerant of being truncated and still provides useful information. (It is likely that many systems will allow no more than 255 characters.) Think of the tag lines about businesses on public radio ("Concord Eggplant: A family style restaurant serving great vegetarian food to the Metrowest area for 20 years") to get an idea of getting a lot of information into a short sentence. If the text is one sentence or less, then it does not need a period at the end. If it is more than one sentence in length, then each sentence should end with a period. Descriptive, informative text is best. Excessive puffery and marketing hype are not appropriate. Exclamatory punctuation is rarely appropriate, and may be removed by some listing programs. (Marketing hype can always be used on the web site if you feel it is necessary and effective.) Leading text (e.g., "Our brand selection will surprise you!") is not appropriate unless specific to the business (e.g., "Our brand selection varies depending upon the inventory being liquidated").

 

Examples:

   <description language="en-us">Something in English</description>

   <description language="es">Something in Spanish</description>

   <description>Public relations for small businesses in the fashion industry. Specializing in product launches in children’s clothing.</description>
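
The rules for combining <description> elements can be expressed as a small check. This Python sketch takes the language attribute values of all <description> elements in a file, with None standing for a missing attribute. The function name and the case-insensitive comparison of language codes are assumptions.

```python
def check_descriptions(languages):
    """Check the language attributes of a file's <description> elements.

    languages holds one entry per element; None means no language attribute.
    """
    problems = []
    if None in languages and len(languages) > 1:
        problems.append(
            "a <description> without a language attribute must be the only one")
    # Comparing codes without regard to case is an assumption of this sketch.
    present = [code.lower() for code in languages if code is not None]
    if len(present) != len(set(present)):
        problems.append("each <description> must have a different language value")
    return problems


print(check_descriptions(["en-us", "es"]))
```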

 

Element: <type>

 

Usage:

   <type naics="bn-num">type-description</type>

   <type system="system-type" code="type-value">type-description</type>

 

Description:

Each business must be classified as being of one or more types using the <type> element. The business type is specified by using a naics attribute denoting the business type from the same list used by the U.S. Census, the North American Industry Classification System (NAICS). The values for the naics attribute are listed below in the Special Values section. The codes allow varying levels of specificity. Use what you feel is most helpful to customers.

 

For businesses in countries where other business classification systems are used by the government, the second format may be used. In that format, the system attribute specifies the classification system and the code attribute specifies the business type using that classification. A list of National Classifications can be found on the United Nations Statistics Division web site, http://unstats.un.org/unsd/cr/ctryreg/default.asp?Lg=1.

 

The text value type-description is optional and can be used to be more specific about the type of business. It should be just a simple phrase at most.

 

There must be at least one <type> element as a sub-element of a <business> element. Multiple <type> elements are allowed if a business is of more than one type. For example, some food establishments do takeout, sit down, and catering, as well as run a cooking school, each of which has its own NAICS code value.

 

At least one <type> element must be in the naics format, even when other <type> elements are in the other, explicit system format. This is for compatibility purposes among aggregators.

 

Examples:

   <type naics="541820">Public relations</type>

   <type naics="238350">Carpenter specializing in oak cabinets</type>

   <type naics="722211"/>

   <type system="naf" code="322A">Fabrication d'equipements d'emission et de transmission hertzienne</type>
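
The <type> requirements (at least one element, and at least one in the naics format) could be checked as follows. This is a sketch under the assumption that the <business> element has already been parsed with Python's xml.etree.ElementTree; check_types is a hypothetical name, and the codes themselves are not checked against the NAICS list.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.smbmeta.org/namespace/v0.9}"


def check_types(business):
    """Check the <type> sub-elements of a parsed <business> element."""
    problems = []
    types = business.findall(NS + "type")
    if not types:
        problems.append("at least one <type> element is required")
    has_naics = False
    for t in types:
        if t.get("naics") is not None:
            has_naics = True
        elif t.get("system") is None or t.get("code") is None:
            problems.append("a <type> needs naics, or both system and code")
    if types and not has_naics:
        problems.append("at least one <type> must use the naics format")
    return problems


business = ET.fromstring(
    '<business xmlns="http://www.smbmeta.org/namespace/v0.9" domain="example.com">'
    '<type naics="722110"/><type system="naf" code="322A"/></business>')
print(check_types(business))
```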

 

Element: <websiteDomain>

 

Usage:

   <websiteDomain>domain-name</websiteDomain>

 

Description:

The <websiteDomain> element specifies the full domain name of the web site where the SMBmeta file can be found. There may be more than one <websiteDomain> element if there is more than one domain name pointing to the same website (e.g., "www.softgarden.com" and "www.softwaregarden.com"). There must be at least one <websiteDomain> element within each <business> element, and at least one of them must have the <business> element's domain attribute value as part (or all) of its domain-name value. The domain-name value does not have a scheme name (i.e., no "http://") and should work for finding the web site if "http://" is placed before it. If you have more than one domain, you should have more than one <websiteDomain> element.

 

The <business> element's domain attribute value is used as a unique key for database use of the SMBmeta information, and as a means for connecting the information to a particular entity for responsibility (the domain name owner). The <websiteDomain> element is used to list the websites that may bring you to the SMBmeta file and that may represent the business (i.e., those that respond to browser and other HTTP requests on port 80). The SMBmeta file can also use the <link> element to indicate the URL of a particular page on a business' website (e.g., the "home" or "contact us" page), or a relevant page on another website.

 

Domain names of websites that do not have the smbmeta.xml file at root level should not be listed with the <websiteDomain> element. Those websites may be listed with <link> elements.

 

Examples:

   <websiteDomain>www.concordeggplant.com</websiteDomain>

   <websiteDomain>eggplant-to-go.com</websiteDomain>

   <websiteDomain>www.eggplant-to-go.com</websiteDomain>
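
The requirement that at least one <websiteDomain> value contain the <business> domain can be sketched in Python as below. Reading "as part (or all)" to mean an exact match or a match ending at a dot boundary is an interpretation made for this sketch, as is ignoring case; the function name is hypothetical.

```python
def covers_business_domain(business_domain, website_domains):
    """Return True if some websiteDomain value is, or ends with, the domain."""
    key = business_domain.strip().lower()
    for name in website_domains:
        name = name.strip().lower()
        # Match the domain itself or any sub-domain of it, so that
        # "www.concordeggplant.com" matches "concordeggplant.com" but
        # not "eggplant.com".
        if name == key or name.endswith("." + key):
            return True
    return False


print(covers_business_domain("concordeggplant.com", ["www.concordeggplant.com"]))
```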

 

Element: <websiteCapabilities>

 

Usage:

   <websiteCapabilities type="wsc-type">capability-description</websiteCapabilities>

 

Description:

The <websiteCapabilities> element is used to indicate various capabilities that the business' website provides. There may be more than one <websiteCapabilities> element as a sub-element of the <business> element. There may also be no <websiteCapabilities> elements.

 

The type attribute is used to specify the particular capability provided. The wsc-type attribute value may be any of the following:

 

  • purchase - the website is enabled for e-commerce, taking orders directly using a browser
  • schedule - the website has a facility for checking a schedule and making reservations directly using a browser
  • query - the website has a form for requesting further information using a browser

 

The optional capability-description text can be used to provide a simple phrase describing the capability if necessary.

 

Examples:

   <websiteCapabilities type="purchase">Full catalog available online</websiteCapabilities>

   <websiteCapabilities type="purchase">Partial catalog available online</websiteCapabilities>

   <websiteCapabilities type="query" />

 

Element: <lastUpdated>

 

Usage:

   <lastUpdated>dtm</lastUpdated>

 

Description:

The <lastUpdated> element indicates the last time this file was changed. All date-times in SMBmeta conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred). The <lastUpdated> element is a sub-element of the <business> element, and is optional. If present, there may only be one.

 

Examples:

   <lastUpdated>Mon, 23 Sep 2002 16:56 EST</lastUpdated>
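
Date-times in this format can be read with standard library routines in many languages. As one example, Python's email.utils.parsedate_to_datetime accepts this style of date, including a missing seconds field, a two-digit year, and named zones such as EST. The module targets the later RFC 2822 rather than RFC 822 itself, so treat this as a sketch; parse_last_updated is a hypothetical wrapper.

```python
from email.utils import parsedate_to_datetime


def parse_last_updated(text):
    """Convert a <lastUpdated> text value into a timezone-aware datetime."""
    return parsedate_to_datetime(text.strip())


updated = parse_last_updated("Mon, 23 Sep 2002 16:56 EST")
print(updated.isoformat())
```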

 

Element: <refreshDays>

 

Usage:

   <refreshDays>r-days</refreshDays>

 

Description:

The <refreshDays> element specifies the number of days after the <lastUpdated> time after which the user of the data in the SMBmeta file may need to refetch the data to check for possible changes. This is a hint to data aggregators. This is not the refresh period of the website itself, which may be more or less frequent. The <refreshDays> element is optional, but if present, there must also be a <lastUpdated> element, and there may only be one.

 

The r-days value must be an integer from 0 to 999. The value "0" means that the file is subject to change at any time, and that the reader should know that. The value "999" means that the file is unlikely to change, and a yearly check would be sufficient. If there is no <refreshDays> element, or there is no r-days value, then "30" is assumed, meaning that the file should be checked monthly. If no <lastUpdated> value is present, then the current time, or the date/time modified returned by the server (whichever is earlier) may be used.

 

The value is given in days to emphasize that this file contains relatively static information. It is not used for up-to-date inventory status. The most common changes would be opening a new location, changing hours, or adding a new line of business, all things rarely done more than a few times a year.

 

Despite this value, an aggregator may check for changes more or less frequently.

 

Examples:

   <refreshDays>1</refreshDays>

   <refreshDays>30</refreshDays>
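
An aggregator might turn these rules into a "next check" time along the following lines. This Python sketch assumes the <lastUpdated> value has already been parsed into a datetime. The function name next_check is hypothetical, and mapping "999" to 365 days is one interpretation of the "yearly check" remark above, not a requirement.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_REFRESH_DAYS = 30  # assumed when <refreshDays> is absent or empty


def next_check(last_updated, refresh_text=None):
    """Return the time after which the file may need to be refetched."""
    if refresh_text is None or refresh_text.strip() == "":
        days = DEFAULT_REFRESH_DAYS
    else:
        days = int(refresh_text)
        if not 0 <= days <= 999:
            raise ValueError("r-days must be an integer from 0 to 999")
    if days == 999:
        # Interpretation: "unlikely to change" is served by a yearly check.
        days = 365
    return last_updated + timedelta(days=days)


base = datetime(2002, 9, 7, 19, 30, 1, tzinfo=timezone.utc)
print(next_check(base, "30"))
```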

 

Element: <link>

 

Usage:

   <link type="l-type" href="link-URI">link-description</link>

 

Description:

The <link> element is used to indicate relevant Internet resources. The type attribute specifies the type of resource linked to, and the href attribute specifies the URI (a URL for a web page is a type of URI). The optional text value link-description provides more information than the basic type.

 

The currently allowed values for the type attribute value l-type are:

 

  • home -- a link to the home page of a website for this business
  • about -- a link to a webpage describing the business in greater detail
  • weblog -- a link to a weblog maintained by this business
  • rss -- a link to an RSS-format file describing weblogs or content-feeds from this business
  • contact -- either a link to a webpage or a mailto link
  • directory -- a link to a directory where this business is listed that people finding this business may find useful
  • parent -- a link to the website of the "parent" company of this company.
  • franchisor -- a link to the website of the company of which this company is a franchisee. The link-description text value should be something like "Company-name franchisee".
  • other -- the URI of a resource not listed above

 

If the type of resource you wish to link to is not in this list, use “other” and specify it using the link-description text value.

 

The "directory" type is a hint to searchers about where they may find other similar businesses. While many businesses would not like to link to directories of potential competitors, there are many others that would. Also, directories that implement websites for businesses may want to provide links back to themselves.

 

The href attributes must contain valid URIs, complete with scheme name. Allowed schemes are "http://" and "mailto:". The contact type is one that can have either an "http://" link or a "mailto:" link. Because of the spammer harvesting possibilities of a "mailto:" link, you may want the contact link to point to a "Contact Us" page with either only human readable email addresses or contact forms that result in immediate emails.

 

The <link> element ties the data in the SMBmeta file to the business' website, and other resources like email addresses, forms, or data files. There may be multiple <link> elements, including multiple ones with the same type attribute. For example, a business may have two web sites, one under its own domain and another as a sub-page on an industry-specific directory.

 

The <link> element is an optional sub-element of the <business> element.

 

Examples:

   <link type="home" href="http://www.concordeggplant.com">Home page</link>

   <link type="contact" href="http://www.concordeggplant.com/contact.htm">Contact page</link>
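
The constraints on the type and href attributes can be checked with a few lines of code. The Python sketch below uses urllib.parse.urlparse from the standard library to read the scheme. The function name check_link and the message wording are assumptions, and the sketch does not verify that the URI actually resolves.

```python
from urllib.parse import urlparse

# The l-type values listed in this specification.
LINK_TYPES = {"home", "about", "weblog", "rss", "contact",
              "directory", "parent", "franchisor", "other"}


def check_link(link_type, href):
    """Return a list of problems with one <link> element's attributes."""
    problems = []
    if link_type not in LINK_TYPES:
        problems.append("unknown link type: " + link_type)
    # urlparse returns an empty scheme when the value has no scheme name.
    scheme = urlparse(href).scheme.lower()
    if scheme not in ("http", "mailto"):
        problems.append("href must use the http or mailto scheme")
    return problems


print(check_link("home", "http://www.concordeggplant.com"))
```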

 

Element: <affirmation>

 

Usage:

   <affirmation href="affirmation-URI" signature="s-value" />

 

Description:

The <affirmation> element is used to provide information that may be helpful in determining the truth of the data provided. This lets the SMBmeta file author give hints to a search engine or directory about where it might find the domain listed. If the search engine considers that "Affirmation Authority" authoritative, then it could check on the domain through a query of the authority, or use the optional signature to check algorithmically on the authenticity of the claim of being affirmed. The algorithm that uses the signature s-value, combined with other data, is determined by the particular Affirmation Authority. The s-value is most likely in Base64 encoding (see RFC 2045). This element is a sub-element of the <business> element, and is optional. There may be any number of <affirmation> elements.

 

The values for this element would most likely be provided by the Affirmation Authority. Unless you are working with one of them, this element can be ignored, both in authoring and in reading.

 

There will be Affirmation Authorities that do not expect to have <affirmation> elements in all of the SMBmeta data that they affirm. For example, a "List of Bad SMBmeta File Domains" Affirmation Authority would not expect the "black-listed" domains to point back to the Affirmation Authority that says they should not be trusted.

 

Example:

   <affirmation href="http://www.spamdetecting.org/smbmeta/name"

      signature="BXz/AhR5I8XnxaCxxO0KrPn/Sof8ObMnAg==" />

 

Element: <generator>

 

Usage:

   <generator>gen-name</generator>

 

Description:

The <generator> element is used to specify the program or method used to generate the SMBmeta file. If no generator is used, then the gen-name “by hand” may be used. This is a sub-element of the <business> element, and is optional. If present, there may only be one. This element is similar to the one in RSS 2.0.

 

Examples:

   <generator>MightyInHouse Content System v2.3</generator>

   <generator>by hand</generator>

 

Element: <docs>

 

Usage:

   <docs>doc-url</docs>

 

Description:

The <docs> element gives a full URL to help find the documentation for the file. It is a sub-element of the <business> element and is optional. If present, there may only be one. The initial value is “http://www.smbmeta.org/docs”. This element is similar to the one in RSS 2.0.

 

Examples:

   <docs>http://www.smbmeta.org/docs</docs>

Element: <location>

 

Usage:

   <location country="c-code" postalCode="p-code" main="yes/no">...</location>

 

Description:

The <business> element must have one or more <location> sub-elements. The <location> element indicates the location of the business. It has optional sub-elements that specify information such as the area served and the hours open for business.

 

The country attribute specifies the country in which the location resides. The country code value c-code must be one of the ISO 3166 values (http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html) – e.g., the United States is “us”, Canada is “ca”, etc.

 

The postalCode attribute gives the postal code (using the scheme for the specified country) of the location. The country is assumed to be “us” if omitted, and the postalCode is required. In the United States, the postalCode may be either the 5 digit ZIP code or the 9 digit plus hyphen ZIP+4 (see http://www.usps.com/zip4/zipfaq.htm for more information about ZIP codes). If there is no postal code in the country for the location, then the city name should be used. (The city name may not be used in the United States.)

 

The main attribute specifies whether or not this is the “main office” or headquarters in a multi-location business. Its values may be either “yes” or “no”. It is optional.

 

Examples:

   <location country="us" postalCode="01742" main="yes">

   <location postalCode="01742-2131">
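
The attribute rules for <location> can be sketched as a check over the element's attributes. In this Python example, check_location is a hypothetical helper that takes the attributes as a dictionary. The regular expression covers only the two United States formats described above; postal codes for other countries are accepted as-is, and the country code is not checked against ISO 3166.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits (ZIP+4).
US_ZIP = re.compile(r"\d{5}(-\d{4})?")


def check_location(attrs):
    """Return a list of problems with a <location> element's attributes."""
    problems = []
    # The country is assumed to be "us" when the attribute is omitted.
    country = attrs.get("country", "us").lower()
    postal = attrs.get("postalCode")
    if not postal:
        problems.append("postalCode is required")
    elif country == "us" and not US_ZIP.fullmatch(postal):
        problems.append("a US postalCode must be a ZIP or ZIP+4 code")
    main = attrs.get("main")
    if main is not None and main not in ("yes", "no"):
        problems.append('main must be "yes" or "no"')
    return problems


print(check_location({"country": "us", "postalCode": "01742", "main": "yes"}))
```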

 

Element: <about>

 

Usage:

   <about>about-text</about>

 

Description:

The <about> element is an optional sub-element of the <location> element. It lets you provide descriptive text about the location. If present, there may only be one. This is not a description of the business (that is provided by the <description> element) -- it is a description of what is distinctive about the location. It is also not travel directions to the address, which is provided by the <address> element.

 

Examples:

   <about>This auxiliary office shares space with Industrial Rent-a-Car</about>

   <about>Our original, and still our only, location</about>

 

Element: <address>

 

Usage:

   <address href="directions-url">addr-text</address>

 

Description:

The <address> element is an optional sub-element of the <location> element. If present, there may only be one. It gives the street address of the location. The city and postal code information need not be provided, since that can be determined from the <location> element’s postalCode value. Together with the postalCode, the addr-text should be sufficient to use with a mapping system.

 

The optional href attribute is used to provide the URL of a web page with more specific information, such as driving directions, a map, etc.

 

The addr-text value is optional. Businesses may not want their address to be easily harvested for physical mail spam, and may choose to leave it off. If empty, then the href attribute is required.

 

Examples:

   <address href="http://www.concordeggplant.com/directions.html">300 Baker Avenue</address>

   <address href="http://www.concordeggplant.com/directions.html"/>

   <address>300 Baker Avenue</address>
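
The one dependency between the addr-text value and the href attribute (an empty address needs an href) is easy to check. A minimal Python sketch, with check_address as a hypothetical name and whitespace-only text treated as empty by assumption:

```python
def check_address(addr_text, href):
    """Return a list of problems with one <address> element."""
    # Treating whitespace-only text as empty is an assumption of this sketch.
    has_text = addr_text is not None and addr_text.strip() != ""
    if not has_text and not href:
        return ["an <address> with no addr-text must have an href attribute"]
    return []


print(check_address("300 Baker Avenue", None))
```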

 

Element: <coordinates>

 

Usage:

   <coordinates latitude="lat-value" longitude="long-value">