SMBmeta specification
Version 0.1 draft 1/8/2003 2:32:03 PM
Introduction
SMBmeta is a small business directory entry format. It describes a language -- a set of tags --
intended to be found in a file at the top level of an Internet domain.
The name SMBmeta stands for "Small and Medium-sized Business metadata".
Metadata is computer jargon that means literally "data about data". Here, the idea is that SMBmeta entries describe the business
or businesses behind the website on which the SMBmeta file appears.
SMBmeta is a dialect of XML. All SMBmeta files must conform to the XML 1.0 specification, as published
on the World Wide Web Consortium (W3C) website. SMBmeta is associated with the XML Namespace "http://www.smbmeta.org/namespace/v0.1".
(Namespaces are a feature of XML that lets you name tags without worrying that two different groups will use the same tag
name for different things. You don't need to understand namespaces to use SMBmeta.)
Details
An SMBmeta file is stored in the top level of a domain with the filename "smbmeta.xml". For example:
http://www.concordeggplant.com/smbmeta.xml
The file is in XML format. It consists of one <smbmeta> element. That element contains one
<business> element describing the business. In addition to some required and some optional elements within the <business>
element, there must be one or more <location> elements.
Here is a sample SMBmeta file with the required and most of the optional components:
<?xml version="1.0" encoding="UTF-8"?>
<smbmeta version="0.1" xmlns="http://www.smbmeta.org/namespace/v0.1">
  <business domain="concordeggplant.com">
    <name>Concord Eggplant Restaurant</name>
    <description>Innovative vegetarian food for young and old.</description>
    <type naics="722211">Vegetarian restaurant</type>
    <type naics="722320">Party platters and catering for small events</type>
    <websiteDomain>www.concordeggplant.com</websiteDomain>
    <websiteCapabilities type="schedule" />
    <lastUpdated>Sat, 07 Sep 2002 19:30:01 EST</lastUpdated>
    <refreshDays>30</refreshDays>
    <link type="home" href="http://www.concordeggplant.com">Home page</link>
    <link type="contact" href="http://www.concordeggplant.com/contact.htm">Contact page</link>
    <generator>by hand</generator>
    <docs>http://www.smbmeta.org/docs</docs>
    <location country="us" postalCode="01742">
      <about>Our one and only location</about>
      <address href="http://www.concordeggplant.com/directions.html">300 Baker Avenue</address>
      <serviceRange area="local" />
      <languageSpoken language="en-us">English</languageSpoken>
      <hours day="all" open="1130" close="2130" timezone="local" />
      <parking type="on-street">Lots of metered on-street parking</parking>
      <publicTransportation type="train" blocksAway="3">5 minute walk from the commuter rail</publicTransportation>
    </location>
  </business>
</smbmeta>
Links to additional sample files can be found on the Samples Page. These are samples of the latest
version of this specification as well as older versions that may still exist or be useful for historical purposes.
The style of XML chosen here is as follows: Attribute values are used, in most cases, when a value
is chosen from a predefined set. Tag text values are used for more free-format information. In many cases, free-format
text is also available to describe the more constrained attribute values. It is up to applications using this data to decide
whether, and for which purpose, to use such descriptive text. In this version, all descriptive text must be plain text,
without tagged markup. The special characters "<" and "&" must be escaped with entity values (i.e., "&lt;" and "&amp;")
in text and in attributes. (This differs from HTML, where "hrefs" can contain those characters.)
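Software that writes SMBmeta files can leave this escaping to an XML library instead of doing it by hand. The following is a minimal Python sketch using the standard `xml.sax.saxutils` module; the helper names `escape_text` and `escape_attr` are hypothetical, and the example.com URL stands in for a real site.

```python
from xml.sax.saxutils import escape, quoteattr


def escape_text(value: str) -> str:
    """Escape element text: "&" -> "&amp;", "<" -> "&lt;", ">" -> "&gt;"."""
    return escape(value)


def escape_attr(value: str) -> str:
    """Escape an attribute value the same way and wrap it in quotation marks."""
    return quoteattr(value)


# Hypothetical values containing characters that must be escaped.
print("<name>" + escape_text("Berman & Sons, Inc.") + "</name>")
# prints: <name>Berman &amp; Sons, Inc.</name>
print('<link type="other" href=' + escape_attr("http://example.com/list?a=1&b=2") + ">Listing</link>")
# prints: <link type="other" href="http://example.com/list?a=1&amp;b=2">Listing</link>
```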
SMBmeta files have the following components:
Element: <smbmeta>
Usage:
<smbmeta version="v-num" xmlns="smbmeta-ns" debug="db-value">...</smbmeta>
Description:
Each SMBmeta file has one, and only one, <smbmeta> element. The value of version number v-num
must be "0.1" ("first experimental draft"). This element must be declared to be in the SMBmeta namespace. You can use an xmlns
attribute to do that by providing the smbmeta-ns namespace value of "http://www.smbmeta.org/namespace/v0.1". This element
must have one <business> sub-element.
The debug attribute is optional. If present, the attribute value must be non-empty. The presence
of this attribute indicates that the data in the file is being used for debugging purposes and should be ignored by search
engines and other aggregation software. This lets developers test out new tools without worrying that their test data will
show up in search results.
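As a rough sketch of how an aggregator might apply these rules, the Python below uses the standard `xml.etree.ElementTree` parser to check the namespace, the version value, the single <business> sub-element, and the debug attribute. The function names `check_root` and `should_index` are hypothetical, not part of SMBmeta.

```python
import xml.etree.ElementTree as ET

SMBMETA_NS = "http://www.smbmeta.org/namespace/v0.1"


def check_root(xml_text: str) -> ET.Element:
    """Parse an SMBmeta document and check the rules for its <smbmeta> element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced names as "{namespace}localname".
    if root.tag != "{%s}smbmeta" % SMBMETA_NS:
        raise ValueError("root element must be <smbmeta> in the SMBmeta namespace")
    if root.get("version") != "0.1":
        raise ValueError("version attribute must be 0.1")
    if len(root.findall("{%s}business" % SMBMETA_NS)) != 1:
        raise ValueError("<smbmeta> must contain exactly one <business> element")
    if root.get("debug") == "":
        raise ValueError("debug attribute, if present, must be non-empty")
    return root


def should_index(root: ET.Element) -> bool:
    """Search engines and aggregators ignore files carrying a debug attribute."""
    return root.get("debug") is None
```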
Examples:
<smbmeta version="0.1" xmlns="http://www.smbmeta.org/namespace/v0.1">
Element: <business>
Usage:
<business domain="d-name">...</business>
Description:
Each <smbmeta> element (and therefore, each SMBmeta file) must have one, and only one, <business>
element. If you want to describe more than one business, then you need to have a domain for each business. (This is an initial
restriction to discourage spamming the system, and a small number of <business> elements may be allowed in the future.)
Note that you may have more than one domain pointing to the same web site, leading to multiple domains for the same business.
You may also have web sites that aren't at top level by having just an SMBmeta file at the top level of the domain with <link>
elements that point elsewhere. Finally, a single business can be in multiple classifications through the use of multiple <type>
elements.
The domain attribute is required and consists of the domain name value d-name of the top-level domain
associated with this business. Aggregators can use the domain name as a key to ensure that each domain has only one active
SMBmeta file (case should be ignored). Sub-domain names, like "www.concordeggplant.com", are not used for this attribute, even
if they are necessary to get to the web site.
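A minimal Python sketch of that keying, assuming an aggregator that keeps files in memory: the domain attribute is lower-cased so that names differing only in case map to the same entry, and a newly fetched file replaces the previously active one. The class `SmbmetaStore` and its methods are hypothetical.

```python
from typing import Dict, Optional


def domain_key(business_domain: str) -> str:
    """Normalize a <business> domain attribute for use as a lookup key."""
    # Case is ignored, so "ConcordEggplant.com" and "concordeggplant.com" collide.
    return business_domain.strip().lower()


class SmbmetaStore:
    """Hypothetical aggregator store keeping one active SMBmeta file per domain."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def put(self, business_domain: str, smbmeta_xml: str) -> None:
        # Storing under an existing key replaces the previously active file.
        self._files[domain_key(business_domain)] = smbmeta_xml

    def get(self, business_domain: str) -> Optional[str]:
        return self._files.get(domain_key(business_domain))

    def __len__(self) -> int:
        return len(self._files)
```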
Examples:
<business domain="concordeggplant.com">
Element: <name>
Usage:
<name>name-of-business</name>
Description:
The <name> element is used to provide the name of the business. It is how people refer to
the business. This can be a corporate name, a DBA name, or a nickname. The <description> element is used for more specific
information. Do not include tag-lines, like "Home of the mega veggie roast", in the <name> element. Just use the business
name. It is assumed that these names will not be unique. While they may be trade names and trademarks, they may just be common names.
They should not, though, be purposely misleading (as a purposeful misuse of another's trademark would be). They would be
the name used on correspondence, when answering the phone, on signage and advertising, etc.
This element is a sub-element of the <business> element. There may be only one, and it is
a required element.
Examples:
<name>Concord Eggplant Restaurant</name>
<name>Joe Smith, dba Floristoria</name>
<name>Berman &amp; Sons, Inc.</name>
<name>Kate Hobson, carpenter</name>
<name>Kate Hobson</name>
Element: <description>
Usage:
<description language="l-code">business-description</description>
Description:
The <description> element is used to provide more descriptive information about the business
than the <name> element. There may be more than one <description> element, though each must have a different language
attribute value. The language attribute specifies the language of the description as value l-code. The allowable values for
l-code are listed below in the Special Values section. If a <description> element has no language attribute, then it
must be the only <description> element. The <description> element is optional.
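The two language rules above can be checked mechanically. This is a minimal Python sketch, assuming the caller has already collected the language attribute of each <description> element (None when absent); the function name is hypothetical, and comparing codes case-insensitively is an assumption rather than something this draft states.

```python
from typing import List, Optional


def check_descriptions(languages: List[Optional[str]]) -> None:
    """Check the language attributes of one business's <description> elements.

    Each list entry is one element's language attribute, or None if it has none.
    """
    if None in languages and len(languages) > 1:
        raise ValueError("a <description> without a language must be the only one")
    # Assumption: language codes are compared case-insensitively ("en-US" == "en-us").
    codes = [lang.lower() for lang in languages if lang is not None]
    if len(codes) != len(set(codes)):
        raise ValueError("each <description> must have a different language")
```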
The length of the business-description text value should be anywhere from a simple phrase to a short
paragraph. Listings of information gleaned from SMBmeta files may appear in fixed format reports, so try to write the text
so that it is tolerant of being truncated and still provides useful information. Think of the tag lines about businesses on
public radio ("Concord Eggplant: A family style restaurant serving great vegetarian food to the Metrowest area for 20 years")
to get an idea of how to fit a lot of information into a short sentence. If the text is one sentence or less, then it does not
need a period at the end. If it is more than one sentence in length, then each sentence should end with a period. Descriptive,
informative text is best. Excessive puffery and marketing hype are not appropriate. Exclamatory punctuation is rarely appropriate,
and may be removed by some listing programs. (Marketing hype can always be used on the web site if you feel it is necessary
and effective.) Leading text (e.g., "Our brand selection will surprise you!") is not appropriate unless specific to the business
(e.g., "Our brand selection varies depending upon the inventory being liquidated").
Examples:
<description language="en-us">Something in English</description>
<description language="es">Something in Spanish</description>
<description>Public relations for small businesses in the fashion industry. Specializing in product launches in children’s clothing.</description>
Element: <type>
Usage:
<type naics="bn-num">type-description</type>
Description:
Each business must be classified as being of one or more types using the <type> element. The
business type is specified by using a naics attribute denoting the business type from the same list used by the Census, the
North American Industry Classification System (NAICS). The values for the naics attribute are listed below in the Special
Values section. The codes allow varying levels of specificity. Use what you feel is most helpful to customers.
The text value type-description is optional and can be used to be more specific about the type of
business. It should be just a simple phrase at most.
There must be at least one <type> element as a sub-element of a <business> element.
Multiple <type> elements are allowed if a business is of more than one type. For example, some food establishments do
takeout, sit down, and catering, as well as run a cooking school, each of which has its own NAICS code value.
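A minimal Python sketch of the <type> checks: at least one element must be present, and each naics value should look like a NAICS code. The assumption here (not stated in this draft) is that codes are 2 to 6 digits, shorter ones being broader categories; hyphenated sector ranges such as "31-33" are deliberately not accepted by this sketch. The names `NAICS_CODE` and `check_types` are hypothetical.

```python
import re
from typing import List

# Assumption: NAICS codes are written as 2 to 6 digits, where longer codes are
# more specific (e.g. "72" for a whole sector, "722211" for one industry).
# Hyphenated sector ranges such as "31-33" are not accepted here.
NAICS_CODE = re.compile(r"\d{2,6}")


def check_types(naics_codes: List[str]) -> None:
    """Check the naics attributes of a business's <type> elements."""
    if not naics_codes:
        raise ValueError("at least one <type> element is required")
    for code in naics_codes:
        if not NAICS_CODE.fullmatch(code):
            raise ValueError("not a NAICS code: %r" % code)
```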
Examples:
<type naics="541820">Public relations</type>
<type naics="238350">Carpenter specializing in oak cabinets</type>
<type naics="722211"/>
Element: <websiteDomain>
Usage:
<websiteDomain>domain-name</websiteDomain>
Description:
The <websiteDomain> element specifies the full domain name of the web site where the SMBmeta
file can be found. There may be more than one <websiteDomain> element if more than one domain name points
to the same website (e.g., "www.softgarden.com" and "www.softwaregarden.com"). There must be at least one <websiteDomain>
element within each <business> element, and at least one of them must have the <business> element's domain attribute
value as part (or all) of its domain-name value. The domain-name value does not have a scheme name (i.e., no "http://").
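One way to apply these rules is sketched below in Python. It reads "as part (or all) of its domain-name value" as "equal to the business domain, or a sub-domain of it on a whole label boundary"; that reading, and the function name `check_website_domains`, are assumptions. Under it, "notconcordeggplant.com" does not count as containing "concordeggplant.com".

```python
from typing import List


def check_website_domains(business_domain: str, website_domains: List[str]) -> None:
    """Check a business's <websiteDomain> values against its domain attribute."""
    if not website_domains:
        raise ValueError("at least one <websiteDomain> element is required")
    for name in website_domains:
        if "://" in name:
            raise ValueError("<websiteDomain> must not include a scheme: %r" % name)
    key = business_domain.lower()

    def contains_business_domain(name: str) -> bool:
        # Assumption: "part (or all)" means the name equals the business domain
        # or is a sub-domain of it, matched on a whole label boundary.
        name = name.lower()
        return name == key or name.endswith("." + key)

    if not any(contains_business_domain(name) for name in website_domains):
        raise ValueError("no <websiteDomain> contains %r" % business_domain)
```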
The <business> element's domain attribute value is used as a unique key for database use of
the SMBmeta information, and as a means for connecting the information to a particular entity for responsibility (the domain
name owner). The <websiteDomain> element is used to list the websites that may bring you to the SMBmeta file and that
may represent the business (i.e., those that respond to browser and other HTTP requests on port 80). The SMBmeta file can
also use the <link> element to indicate the URL of a particular page on a business' website (e.g., the "home" or "contact
us" page), or a relevant page on another website.
Examples:
<websiteDomain>www.concordeggplant.com</websiteDomain>
<websiteDomain>eggplant-to-go.com</websiteDomain>
<websiteDomain>www.eggplant-to-go.com</websiteDomain>
Element: <websiteCapabilities>
Usage:
<websiteCapabilities type="wsc-type">capability-description</websiteCapabilities>
Description:
The <websiteCapabilities> element is used to indicate various capabilities that the business'
website provides. There may be more than one <websiteCapabilities> element as a sub-element of the <business>
element. There may also be no <websiteCapabilities> elements.
The type attribute is used to specify the particular capability provided. The wsc-type attribute
value may be any of the following:
- purchase -- the website is enabled for e-commerce, taking orders directly using a browser
- schedule -- the website has a facility for checking a schedule and making reservations directly using a browser
- query -- the website has a form for requesting further information using a browser
The optional capability-description text can be used to provide a simple phrase describing the
capability if necessary.
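A reader of SMBmeta files might collect these elements as (type, description) pairs. The Python below is a minimal sketch using the standard `xml.etree.ElementTree` module; it assumes the caller passes the already-parsed <business> element, and the names `NS`, `CAPABILITY_TYPES`, and `website_capabilities` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = "{http://www.smbmeta.org/namespace/v0.1}"
CAPABILITY_TYPES = {"purchase", "schedule", "query"}


def website_capabilities(business: ET.Element) -> List[Tuple[str, str]]:
    """Return (type, description) pairs for the <websiteCapabilities> children."""
    result = []
    for element in business.findall(NS + "websiteCapabilities"):
        capability = element.get("type")
        if capability not in CAPABILITY_TYPES:
            raise ValueError("unknown websiteCapabilities type: %r" % capability)
        # The description text is optional; an empty element yields "".
        result.append((capability, (element.text or "").strip()))
    return result
```

Zero elements simply yield an empty list, matching the rule that a business may have no <websiteCapabilities> elements.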
Examples:
<websiteCapabilities type="purchase">Full catalog available online</websiteCapabilities>
<websiteCapabilities type="purchase">Partial catalog available online</websiteCapabilities>
<websiteCapabilities type="query" />
Element: <lastUpdated>
Usage:
<lastUpdated>dtm</lastUpdated>
Description:
The <lastUpdated> element indicates the last time this file was changed. All date-times in
SMBmeta conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two
characters or four characters (four preferred). The <lastUpdated> element is a sub-element of the <business> element,
and is optional. If present, there may only be one.
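In Python, the standard `email.utils.parsedate_to_datetime` function implements the RFC 2822 date syntax, which descends from RFC 822 and accepts both two- and four-digit years, optional seconds, and zone names such as EST. The sketch below relies on that function; the wrapper name `parse_last_updated` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_last_updated(text: str) -> datetime:
    """Parse a <lastUpdated> value such as "Mon, 23 Sep 2002 16:56 EST".

    Returns a timezone-aware datetime when the value carries a zone.
    """
    return parsedate_to_datetime(text.strip())


updated = parse_last_updated("Mon, 23 Sep 2002 16:56 EST")
print(updated.isoformat())  # 2002-09-23T16:56:00-05:00
```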
Examples:
<lastUpdated>Mon, 23 Sep 2002 16:56 EST</lastUpdated>
Element: <refreshDays>
Usage:
<refreshDays>r-days</refreshDays>
Description:
The <refreshDays> element specifies the number of days after the <lastUpdated> time
at which a user of the data in the SMBmeta file may need to refetch it to check for possible changes. This is a
hint to data aggregators. It is not the refresh period of the website itself, which may be more or less frequent. The <refreshDays>
element is optional, but if present, there must also be a <lastUpdated> element, and there may only be one.
The r-days value must be an integer from 0 to 999. The value "0" means that the file is subject
to change at any time, and that the reader should know that. The value "999" means that the file is unlikely to change, and
a yearly check would be sufficient. If there is no <refreshDays> element, or there is no r-days value, then "30" is
assumed, meaning that the file should be checked monthly. If no <lastUpdated> value is present, then the current time,
or the date/time modified returned by the server (whichever is earlier) may be used.
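The refetch time is simple date arithmetic. This minimal Python sketch applies the 0 to 999 range, the default of 30 days, and the fallback base time described above; the function names are hypothetical, and the caller is assumed to pass datetimes that are either all naive or all timezone-aware so they can be compared.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_REFRESH_DAYS = 30  # used when <refreshDays> is absent or empty


def parse_refresh_days(text: Optional[str]) -> Optional[int]:
    """Return the r-days value of a <refreshDays> element, or None if it is empty."""
    text = (text or "").strip()
    if not text:
        return None
    days = int(text)
    if not 0 <= days <= 999:
        raise ValueError("refreshDays must be an integer from 0 to 999")
    return days


def next_refresh(last_updated: Optional[datetime],
                 refresh_days: Optional[int],
                 now: datetime,
                 server_modified: Optional[datetime] = None) -> datetime:
    """Compute when an aggregator should refetch an SMBmeta file."""
    if refresh_days is None:
        refresh_days = DEFAULT_REFRESH_DAYS
    if last_updated is None:
        # No <lastUpdated>: use the current time or the server's
        # modification time, whichever is earlier.
        candidates = [now] if server_modified is None else [now, server_modified]
        last_updated = min(candidates)
    return last_updated + timedelta(days=refresh_days)
```

With a value of 0 the result equals the base time, so the file is treated as due for a check immediately.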
The value is given in days to emphasize that this file contains relatively static information. It
is not used for up-to-date inventory status. The most common changes would be opening a new location, changing hours, or adding
a new line of business, all things rarely done more than a few times a year.
Examples:
<refreshDays>1</refreshDays>
<refreshDays>30</refreshDays>
Element: <link>
Usage:
<link type="l-type" href="link-URI">link-description</link>
Description:
The <link> element is used to indicate relevant Internet resources. The type attribute specifies
the type of resource linked to, and the href attribute specifies the URI (a URL for a web page is a type of URI). The optional
text value link-description provides more information than the basic type.
The currently allowed values for the type attribute value l-type are:
- home -- a link to the home page of a website for this business
- about -- a link to a webpage describing the business in greater detail
- weblog -- a link to a weblog maintained by this business
- rss -- a link to an RSS-format file describing weblogs or content-feeds from this business
- contact -- either a link to a webpage or a mailto link
- directory -- a link to a directory where this business is listed that people finding this business may find useful
- other -- the URI of a resource not listed above
If the type of resource you wish to link to is not in this list, use “other” and specify
it using the link-description text value.
The "directory" type is a hint to searchers about where they may find other similar businesses.
While many businesses would not like to link to directories of potential competitors, there are many others that would. Also,
directories that implement websites for businesses may want to provide links back to themselves.
The href attributes must contain valid URIs, complete with scheme name. Allowed schemes are "http://"
and "mailto:". The contact type is one that can have either an "http://" link or a "mailto:" link. Because of the spammer
harvesting possibilities of a "mailto:" link, you may want the contact link to point to a "Contact Us" page with either only
human readable email addresses or contact forms that result in immediate emails.
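A minimal Python sketch of a <link> check, using the standard `urllib.parse.urlsplit` function to find the scheme. It rejects relative URIs and any scheme other than "http" and "mailto" (so "https" is rejected under this draft). Restricting "mailto:" to contact links is an assumption based on a strict reading of the text above; the function name `check_link` is hypothetical.

```python
from urllib.parse import urlsplit

LINK_TYPES = {"home", "about", "weblog", "rss", "contact", "directory", "other"}


def check_link(link_type: str, href: str) -> None:
    """Check the type and href attributes of one <link> element."""
    if link_type not in LINK_TYPES:
        raise ValueError("unknown link type: %r" % link_type)
    parts = urlsplit(href)
    scheme = parts.scheme.lower()
    if scheme == "http":
        if not parts.netloc:
            raise ValueError("http link has no host: %r" % href)
    elif scheme == "mailto":
        # Assumption: strict reading of the text above, in which only
        # "contact" links may use mailto:.
        if link_type != "contact":
            raise ValueError("only contact links may use mailto:")
    else:
        # Relative URIs (no scheme) and all other schemes are rejected.
        raise ValueError("href must start with http:// or mailto:, got %r" % href)
```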
The <link> element ties the data in the SMBmeta file to the business' website, and other resources
like email addresses, forms, or data files. There may be multiple <link> elements, including multiple ones with the
same type attribute. For example, a business may have two web sites, one under its own domain and another as a sub-page on
an industry-specific directory.
The <link> element is an optional sub-element of the <business> element.
Examples:
<link type="home" href="http://www.concordeggplant.com">Home page</link>
<link type="contact" href="http://www.concordeggplant.com/contact.htm">Contact page</link>
Element: <generator>
Usage:
<generator>gen-name</generator>
Description:
The <generator> element is used to specify the program or method used to generate the SMBmeta
file. If no generator is used, then the gen-name “by hand” may be used. This is a sub-element of the <business>
element, and is optional. If present, there may only be one. This element is similar to the one in RSS 2.0.
Examples:
<generator>MightyInHouse Content System v2.3</generator>
<generator>by hand</generator>
Element: <docs>
Usage:
<docs>doc-url</docs>
Description:
The <docs> element gives a full URL to help find the documentation for the file. It is a sub-element
of the <business> element and is optional. If present, there may only be one. The initial value is “http://www.smbmeta.org/docs”.
This element is similar to the one in RSS 2.0.
Examples:
<docs>http://www.smbmeta.org/docs</docs>
Element: <location>
Usage:
<location country="c-code" postalCode="p-code" main="yes/no">...</location>
Description:
The <business> element must have one or more <location> sub-elements. The <location>
element indicates the location of the business. It has optional sub-elements that specify information such as the area served
and the hours open for business.
The country attribute specifies the country in which the location resides. The country code value
c-code must be one of the ISO 3166 values (http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html). For example, the United States is “us” and Canada is “ca”.
The postalCode attribute gives the postal code of the location, using the scheme for the specified country.
The country is assumed to be “us” if omitted, and the postalCode is required. In the United States,
the postalCode may be either the 5-digit ZIP code or the 9-digit ZIP+4 code, written with a hyphen after the first five digits (see http://www.usps.com/zip4/zipfaq.htm for more information about ZIP codes).
The main attribute specifies whether or not this is the “main office” or headquarters
in a multi-location business. Its values may be either “yes” or “no”. It is optional.
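A minimal Python sketch of the <location> attribute rules: postalCode is required, country defaults to "us", US codes must be ZIP or ZIP+4, and main is "yes", "no", or absent. It assumes the attributes arrive as a plain dictionary (for example an ElementTree `Element.attrib`), and it does not validate postal codes for other countries; the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# 5-digit ZIP code, optionally followed by a hyphen and the 4-digit ZIP+4 suffix.
US_POSTAL_CODE = re.compile(r"\d{5}(-\d{4})?")


def location_attributes(attrs: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply <location> attribute defaults and check the values."""
    if "postalCode" not in attrs:
        raise ValueError("postalCode is required")
    country = attrs.get("country", "us")  # "us" is assumed when omitted
    postal_code = attrs["postalCode"]
    # Only US codes are checked; other countries use their own schemes.
    if country == "us" and not US_POSTAL_CODE.fullmatch(postal_code):
        raise ValueError("not a ZIP or ZIP+4 code: %r" % postal_code)
    main = attrs.get("main")
    if main not in (None, "yes", "no"):
        raise ValueError("main must be yes or no")
    return {"country": country, "postalCode": postal_code, "main": main}
```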
Examples:
<location country="us" postalCode="01742" main="yes">
<location postalCode="01742-2131">
Element: <about>
Usage:
<about>about-text</about>
Description:
The <about> element is an optional sub-element of the <location> element. It lets you
provide descriptive text about the location. If present, there may only be one.
Examples:
<about>This auxiliary office shares space with Industrial Rent-a-Car</about>
<about>Our original, and still our only, location</about>
Element: <address>
Usage:
<address href="directions-url">addr-text</address>
Description:
The <address> element is an optional sub-element of the <location> element. If present,
there may only be one. It gives the street address of the location. The city and postal code are not included,
since they can be determined from the <location> element’s postalCode value. Together with the postalCode, the
addr-text should be sufficient to use with a mapping system.
The optional href attribute is used to provide the URL of a web page with more specific information,
such as driving directions, a map, etc.
The addr-text value is optional. Businesses may not want their address to be easily harvested for
physical mail spam, and may choose to leave it off. If empty, then the href attribute is required.
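The Python below is a minimal sketch of two things a reader of <address> might do: enforce the "text or href" rule, and combine the street text with the <location> postal code into a query for a mapping system. The "street, postal code, country" query format and both function names are hypothetical; real mapping services document their own query syntax.

```python
from typing import Optional


def check_address(addr_text: Optional[str], href: Optional[str]) -> None:
    """An <address> may omit its text, but then it must carry an href."""
    if not (addr_text or "").strip() and not href:
        raise ValueError("an <address> without text requires an href attribute")


def mapping_query(addr_text: Optional[str], postal_code: str, country: str = "us") -> Optional[str]:
    """Build a free-text mapping query (hypothetical format) from an address and its location."""
    street = (addr_text or "").strip()
    if not street:
        return None  # address withheld; a client could fall back to the href page
    return "%s, %s, %s" % (street, postal_code, country.upper())
```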
Examples:
<address href="http://www.concordeggplant.com/directions.html">300 Baker Avenue</address>
<address href="http://www.concordeggplant.com/directions.html"/>
<address>300 Baker Avenue</address>
Element: <serviceRange>
Usage:
<serviceRange area="a-code">sr-text</serviceRange>
Description:
The <serviceRange> sub-element of the <location> element specifies the main area serviced
by this location. It is optional, and if present, there may only be one. It has a required area attribute with a value a-code
of one of:
- neighborhood -- something within walking distance or a short drive, like a dry cleaner or gasoline station
- local -- within a purposeful but common driving distance, such as within a town, like a specialized or highly rated restaurant
- citywide -- within the distance of a large city or SMSA (standard metropolitan statistical area), like an airport or specialized clinic
- regional -- within a state or section of the country (for example, a medium-sized ski resort would be “regional”)
- national -- the entire country, like a mail-order business or some consulting businesses
- international -- customers throughout the world, with a large portion of business coming from outside the country where the business is located
- na -- either not applicable (a business entity that does not serve) or not provided. This is the default if there is no area attribute value or if this element is omitted.
Choose the most realistic value for a-code. While a carpenter in Ohio might be willing to travel
to Tokyo for the right price, that doesn’t make the carpenter “international” instead of “regional”
or “citywide”.
The optional text sr-text allows you to provide more description about the service range.
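A directory might use the a-code both to fill in the "na" default and to filter listings by reach. The Python below is a minimal sketch under the assumption that the values listed above run from narrowest to widest reach, with "na" outside that ordering; `service_area` and `serves_at_least` are hypothetical names.

```python
from typing import Optional

# Assumption: the a-code values above are listed in increasing order of reach.
# "na" (not applicable or not provided) has no place in this order.
SERVICE_AREAS = ["neighborhood", "local", "citywide", "regional", "national", "international"]


def service_area(area: Optional[str]) -> str:
    """Return the a-code of a <serviceRange>, defaulting to "na"."""
    if not area:
        return "na"  # element omitted or area attribute empty
    if area != "na" and area not in SERVICE_AREAS:
        raise ValueError("unknown serviceRange area: %r" % area)
    return area


def serves_at_least(area: str, wanted: str) -> bool:
    """Hypothetical directory filter: does a location reach at least as far as wanted?"""
    if area == "na" or wanted == "na":
        return False
    return SERVICE_AREAS.index(area) >= SERVICE_AREAS.index(wanted)
```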
Examples:
<serviceRange area="local">Concord, Maynard, and Acton</serviceRange>
<serviceRange area="neighborhood"/>
Element: <languageSpoken>
Usage:
<languageSpoken language="l-code">lang-text</languageSpoken>
Description:
The optional <languageSpoken> sub-element of the <location> element specifies which
languages are spoken by the service providers at that location. The language attribute value l-code indicates which language.
The allowable values for l-code are listed below in the Special Values section. If more than one language is spoken, then
there should be multiple <languageSpoken> elements. The lang-text value can be used to be more specific, or to give the
name of the language in that language itself.
Examples:
<languageSpoken language="en"/>
<languageSpoken language="es">Español</languageSpoken>
Element: <hours>
Usage:
<hours day="d-code" open="o-time" close="c-time" timezone="tz-code"/>
<hours>hours-text</hours>
<hours href="h-url"/>