SMBmeta specification
Version 0.9 draft 3/27/2003 1:39:35 PM
Introduction
SMBmeta is a small business directory entry format. It describes a language -- a set of tags --
intended to be found in a file at the top level of an Internet domain.
The name SMBmeta stands for "Small and Medium-sized Business metadata".
Metadata is computer jargon that means literally "data about data". Here, the idea is that SMBmeta entries describe the business
or businesses behind the website on which the SMBmeta file appears.
SMBmeta is a dialect of XML. All SMBmeta files must conform to the XML 1.0 specification, as published
on the World Wide Web Consortium (W3C) website. SMBmeta is associated with the XML Namespace "http://www.smbmeta.org/namespace/v0.9".
(Namespaces are a feature of XML that lets you name tags without worrying that two different groups will use the same tag
name for different things. You don't need to understand namespaces to use SMBmeta.)
Details
An SMBmeta file is stored in the top level of a domain with the filename "smbmeta.xml". For example:
http://www.concordeggplant.com/smbmeta.xml
The file is in XML format. It consists of one <smbmeta> element. That element contains one
<business> element describing the business. In addition to some required and some optional elements within the <business>
element, there must be one or more <location> elements.
Here is a sample SMBmeta file with the required and most of the optional components:
<?xml version="1.0" encoding="UTF-8"?>
<smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">
  <business domain="concordeggplant.com">
    <name>Concord Eggplant Restaurant</name>
    <description>Innovative vegetarian food for young and old.</description>
    <type naics="722110">Vegetarian restaurant</type>
    <type naics="722320">Party platters and catering for small events</type>
    <websiteDomain>www.concordeggplant.com</websiteDomain>
    <websiteCapabilities type="schedule" />
    <lastUpdated>Sat, 07 Sep 2002 19:30:01 EST</lastUpdated>
    <refreshDays>30</refreshDays>
    <link type="home" href="http://www.concordeggplant.com">Home page</link>
    <link type="contact" href="http://www.concordeggplant.com/contact.htm">Contact page</link>
    <generator>by hand</generator>
    <docs>http://www.smbmeta.org/docs</docs>
    <location country="us" postalCode="01742">
      <about>Our one and only location</about>
      <address href="http://www.concordeggplant.com/directions.html">300 Baker Avenue</address>
      <serviceRange area="local" />
      <languageSpoken language="en-us">English</languageSpoken>
      <hours day="all" open="1130" close="2130" timezone="local" />
      <parking type="on-street">Lots of metered on-street parking</parking>
      <publicTransportation type="train" blocksAway="3">5 minute walk from the commuter rail</publicTransportation>
    </location>
  </business>
</smbmeta>
Links to additional sample files can be found on the Samples Page. These are samples of the latest
version of this specification as well as older versions that may still exist or be useful for historical purposes.
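The structural rules described above (one <smbmeta> root, one <business> element, at least one <location> element) can be checked mechanically. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The function name check_structure and the abbreviated sample text are illustrative assumptions and are not part of this specification.

```python
import xml.etree.ElementTree as ET

# Namespace value given in this specification.
NS = "http://www.smbmeta.org/namespace/v0.9"


def check_structure(xml_text):
    """Return a list of structural problems (empty if none were found).

    Hypothetical helper: only the element counts described above are
    checked, not the required sub-elements of <business>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}smbmeta" % NS:
        return ["root element is not <smbmeta> in the SMBmeta namespace"]
    businesses = root.findall("{%s}business" % NS)
    if len(businesses) != 1:
        return ["expected one <business>, found %d" % len(businesses)]
    if not businesses[0].findall("{%s}location" % NS):
        return ["<business> has no <location> element"]
    return []


# Abbreviated sample for illustration, not a complete SMBmeta file.
SAMPLE = """<smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">
  <business domain="concordeggplant.com">
    <name>Concord Eggplant Restaurant</name>
    <location country="us" postalCode="01742" />
  </business>
</smbmeta>"""

print(check_structure(SAMPLE))
```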
The style of XML chosen here is as follows: Attribute values are used, in most cases, when a value
is chosen from a predefined set, and for "href" URL references. Tag text values are used for more free-format information.
In many cases, there is free-format text available for a description of the more constrained attribute values. It is up to
applications using this data to decide whether or not, and for which purpose, to use such descriptive text. In this version,
all descriptive text must be plain text, without tagged markup. Special characters "<" and "&" must be escaped with
entity values (i.e., "&lt;" and "&amp;") in text and in attributes. (This differs from HTML, where "hrefs" can contain
those characters). Free-format text may be truncated by aggregation programs. It is highly likely that many aggregators will
restrict most free-format text to 255 bytes or less (in UTF-8, some characters take up more than one byte).
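As an illustration of the escaping and truncation points above, here is a small Python sketch. It assumes an aggregator that cuts text at 255 UTF-8 bytes without splitting a multi-byte character. The helper name truncate_utf8 is hypothetical, and real aggregators may truncate differently. The escape function is from the Python standard library (xml.sax.saxutils) and also escapes ">", which is harmless.

```python
from xml.sax.saxutils import escape


def truncate_utf8(text, max_bytes=255):
    """Cut text to at most max_bytes of UTF-8 without splitting a character.

    Hypothetical aggregator behavior; the 255-byte limit is the likely
    restriction mentioned above, not a requirement.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a partial multi-byte sequence left at the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Writers escape "&" and "<" before placing text in an SMBmeta file.
print(escape("Berman & Sons, Inc."))

# Each "é" takes two bytes in UTF-8, so 200 of them exceed 255 bytes.
print(len(truncate_utf8("é" * 200)))
```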
SMBmeta files have the following components:
Element: <smbmeta>
Usage:
<smbmeta version="v-num" xmlns="smbmeta-ns" debug="db-value">...</smbmeta>
Description:
Each SMBmeta file has one, and only one, <smbmeta> element. The value of version number v-num
must be "0.9" ("proposed initial formal release"). This element must be declared to be in the SMBmeta namespace. You can use
an xmlns attribute to do that by providing the smbmeta-ns namespace value of "http://www.smbmeta.org/namespace/v0.9". This
element must have one <business> sub-element.
The debug attribute is optional. If present, the attribute value must be non-empty. The presence
of this attribute indicates that the data in the file is being used for debugging purposes and should be ignored by search
engines and other aggregation software. This lets developers test out new tools without worrying that their test data will
show up in search results. Most users will never need the debug attribute.
Examples:
<smbmeta version="0.9" xmlns="http://www.smbmeta.org/namespace/v0.9">
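A reader of SMBmeta files might check the root element's attributes as follows. This is a sketch only: the function name is_debug_file and the choice to raise ValueError on violations are assumptions made for illustration, not behavior defined by this specification.

```python
import xml.etree.ElementTree as ET

# Namespace value given in this specification.
NS = "http://www.smbmeta.org/namespace/v0.9"


def is_debug_file(xml_text):
    """Validate the <smbmeta> root and report whether it is debug data.

    Hypothetical helper: raises ValueError when a rule stated above is
    violated, and returns True when a debug attribute is present.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}smbmeta" % NS:
        raise ValueError("root must be <smbmeta> in the SMBmeta namespace")
    if root.get("version") != "0.9":
        raise ValueError("version must be 0.9")
    debug = root.get("debug")
    if debug == "":
        raise ValueError("debug attribute, if present, must be non-empty")
    # A present (non-empty) debug attribute means aggregators should
    # ignore the data in this file.
    return debug is not None


doc = '<smbmeta version="0.9" xmlns="%s" debug="testing" />' % NS
print(is_debug_file(doc))
```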
Element: <business>
Usage:
<business domain="d-name">...</business>
Description:
Each <smbmeta> element (and therefore, each SMBmeta file) must have one, and only one, <business>
element. If you want to describe more than one business, then you need to have a domain for each business. (This restriction
is to discourage spamming the system. There is another data format derived from this specification, the SMBmeta Proxy Format,
that allows multiple <business> elements.) Note that you may have more than one domain pointing to the same web site,
leading to multiple domains for the same business. You may also have web sites that aren't at top level by having just an
SMBmeta file at the top level of the domain with <link> elements that point elsewhere. Finally, a single business can
be in multiple classifications through the use of multiple <type> elements.
The domain attribute is required and consists of the domain name value d-name of the top-level domain
associated with this business. Aggregators can use the domain name as a key to assure that each domain has only one active
SMBmeta file (case should be ignored). Sub-domain names, such as "www.concordeggplant.com", are not used for this attribute
even if necessary to get to the web site -- that is handled by the <websiteDomain> element.
For most TLDs (Top Level Domains), such as .com, .org, and .net, there would be only two domain
name components as part of the domain attribute value, the company's part and the TLD (e.g., concordeggplant and .com). With
many Country Code TLDs, such as .uk, you will need an additional domain name part, so that you would have "concordeggplant.co.uk",
or "concordeggplant.com.br". For others, like .tv, you do not need the extra part.
Examples:
<business domain="concordeggplant.com">
Element: <name>
Usage:
<name>name-of-business</name>
Description:
The <name> element provides the name of the business, as people refer to it. This can be a corporate
name, a DBA name, or a nickname. Use the <description> element for more specific information. Do not include tag-lines,
like "Home of the mega veggie roast", in the <name> element; just use the business name. Names are not assumed to be
unique. While they may be tradenames or trademarks, they may also just be common names. They should not, though, be purposely
misleading (as purposeful misuse of another's trademark would be). The name should be the one used on correspondence, when
answering the phone, and on signage and advertising.
This element is a sub-element of the <business> element. There may be only one, and it is
a required element.
Examples:
<name>Concord Eggplant Restaurant</name>
<name>Joe Smith, dba Floristoria</name>
<name>Berman &amp; Sons, Inc.</name>
<name>Kate Hobson, carpenter</name>
<name>Kate Hobson</name>
Element: <description>
Usage:
<description language="l-code">business-description</description>
Description:
The <description> element is used to provide more descriptive information about the business
than the <name> element. There may be more than one <description> element, though each must have a different language
attribute value. The language attribute specifies the language of the description as value l-code. The allowable values for
l-code are listed below in the Special Values section. If a <description> element has no language attribute, then it
must be the only <description> element. The <description> element is optional.
The length of the business-description text value should be anywhere from a simple phrase to a short
paragraph. Listings of information gleaned from SMBmeta files may appear in fixed format reports, so try to write the text
so that it is tolerant of being truncated and still provides useful information. (It is likely that many systems will allow
no more than 255 characters.) Think of the tag lines about businesses on public radio ("Concord Eggplant: A family style restaurant
serving great vegetarian food to the Metrowest area for 20 years") to get an idea of getting a lot of information into a short
sentence. If the text is one sentence or less, then it does not need a period at the end. If it is more than one sentence
in length, then each sentence should end with a period. Descriptive, informative text is best. Excessive puffery and marketing
hype are not appropriate. Exclamatory punctuation is rarely appropriate, and may be removed by some listing programs. (Marketing
hype can always be used on the web site if you feel it is necessary and effective.) Leading text (e.g., "Our brand selection
will surprise you!") is not appropriate unless specific to the business (e.g., "Our brand selection varies depending upon
the inventory being liquidated").
Examples:
<description language="en-us">Something in English</description>
<description language="es">Something in Spanish</description>
<description>Public relations for small businesses in the fashion industry. Specializing in product launches in children’s clothing.</description>
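The language rules for <description> elements (distinct language values, and no language attribute only when there is a single description) could be checked along these lines. This is a sketch with a hypothetical helper name, description_problems. It compares language values exactly as written, which is an assumption, since this section does not say whether case matters.

```python
import xml.etree.ElementTree as ET

SMB = "http://www.smbmeta.org/namespace/v0.9"
NS = {"s": SMB}


def description_problems(business):
    """List violations of the <description> language rules (hypothetical)."""
    langs = [d.get("language") for d in business.findall("s:description", NS)]
    problems = []
    if None in langs and len(langs) > 1:
        problems.append("a <description> without language must be the only one")
    tagged = [lang for lang in langs if lang is not None]
    if len(tagged) != len(set(tagged)):
        problems.append("each <description> needs a different language value")
    return problems


business = ET.fromstring(
    '<business xmlns="%s" domain="concordeggplant.com">'
    '<description language="en-us">Something in English</description>'
    '<description language="es">Something in Spanish</description>'
    "</business>" % SMB
)
print(description_problems(business))
```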
Element: <type>
Usage:
<type naics="bn-num">type-description</type>
<type system="system-type" code="type-value">type-description</type>
Description:
Each business must be classified as being of one or more types using the <type> element. The
business type is specified by using a naics attribute denoting the business type from the same list used by the U.S. Census,
the North American Industry Classification System (NAICS). The values for the naics attribute are listed below in the Special
Values section. The codes allow varying levels of specificity. Use what you feel is most helpful to customers.
For businesses in countries where other business classification systems are used by the government,
the second format may be used. In that format, the system attribute specifies the classification system and the code attribute
specifies the business type using that classification. A list of National Classifications can be found on the United Nations
Statistics Division web site, http://unstats.un.org/unsd/cr/ctryreg/default.asp?Lg=1.
The text value type-description is optional and can be used to be more specific about the type of
business. It should be just a simple phrase at most.
There must be at least one <type> element as a sub-element of a <business> element.
Multiple <type> elements are allowed if a business is of more than one type. For example, some food establishments do
takeout, sit down, and catering, as well as run a cooking school, each of which has its own NAICS code value.
At least one <type> element must be in the naics format, even when other <type> elements
are in the other, explicit system format. This is for compatibility purposes among aggregators.
Examples:
<type naics="541820">Public relations</type>
<type naics="238350">Carpenter specializing in oak cabinets</type>
<type naics="722211"/>
<type system="naf" code="322A">Fabrication d'equipements d'emission et de transmission hertzienne</type>
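The <type> rules above (at least one <type>, at least one in the naics format, and system and code used together in the second format) might be checked as in the following sketch. The helper name type_problems is hypothetical, and the code does not verify that a naics value is an actual NAICS code.

```python
import xml.etree.ElementTree as ET

SMB = "http://www.smbmeta.org/namespace/v0.9"
NS = {"s": SMB}


def type_problems(business):
    """List violations of the <type> rules described above (hypothetical)."""
    types = business.findall("s:type", NS)
    problems = []
    if not types:
        problems.append("at least one <type> element is required")
    elif not any(t.get("naics") for t in types):
        problems.append("at least one <type> must use the naics format")
    for t in types:
        # The second format needs both the system and the code attribute.
        if t.get("naics") is None and not (t.get("system") and t.get("code")):
            problems.append("<type> needs naics, or both system and code")
    return problems


business = ET.fromstring(
    '<business xmlns="%s" domain="concordeggplant.com">'
    '<type naics="722110">Vegetarian restaurant</type>'
    '<type system="naf" code="322A" />'
    "</business>" % SMB
)
print(type_problems(business))
```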
Element: <websiteDomain>
Usage:
<websiteDomain>domain-name</websiteDomain>
Description:
The <websiteDomain> element specifies the full domain name of the web site where the SMBmeta
file can be found. There may be more than one <websiteDomain> element if more than one domain name points
to the same website (e.g., "www.softgarden.com" and "www.softwaregarden.com"). There must be at least one <websiteDomain>
element within each <business> element, and at least one of them must have the <business> element's domain attribute
value as part (or all) of its domain-name value. The domain-name value does not have a scheme name (i.e., no "http://") and
should work for finding the web site if "http://" is placed before it. If you have more than one domain, you should have more
than one <websiteDomain> element.
The <business> element's domain attribute value is used as a unique key for database use of
the SMBmeta information, and as a means for connecting the information to a particular entity for responsibility (the domain
name owner). The <websiteDomain> element is used to list the websites that may bring you to the SMBmeta file and that
may represent the business (i.e., those that respond to browser and other HTTP requests on port 80). The SMBmeta file can
also use the <link> element to indicate the URL of a particular page on a business' website (e.g., the "home" or "contact
us" page), or a relevant page on another website.
Domain names of websites that do not have the smbmeta.xml file at root level should not be listed
with the <websiteDomain> element. Those websites may be listed with <link> elements.
Examples:
<websiteDomain>www.concordeggplant.com</websiteDomain>
<websiteDomain>eggplant-to-go.com</websiteDomain>
<websiteDomain>www.eggplant-to-go.com</websiteDomain>
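The relationship between the <business> domain attribute and the <websiteDomain> values can be sketched in Python as follows. The helper name website_domain_problems is hypothetical. Reading "as part (or all) of its domain-name value" as "equal to the domain, or ending in a dot followed by the domain" is an interpretation, as is ignoring case when comparing.

```python
def website_domain_problems(business_domain, website_domains):
    """Check <websiteDomain> values against the business domain.

    Hypothetical helper; the matching rule is one interpretation of
    the text above.
    """
    key = business_domain.strip().lower()
    names = [name.strip().lower() for name in website_domains]
    problems = []
    if not names:
        problems.append("at least one <websiteDomain> is required")
    for name in names:
        if "://" in name:
            problems.append("%s: scheme name is not allowed" % name)
    # At least one value must contain the business domain itself.
    if names and not any(n == key or n.endswith("." + key) for n in names):
        problems.append("no <websiteDomain> contains %s" % key)
    return problems


print(website_domain_problems(
    "concordeggplant.com",
    ["www.concordeggplant.com", "eggplant-to-go.com", "www.eggplant-to-go.com"],
))
```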
Element: <websiteCapabilities>
Usage:
<websiteCapabilities type="wsc-type">capability-description</websiteCapabilities>
Description:
The <websiteCapabilities> element is used to indicate various capabilities that the business'
website provides. There may be more than one <websiteCapabilities> element as a sub-element of the <business>
element. There may also be no <websiteCapabilities> elements.
The type attribute is used to specify the particular capability provided. The wsc-type attribute
value may be any of the following:
- purchase - the website is enabled for e-commerce, taking orders directly using a browser
- schedule - the website has a facility for checking a schedule and making reservations directly using a browser
- query - the website has a form for requesting further information using a browser
The optional capability-description text can be used to provide a simple phrase describing the
capability if necessary.
Examples:
<websiteCapabilities type="purchase">Full catalog available online</websiteCapabilities>
<websiteCapabilities type="purchase">Partial catalog available online</websiteCapabilities>
<websiteCapabilities type="query" />
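A reader might collect the <websiteCapabilities> values as shown in this sketch. The function name read_capabilities is hypothetical, and rejecting unknown type values with ValueError is an assumption; an aggregator could equally choose to skip them.

```python
import xml.etree.ElementTree as ET

SMB = "http://www.smbmeta.org/namespace/v0.9"
NS = {"s": SMB}

# The three type values listed above.
ALLOWED_CAPABILITIES = ("purchase", "schedule", "query")


def read_capabilities(business):
    """Return (type, description) pairs; description may be empty."""
    result = []
    for element in business.findall("s:websiteCapabilities", NS):
        wsc_type = element.get("type")
        if wsc_type not in ALLOWED_CAPABILITIES:
            raise ValueError("unknown capability type: %r" % wsc_type)
        result.append((wsc_type, (element.text or "").strip()))
    return result


business = ET.fromstring(
    '<business xmlns="%s" domain="concordeggplant.com">'
    '<websiteCapabilities type="purchase">Full catalog available online'
    "</websiteCapabilities>"
    '<websiteCapabilities type="query" />'
    "</business>" % SMB
)
print(read_capabilities(business))
```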
Element: <lastUpdated>
Usage:
<lastUpdated>dtm</lastUpdated>
Description:
The <lastUpdated> element indicates the last time this file was changed. All date-times in
SMBmeta conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two
characters or four characters (four preferred). The <lastUpdated> element is a sub-element of the <business> element,
and is optional. If present, there may only be one.
Examples:
<lastUpdated>Mon, 23 Sep 2002 16:56 EST</lastUpdated>
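Date-times in this style can be read with the Python standard library, whose email.utils.parsedate_to_datetime function accepts RFC 822 style dates, including two-digit years and a time without seconds. The sketch below only shows one possible way to read the values. The two-digit-year string is an illustrative variation of the sample file's date, not a value from this specification.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Value from the example above (no seconds field).
updated = parsedate_to_datetime("Mon, 23 Sep 2002 16:56 EST")
print(updated.isoformat())

# Illustrative two-digit-year form of the sample file's date.
short_year = parsedate_to_datetime("Sat, 07 Sep 02 19:30:01 EST")
print(short_year.year)

# EST is read as five hours behind UTC.
print(updated.utcoffset() == timedelta(hours=-5))
```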
Element: <refreshDays>
Usage:
<refreshDays>r-days</refreshDays>
Description:
The <refreshDays> element specifies how many days after the <lastUpdated> time a user of the data
in the SMBmeta file may need to refetch it to check for possible changes. This is a
hint to data aggregators. It is not the refresh period of the website itself, which may be more or less frequent. The <refreshDays>
element is optional, but if present, there must also be a <lastUpdated> element, and there may only be one.
The r-days value must be an integer from 0 to 999. The value "0" means that the file is subject
to change at any time, and that the reader should know that. The value "999" means that the file is unlikely to change, and
a yearly check would be sufficient. If there is no <refreshDays> element, or there is no r-days value, then "30" is
assumed, meaning that the file should be checked monthly. If no <lastUpdated> value is present, then the current time,
or the date/time modified returned by the server (whichever is earlier) may be used.
The value is given in days to emphasize that this file contains relatively static information. It
is not used for up-to-date inventory status. The most common changes would be opening a new location, changing hours, or adding
a new line of business, all things rarely done more than a few times a year.
Despite this value, an aggregator may check for changes more or less frequently.
Examples:
<refreshDays>1</refreshDays>
<refreshDays>30</refreshDays>
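An aggregator could turn these two values into a "check again after" time as in the following sketch. The function name next_check is hypothetical. Mapping "999" to 365 days is one reading of "a yearly check would be sufficient", and, as noted above, an aggregator remains free to check more or less often.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

DEFAULT_REFRESH_DAYS = 30  # assumed when <refreshDays> is absent or empty


def next_check(last_updated_text, refresh_days_text=None):
    """Return the date-time after which the file may need refetching."""
    days = DEFAULT_REFRESH_DAYS
    if refresh_days_text is not None and refresh_days_text.strip():
        days = int(refresh_days_text)
        if not 0 <= days <= 999:
            raise ValueError("r-days must be an integer from 0 to 999")
    if days == 999:
        # Interpretation: "unlikely to change" becomes a yearly check.
        days = 365
    return parsedate_to_datetime(last_updated_text) + timedelta(days=days)


print(next_check("Sat, 07 Sep 2002 19:30:01 EST", "30").isoformat())
```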
Element: <link>
Usage:
<link type="l-type" href="link-URI">link-description</link>
Description:
The <link> element is used to indicate relevant Internet resources. The type attribute specifies
the type of resource linked to, and the href attribute specifies the URI (a URL for a web page is a type of URI). The optional
text value link-description provides more information than the basic type.
The currently allowed values for the type attribute value l-type are:
- home -- a link to the home page of a website for this business
- about -- a link to a webpage describing the business in greater detail
- weblog -- a link to a weblog maintained by this business
- rss -- a link to an RSS-format file describing weblogs or content-feeds from this business
- contact -- either a link to a webpage or a mailto link
- directory -- a link to a directory where this business is listed that people finding this business may find useful
- parent -- a link to the website of the "parent" company of this company.
- franchisor -- a link to the website of the company of which this company is a franchisee. The link-description text value should be something like "Company-name franchisee".
- other -- the URI of a resource not listed above
If the type of resource you wish to link to is not in this list, use “other” and specify
it using the link-description text value.
The "directory" type is a hint to searchers about where they may find other similar businesses.
While many businesses would not like to link to directories of potential competitors, there are many others that would. Also,
directories that implement websites for businesses may want to provide links back to themselves.
The href attributes must contain valid URIs, complete with scheme name. Allowed schemes are "http://"
and "mailto:". The contact type is one that can have either an "http://" link or a "mailto:" link. Because of the spammer
harvesting possibilities of a "mailto:" link, you may want the contact link to point to a "Contact Us" page with either only
human readable email addresses or contact forms that result in immediate emails.
The <link> element ties the data in the SMBmeta file to the business' website, and other resources
like email addresses, forms, or data files. There may be multiple <link> elements, including multiple ones with the
same type attribute. For example, a business may have two web sites, one under its own domain and another as a sub-page on
an industry-specific directory.
The <link> element is an optional sub-element of the <business> element.
Examples:
<link type="home" href="http://www.concordeggplant.com">Home page</link>
<link type="contact" href="http://www.concordeggplant.com/contact.htm">Contact page</link>
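The type and href rules for <link> elements can be sketched as a simple check. The helper name link_problems is hypothetical, and the mailto address in the usage lines is made up for illustration. Scheme names are compared without regard to case.

```python
# Type values listed in this section.
ALLOWED_LINK_TYPES = frozenset([
    "home", "about", "weblog", "rss", "contact",
    "directory", "parent", "franchisor", "other",
])
# Schemes allowed for href values.
ALLOWED_SCHEMES = ("http://", "mailto:")


def link_problems(link_type, href):
    """List problems with one <link> element's attributes (hypothetical)."""
    problems = []
    if link_type not in ALLOWED_LINK_TYPES:
        problems.append("unknown link type: %s" % link_type)
    if not href.lower().startswith(ALLOWED_SCHEMES):
        problems.append("href must start with http:// or mailto:")
    return problems


print(link_problems("home", "http://www.concordeggplant.com"))
# The address below is invented for this example.
print(link_problems("contact", "mailto:info@concordeggplant.com"))
```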
Element: <affirmation>
Usage:
<affirmation href="affirmation-URI" signature="s-value" />
Description:
The <affirmation> element is used to provide information that may be helpful in determining
the truth of the data provided. This lets the SMBmeta file author give hints to a search engine or directory about where it
might find the domain listed. If the search engine considers that "Affirmation Authority" authoritative, then it could check
on the domain through a query of the authority, or use the optional signature to check algorithmically on the authenticity
of the claim of being affirmed. The algorithm that uses the signature s-value, combined with other data, is determined by
the particular Affirmation Authority. The s-value is most likely in Base64 encoding (see RFC 2045). This element is a sub-element
of the <business> element, and is optional. There may be any number of <affirmation> elements.
The values for this element would most likely be provided by the Affirmation Authority. Unless you
are working with one of them, this element can be ignored, both in authoring and in reading.
There will be Affirmation Authorities that do not expect to have <affirmation> elements in
all of the SMBmeta data that they affirm. For example, a "List of Bad SMBmeta File Domains" Affirmation Authority would not
expect the "black-listed" domains to point back to the Affirmation Authority that says they should not be trusted.
Example:
<affirmation href="http://www.spamdetecting.org/smbmeta/name" signature="BXz/AhR5I8XnxaCxxO0KrPn/Sof8ObMnAg==" />
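Because the s-value is "most likely" Base64, a reader can attempt to decode it before handing the bytes to whatever algorithm a particular Affirmation Authority defines. The sketch below shows only that decoding step. The helper name decode_signature is hypothetical, and nothing here verifies a signature.

```python
import base64
import binascii


def decode_signature(s_value):
    """Return the decoded bytes, or None if s_value is not valid Base64."""
    try:
        return base64.b64decode(s_value, validate=True)
    except binascii.Error:
        return None


# Signature value from the example above.
raw = decode_signature("BXz/AhR5I8XnxaCxxO0KrPn/Sof8ObMnAg==")
print(len(raw))
```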
Element: <generator>
Usage:
<generator>gen-name</generator>
Description:
The <generator> element is used to specify the program or method used to generate the SMBmeta
file. If no generator is used, then the gen-name “by hand” may be used. This is a sub-element of the <business>
element, and is optional. If present, there may only be one. This element is similar to the one in RSS 2.0.
Examples:
<generator>MightyInHouse Content System v2.3</generator>
<generator>by hand</generator>
Element: <docs>
Usage:
<docs>doc-url</docs>
Description:
The <docs> element gives a full URL to help find the documentation for the file. It is a sub-element
of the <business> element and is optional. If present, there may only be one. The initial value is “http://www.smbmeta.org/docs”.
This element is similar to the one in RSS 2.0.
Examples:
<docs>http://www.smbmeta.org/docs</docs>