Small and medium businesses (which we'll call "SMBs") are one of the major drivers of the US economy. These range from
restaurants, machine shops, lumber yards, advertising agencies and law offices, to carpenters, musicians, and locksmiths,
to weekend DJs and grandmothers selling their knitting. This document describes a data file format and associated services
designed to help those businesses in their use of the Internet.
One of the concerns of businesses is having their web site found by customers. One of the concerns of customers is being
able to find an appropriate set of businesses from which to choose to meet their needs. Web sites and normal search engines
meet some of these needs. Unfortunately, it has been difficult for search engines to ascertain specific information such as
the particular locale served by a business, the type of the business, the languages spoken by the staff, etc. A human being
can often find this and a wide variety of other information by reading a web site, but it is hard for a program to extract it
automatically and reliably enough to build a database. The goal of this "SMBmeta" project is to provide a way to amass this additional
data to aid in searching. It is not to provide the data that you would find on the web site itself, just the data you use
in searching.
The way we do this is with an "smbmeta.xml" file.
The smbmeta.xml file is an XML file stored at the top level of a domain that contains machine readable information about
the business the web site is connected to. It is an open, distributed way for small and medium businesses to communicate
information such as the physical location of the business and the area it serves, as well as the type of business, to search
engines and other services. Hopefully, it will open up innovation that will result in a wide variety of new services that
will benefit the SMBs and their customers.
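As a small illustration of the "top level of a domain" rule, here is a sketch in Python of how a service might work out where to look for the file, given any page on a business's web site. The function name smbmeta_url is invented for this example, and the sketch assumes the file is requested with the same scheme and host as the page; it is not taken from the specification.

```python
from urllib.parse import urlsplit, urlunsplit


def smbmeta_url(page_url: str) -> str:
    """Return the top-level smbmeta.xml URL for the site hosting page_url.

    Assumption: the file is fetched with the same scheme and host as the page.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL such as http://www.example.com/")
    # Drop the page's path, query, and fragment; keep only scheme and host.
    return urlunsplit((parts.scheme, parts.netloc, "/smbmeta.xml", "", ""))


if __name__ == "__main__":
    print(smbmeta_url("http://www.example.com/about/hours.html?lang=en"))
```

Any page on the site maps to the same file location, so a service only needs to know the domain to find the data.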
In designing SMBmeta we learned a lot from watching the experience the weblogging community had with
RSS files and other non-text editing facilities. RSS files are companion files to weblogs that usually reside on the same server as
the weblog. While weblogs are written in HTML and designed to be read by humans, RSS files are written in XML and are meant
to be read by computer programs. They contain information that you'd want to use in an automated way. They contain a list
of the most recent postings to the weblog, as well as the date/time that the weblog was last changed. Programs running on other
web sites read the RSS file to provide an up-to-date list of recent postings, along with a description
of the weblog. The posts of more than one weblog could then be listed together using appropriate "aggregation" software. Many
web sites started adding a "recent posts on weblogs I care about" section. It became much easier for people to keep up with
multiple weblogs at a time.
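To make the aggregation idea concrete, here is a minimal sketch in Python that merges the items of two RSS files into one "recent posts" list. The two feeds are made-up samples embedded as strings, the function name recent_posts is our own, and the sketch assumes RSS 2.0-style channel, item, title, link, and pubDate elements; real feeds vary and would be fetched over HTTP.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up sample feeds; a real aggregator would download these.
FEED_A = """<rss version="2.0"><channel>
<title>Weblog A</title>
<item><title>A first post</title><link>http://a.example/1</link>
<pubDate>Mon, 02 Dec 2002 10:00:00 GMT</pubDate></item>
<item><title>A second post</title><link>http://a.example/2</link>
<pubDate>Wed, 04 Dec 2002 09:30:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<title>Weblog B</title>
<item><title>B only post</title><link>http://b.example/1</link>
<pubDate>Tue, 03 Dec 2002 18:45:00 GMT</pubDate></item>
</channel></rss>"""


def recent_posts(feeds, limit=10):
    """Merge the items of several RSS documents, newest first."""
    posts = []
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        weblog = channel.findtext("title", default="")
        for item in channel.findall("item"):
            when = parsedate_to_datetime(item.findtext("pubDate"))
            posts.append((when, weblog, item.findtext("title"), item.findtext("link")))
    posts.sort(key=lambda post: post[0], reverse=True)
    return posts[:limit]


if __name__ == "__main__":
    for when, weblog, title, link in recent_posts([FEED_A, FEED_B]):
        print(when.date(), weblog, title, link)
```

Because each weblog publishes the same simple structure, the aggregator needs no special knowledge of any one site.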
Another example of a static file, with an open format, useful to both web site authors and external services, is the
"robots.txt" file that has been around for many years. This optional file, stored at the top level of a domain's web site,
is in a very simple, human-writable format and is read by "spider" programs like search engines to get the domain owner's
hints as to which parts of the web site to traverse.
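As an illustration of how little a spider needs to do to honor such a file, here is a short sketch using the robots.txt parser in Python's standard library (urllib.robotparser). The file contents, the spider name "ExampleSpider", and the URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every spider is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_traverse(url, agent="ExampleSpider"):
    """Return True if the site owner's hints allow this agent to fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(may_traverse("http://www.example.com/index.html"))
    print(may_traverse("http://www.example.com/private/notes.html"))
```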
In addition to the RSS files for aggregating content, another facility grew up around weblogs. Services like "weblogs.com"
from Userland and "blo.gs" let any weblog register with the service and tell it every time the weblog had a change. The service then aggregated this
information, and provided "last changed" data in a variety of forms to users. With hundreds of weblogs on your "I like these"
list and sporadic posting behavior running from multiple times a day to once every couple of months, these change service
aggregators became very popular in the weblogging community. Many weblogging tools, like Radio, Blogger, and Movable Type,
have options to let you automatically notify your preferred services each time you make a new post.
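The essence of such a change service can be sketched in a few lines of Python. This is an in-memory stand-in written for illustration only, not a description of how weblogs.com or blo.gs actually work; the class name ChangeAggregator and its methods are hypothetical.

```python
from datetime import datetime


class ChangeAggregator:
    """In-memory stand-in for a change-notification service (hypothetical)."""

    def __init__(self):
        self._last_changed = {}  # weblog URL -> time of its most recent ping

    def ping(self, weblog_url, when):
        """Record that a weblog changed, keeping only its latest change time."""
        previous = self._last_changed.get(weblog_url)
        if previous is None or when > previous:
            self._last_changed[weblog_url] = when

    def changed_since(self, cutoff):
        """Return (url, time) pairs changed at or after cutoff, newest first."""
        recent = [(url, t) for url, t in self._last_changed.items() if t >= cutoff]
        return sorted(recent, key=lambda pair: pair[1], reverse=True)


service = ChangeAggregator()
service.ping("http://a.example/", datetime(2002, 12, 1, 9, 0))
service.ping("http://b.example/", datetime(2002, 12, 3, 18, 30))
service.ping("http://a.example/", datetime(2002, 12, 4, 8, 15))

if __name__ == "__main__":
    for url, when in service.changed_since(datetime(2002, 12, 2)):
        print(url, when)
```

A reader with hundreds of weblogs on a list can ask one service what changed instead of polling every site.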
Machine readable files like RSS, and shared, open-to-all event-aggregation services like weblogs.com have been a boon
to the weblogging community. What can we learn from them?
First: We learn that simple, easily created XML data files hosted along with human readable content are a valuable thing.
If simple and useful enough, even authoring products maintained on a shoestring can support them. Second: By having human
accessible ways of participating, such as with hand-written XML or browser accessed, forms-based event signaling, authors
can participate even before the tools are updated. Third: By only requiring simple, static files on a server, and no fees
to participate, the barriers to participating are low. Fourth: By using open, published standards, and open access to the
data, innovation can occur anywhere. Finally: Having initial, useful services that can be tried by all jumpstarts the acceptance
and innovation.
Another lesson learned from the Internet in general is that while the cost to participate should be minimal, it should
not be zero, because facilities that can be flooded for free will be. That's why we insist that the file be located at the
top of the domain: because domains cost money. The money to pay for a domain name is money a business would spend anyway --
the marginal cost to a legitimate business is in fact zero -- but nonetheless it's a real cost that can help prevent spam.
Here are some key thoughts that are driving the design of SMBmeta:
Keep it simple. Time and again on the Internet, the simple protocol beats out the "full-fledged" complex
one. SMTP beat out X.400; HTTP and HTML beat out many complex formats. This is especially the case with low budget web sites.
Have the data reside on the author's web site. By hosting the data along with the web site, there is
no central repository on which to depend. This can scale and evolve better. As new services using the data are invented, they
can always go back to the source without "permission" from existing services. If a service is unsuccessful, its demise does
not destroy the original data.
Keep it open. The formats and protocols being proposed are usable by all. You don't have to depend upon
any given service for success. The reasons to participate can be predicated upon altruistic or general greedy feelings, not
just the belief in a single service or tool provider.
Provide immediate benefit. The format proposal is being coupled with some services that take advantage
of it.
Keep the cost to participate minimal.
Think small business, not big business. The data is tailored to a small business with few locations.
The full complexity needed by huge enterprises need not be accommodated.
There will be other machine readable information files. There is no need to solve all of the web's problems
with this one system. Other solutions can be completely independent and just reside in their own file. An escape hatch, which
links to other files or to particular web pages, can be used for some additions, too.
People will try to "game" this system. Make the system tolerant of businesses trying to use it to their
maximum advantage. One person's "heavy handed marketing" is another's "just what I was looking for". Make it as easy as possible
for users to detect gaming if they care. Make use of techniques to have people take responsibility for their data by keeping
it connected to domains. Search results can give hints, for example by noting that a business claiming to be a "neighborhood" one has listed
10,000 different locations as its "neighborhood".
Remember that the end result is for a customer to find a web site. Not all data needs to be in this
file, just data needed for queries that is not available by other means.
Try to avoid data that makes "spamming" the web site owners easier. Don’t include email addresses,
phone numbers, or other information that is otherwise hard to harvest.
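The gaming hint mentioned among the thoughts above (a "neighborhood" business listing 10,000 locations) could be produced by a check as simple as the following Python sketch. The function name gaming_hints, the scope label "neighborhood" as a data value, and the threshold of 5 locations are all assumptions made for illustration; the draft specification may represent these differently.

```python
def gaming_hints(claimed_scope, locations, max_neighborhood_locations=5):
    """Return human-readable hints about data that looks like gaming.

    The threshold is an arbitrary illustrative value, not part of any spec.
    """
    hints = []
    distinct = len(set(locations))
    if claimed_scope == "neighborhood" and distinct > max_neighborhood_locations:
        hints.append(
            f'claims to be a "neighborhood" business but lists '
            f"{distinct} different locations"
        )
    return hints


if __name__ == "__main__":
    many = [f"town-{i}" for i in range(10000)]
    print(gaming_hints("neighborhood", many))
    print(gaming_hints("neighborhood", ["Newton, MA"]))
```

The check only produces a hint to show alongside search results; whether to act on it is left to the user.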
To help this get started, Interland (and its Trellix subsidiary) is doing the following:
1) Publishing a draft specification with samples.
2) Creating and hosting a simple free online tool for creating smbmeta.xml files. This tool can be used to create files
for inclusion on any web site by the web site owner.
3) Creating and hosting a simple free aggregation system to help web site owners publicize that they have smbmeta.xml
files and to help innovators trying to use this facility find those web sites without needing to repeatedly spider the whole
web.
4) Encouraging other major players on the web to participate in some way with this effort.
5) Notifying its many customers about this program.
6) Publicizing this effort, and maintaining an online weblog and technical information about it to jumpstart the participation
of others.
7) Providing the specification and sample program code in an open manner through an appropriate Creative Commons or other
relatively open license.