Most people think of a blog as a diary published online. This is true, but don't assume that blogs are only diary entries by some kid writing about his latest crush, favorite movie and what he did last night.

Blogs - which allow for cross referencing and linking - are also powerful discussion board/forum systems and much more. This project assumes that the use of blogs will continue to expand. For us, though, it is their simple technical structure that really titillates, together with their messy knowledge structures, which make them ideal for research on different interfaces for extracting useful knowledge.




The history of Blogs is a bit convoluted. According to one of the biggest Blog companies, blogger.com:

"A blog is a web page made up of usually short, frequently updated posts that are arranged chronologically - like a what's new page or a journal. The content and purposes of blogs varies greatly - from links and commentary about other web sites, to news about a company/person/idea, to diaries, photos, poetry, mini-essays, project updates, even fiction. Blog posts are like instant messages to the web. Many blogs are personal, "what's on my mind" type musings. Others are collaborative efforts based on a specific topic or area of mutual interest. Some blogs are for play. Some are for work. Some are both. Blogs are also excellent team/department/company/family communication tools. They help small groups communicate in a way that is simpler and easier to follow than email or discussion forums. Use a private blog on an intranet to allow team members to post related links, files, quotes, or commentary. Set up a family blog where relatives can share personal news. A blog can help keep everyone in the loop, promote cohesiveness and group culture, and provide an informal "voice" of a project or department to outsiders."
www.blogger.com/about.pyra 2003

Author Rebecca Blood comments on her history page:

"The original weblogs were link-driven sites. Each was a mixture in unique proportions of links, commentary, and personal thoughts and essays. Weblogs could only be created by people who already knew how to make a website. A weblog editor had either taught herself to code HTML for fun, or, after working all day creating commercial websites, spent several off-work hours every day surfing the web and posting to her site. These were web enthusiasts."
www.rebeccablood.net/essays/weblog_history.html 2003

The link-driven blog is not what I am concerned with for this thesis. I agree with friends when they say that if what is linked to is important, it should be included right there in the page, not hidden behind a link. Of course, this view does not denigrate blogs which set out to be guides in some way. Blogrolls (lists of links) will probably be included, but they are not important to the central premise of this thesis, which concerns readability of the main contents.

Rebecca Blood distinguishes between Blogs, Notebooks and Filters in her book "The Weblog Handbook". Blogs "resemble short-form journals. The writer's subject is his daily life, with links subordinate to the text." Notebooks are "sometimes personal, sometimes focused on the outside world, notebooks are distinguished from blogs by their longer pieces of focused content." Filter editors "want to show you around the web. Some of these editors strive for pithiness, others for completeness, but even those who use links as a springboard to extended diatribes are focused primarily on the world outside their door".

The web site Marketing Terms describes a blog as a coherent entity, rather than as a taxonomy of different classes or styles of frequent commentary on the net, in RSS feeds (see chapter on RSS) and on the Web:

"A frequent, chronological publication of personal thoughts and Web links. A blog is often a mixture of what is happening in a person's life and what is happening on the Web, a kind of hybrid diary/guide site, although there are as many unique types of blogs as there are people. People maintained blogs long before the term was coined, but the trend gained momentum with the introduction of automated published systems, most notably Blogger at blogger.com. Thousands of people use services such as Blogger to simplify and accelerate the publishing process. Blogs are alternatively called web logs or weblogs. However, "blog" seems less likely to cause confusion, as "web log" can also mean a server's log files."

As Dave Winer points out, the first web page was a weblog. Of sorts. It contained news of what was going on for Tim Berners-Lee. It's neat to read this page now. It's short, it's geeky, it's historical:

What's new in '92

Here's the latest (that we know) about W3. The High-Energy Physics world got its first official announcement of W3 in the CERN computer newsletter released at Christmas, with an introductory article. However, there are already many users of W3 outside HEP!

New browser
The new year starts with a release (version 1.1 - our first official "version 1" release) of the line mode browser. This has protocol code in for a wealth of new information, with:
Direct access to internet news groups
Direct access to "gopher" campus-wide information systems etc. (Gopher is system similar to W3 but using a web of menus and plain text files rather than hypertext. It is all readable as hypertext using W3)
Browsing of remote directories using FTP. Before, files could be read - now you can browse around as well! Any FTP site becomes a W3 information source.
Follow links directly to telnet (and rlogin) sites. This allows hypertexts to point to online communications facilities which don't have servers.
Extensibility using gateways - you can configure www to use specific gateways for any access protocol which might turn up in the future which it can't handle directly.
The user interface is slightly improved, and you can save a document to a file or pipe, or print it (under unix). The browser version can be picked up by anonymous FTP in the usual way including source binaries for several platforms. Those who have built other hypertext systems (such as Hyperbole and Viola) on top of the www browser will immediately gain access to the all this newly accessible information.

W3 at SLAC
Hot on the heels of the announcement of the W3 server for the "SPIRES" High-Energy Physics preprint database at the Stanford Linear ACcelerator lab comes news from Paul Kunz that the line mode browser is installed on all unix systems at SLAC. Happy browsing, folks.

Browsing on VM/CMS
The IBM mainframe at CERN now has a copy of the w3 browser (v0.14) running in line mode. We are considering ways to make it more full-screen in the VM style.

Browsing under X with viola
A version of www running in the "Viola" hypertext system looks good - I just saw it running on an apollo and on a dec station. We hope to release it soon with the coming new version of viola.

Conferences
The W3 demonstration had an enthusiastic reception at "hypertext'91" in San Antonio, Texas. As I write Jean-Francois is presenting it live at the Software Engineering and AI for High-Energy Physics workshop in La Londe, France. We've also been asked to demonstrate, as well as present a paper, at the Joint European Networking Conference (JENC3) in Innsbruck, Austria, in May. (next issue: April) Tim BL.
Originally on http://info.cern.ch/ Archived on w3.org/History/19921103-hypertext/hypertext/WWW/News/9201.html (2003)




In my mind, blogs are a new form of literature. Never before has it been possible to read what other people are writing about their daily lives, their experiences, their views, their thoughts - instantly. I can read about a kid buying a new car in north London as readily as I can read about a young girl getting excited about her upcoming birthday in California (I think it was), while having the chance to send her a 'happy birthday' email on her birthday. We are using the blogging system we are developing to keep a production blog for the Invisible Revolution project (the documentary about Doug Engelbart I am heading up, together with Fleur, invisiblerevolution.net). When Vint Cerf had a read, he emailed me saying it's like reading the making of a movie when and as it's actually happening.

The banal. The insightful. It all fits within the format of a blog. What happens when you put your private thoughts online? Are they even private anymore in the sense that you censor yourself as you write?

Once you realise that there are so many people reading the web log - it's not that you change what you write but you're suddenly aware of the number of people who will react to what you write. So you kind of make sure that people understand what you're saying very clearly. At first it was much more careless, to tell you the truth. Now because I know there are lots of people paying attention, I make sure that I'm clear about what I write. I don't want to be misunderstood, which has happened quite often.
Salam Pax, the Baghdad blogger http://news.bbc.co.uk/1/hi/talking_point/3116344.stm

Today we are beginning to notice that the new media are not just mechanical gimmicks for creating worlds of illusion, but new languages with new and unique powers of expression.
McLuhan, M. (1957), in McLuhan, E. & Zingrone, F. (1997)

New media may at first appear as mere codes of transmission for older achievement and established patterns of thought. But nobody could make the mistake of supposing that phonetic writing merely made it possible for the Greeks to set down in visual order what they had thought and known before writing. In the same way printing made literature possible. It did not merely encode literature.
McLuhan, M. (1960), in McLuhan, E. & Zingrone, F. (1997)

It is the framework which changes with each new technology and not just the picture within the frame.
McLuhan, M. (1955), in McLuhan, E. & Zingrone, F. (1997)

The spoken word was the first technology by which man was able to let go of his environment in order to grasp it in a new way.
McLuhan, M., in McLuhan, E. & Zingrone, F. (1997)

The alphabet was one thing when applied to clay or stone, and quite another when set down on light papyrus.
McLuhan, M., in McLuhan, E. & Zingrone, F. (1997)

Broadcast adds as much distance as it takes away. By bringing a rock group closer to millions, the broadcast also creates an emotional distance between the performer and the audience, which stays even if the physical distance disappears. In fact, it tends to become even more intense. One of the promises of digital, text-based communications is to claw back some sort of equilibrium in the projection of discourse, to move us closer: all publishers, all readers, all interactive on much the same level.
What happens with blogs, which are published, but where the audience can also add comments and questions to the blog? (This requires an integrated comment system, which is not the norm, but which the HyperBlog features.)

The medium, or process, of our time - electric technology - is reshaping and restructuring patterns of social interdependence and every aspect of our personal life. It is forcing us to reconsider and re-evaluate practically every thought, every action, and every institution formerly taken for granted. Everything is changing: you, your family, your education, your neighbourhood, your job, your government, your relation to "the others." And they're changing dramatically.
McLuhan, M. (1967), in McLuhan, E. & Zingrone, F. (1997)

The changes that tomorrow's computer interfaces are going to cause in the minds of millions of people are good and necessary, considering the fact that we are entering the home stretch in our race against extinction. Personal computers that evolve from contraptions to companions in less than one human life span are part of an overall acceleration of the biosphere's system for becoming conscious enough to take control. The cellular circuit resonates with the neural circuit, the communication circuit, and the whole planet waking up to itself in the nick of time.

"I predict that if interfaces are designed with the notion of interpersonal communication in mind, the information technologies of the next ten years are going to link amplified individual minds into a global groupmind." "Interactivity is interpersonal..."

"The personal computer is becoming the interpersonal computer." "The right kind of interface design can take advantage of the world's evolving communications web and turn our screens into windows on one another's minds."
Leary, T. (1995)

We have become irrevocably involved with, and responsible for, each other.
McLuhan, M. (1967), in McLuhan, E. & Zingrone, F. (1997)




RSS is the technical format in which blogs are syndicated.

The O'Reilly book "Content Syndication with RSS" by Ben Hammersley (2003) discusses RSS as an XML content-syndication standard. It is basically an XML dialect. For the purposes of the HyperBlog project, RSS is seen only as a form of 'export' of the blog data. I have not spent much time on the development of our RSS export and, since the HyperBlog project is only a prototype, I have concentrated development on the web/HTML interface.

RSS has nevertheless become the de facto export format for blogs, and in order to participate in the distributed blogging community, or 'blogosphere', through aggregators and RSS 'news' readers, we do provide RSS feeds.
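To illustrate what an RSS 'export' of blog data amounts to, here is a minimal sketch in Python. It is not the HyperBlog's own export code; the function name build_rss, the dictionary keys used for each post and the example.org addresses are assumptions made purely for illustration. It writes the simple RSS 2.0 flavour, with a channel followed by one item per post.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, posts):
    """Return an RSS 2.0 document (as a string) for a list of blog posts.

    Each post is a dict with the hypothetical keys
    'title', 'link', 'summary' and 'published' (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three elements every RSS 2.0 channel must carry.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        ET.SubElement(item, "description").text = post["summary"]
        # RSS 2.0 dates use the RFC 822 style, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    example_posts = [
        {
            "title": "First post",
            "link": "http://example.org/blog/1",
            "summary": "A short summary of the entry.",
            "published": datetime(2003, 9, 1, 12, 0, tzinfo=timezone.utc),
        }
    ]
    print(build_rss("Example blog", "http://example.org/blog/",
                    "An illustrative feed", example_posts))
```

Because the feed is just a small tree of named elements, an export of this kind can be bolted onto an existing blog system without touching the web/HTML interface.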

From XML.COM, 'What is RSS?' by Mark Pilgrim (2002):

RSS is a format for syndicating news and the content of news-like sites, including major news sites like Wired, news-oriented community sites like Slashdot, and personal weblogs. But it's not just for news. Pretty much anything that can be broken down into discrete items can be syndicated via RSS: the "recent changes" page of a wiki, a changelog of CVS check ins, even the revision history of a book. Once information about each item is in RSS format, an RSS-aware program can check the feed for changes and react to the changes in an appropriate way. RSS-aware programs called news aggregators are popular in the weblogging community. Many weblogs make content available in RSS. A news aggregator can help you keep up with all your favourite weblogs by checking their RSS feeds and displaying new items from each of them.
www.xml.com/pub/a/2002/12/18/dive-into-xml.html (2003)

He goes on:

But coders beware. The name "RSS" is an umbrella term for a format that spans several different versions of at least two different (but parallel) formats. The original RSS, version 0.90, was designed by Netscape as a format for building portals of headlines to mainstream news sites. It was deemed overly complex for its goals; a simpler version, 0.91, was proposed and subsequently dropped when Netscape lost interest in the portal-making business. But 0.91 was picked up by another vendor, UserLand Software, which intended to use it as the basis of its weblogging products and other web-based writing software. In the meantime, a third, non-commercial group split off and designed a new format based on what they perceived as the original guiding principles of RSS 0.90 (before it got simplified into 0.91). This format, which is based on RDF, is called RSS 1.0. But UserLand was not involved in designing this new format, and, as an advocate of simplifying 0.90, it was not happy when RSS 1.0 was announced. Instead of accepting RSS 1.0, UserLand continued to evolve the 0.9x branch, through versions 0.92, 0.93, 0.94, and finally 2.0.
www.xml.com/pub/a/2002/12/18/dive-into-xml.html (2003)
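The practical consequence of this split for anyone reading feeds can be sketched as follows. This is a minimal illustration in Python, not code from any particular aggregator: the function name read_feed is my own, and the sketch only distinguishes the two living branches (the plain 0.9x/2.0 family with an rss root element, and the RDF-based RSS 1.0 with namespaced elements). The original RSS 0.90 is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the RDF-based RSS 1.0 branch.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"


def read_feed(xml_text):
    """Return (flavour, [(title, link), ...]) for an RSS document.

    Handles the <rss version="..."> family (0.91 to 2.0) and RSS 1.0.
    Anything else raises ValueError.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # UserLand branch: no namespace, items live inside <channel>.
        flavour = "RSS " + root.get("version", "unknown")
        prefix = ""
    elif (root.tag == "{%s}RDF" % RDF_NS
          and root.find("{%s}channel" % RSS10_NS) is not None):
        # RDF branch: every RSS element is namespaced, and the items are
        # siblings of <channel> rather than its children.
        flavour = "RSS 1.0"
        prefix = "{%s}" % RSS10_NS
    else:
        raise ValueError("not a recognised RSS document")

    items = []
    # iter() walks the whole tree, so it finds items in either position.
    for item in root.iter(prefix + "item"):
        title = item.findtext(prefix + "title", default="")
        link = item.findtext(prefix + "link", default="")
        items.append((title, link))
    return flavour, items
```

An aggregator built on something like this could remember the links it has already seen and show only the new ones, which is the "check the feed for changes" behaviour Pilgrim describes.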




The writing of journals has been around for almost as long as there has been writing.


It is a long tradition, running from the ponderous musings of so many modern blogs back in time to: "Do the heaven and earth then contain Thee, since Thou fillest them? or dost Thou fill them and yet overflow, since they do not contain Thee? And whither, when the heaven and the earth are filled, pourest Thou forth the remainder of Thyself?" (http://ccat.sas.upenn.edu/jod/augustine/Pusey/book01 2003), which was written by Augustine about 1,600 years ago.

Then there is Geoffrey Chaucer's Canterbury Tales: "Of England they to Canterbury wend, The holy blessed martyr there to seek. Who helped them when they lay so ill and weal. Befell that, in that season, on a day. In Southwark, at the Tabard, as I lay. Ready to start upon my pilgrimage. To Canterbury, full of devout homage." (www.canterburytales.org/canterbury_tales.html 2003)

Autobiographical cartoons

Harvey Pekar's 'American Splendor' is a self-published, autobiographical comic. For sixteen issues, from 1976 to 1991, Harvey Pekar "was documenting the trials, tribulations and trivia of being a filing clerk and part-time journalist in Cleveland." (Plowright, F. 2003)
This is of interest to the HyperBlog as it is completely illustrated, whereas the current version of the HyperBlog is text only with the ability to add thumbnail images. Another issue which comes screaming out from the movie version of 'American Splendor' is the question of who would be interested in reading about 'true, daily life' where by definition nothing happens. Harvey Pekar says in the movie trailer: "Ordinary life's pretty complex stuff". (americansplendormovie.com) So does it work? Roger Ebert, of the Chicago Sun-Times, gives the movie 4 stars. (suntimes.com/output/ebert1/wkp-news-american22f.html 2003)


But then the web came along, and with it the notion of a 'home-page', which has never really been properly defined. Is it the user's page, with links and such to go places? Is it the main page of a site? Only little gnomes inside the net know, and they're too busy writing 404 error messages. Netscape, however, took it to mean a little of both. That only happened after a slightly convoluted route of meta-information and dreams of a more dynamic web. Ben Hammersley starts the story in "Content Syndication with RSS":

The deepest, darkest origins of the current versions of RSS began in 1995 with the work of Ramanathan V. Guha. Known to most simply by his surname, Guha developed a system called the Meta Content Framework (MCF). Rooted in the work of knowledge-representation systems such as CycL, KRL and KIF, MCF's aim was to describe objects, their attributes, and the relationships between them.
MCF was an experimental research project funded by Apple, so it was pleasing for management that a great application came out of it: ProjectX, later renamed HotSauce. By late 1996, a few-hundred sites were creating MCF files that described themselves, and HotSauce allowed users to browse around these MCF representations in 3D.

It was popular, but experimental, and when Steve Jobs' return to Apple's management in 1997 heralded the end of much of Apple's research activity, Guha left for Netscape.

There he met Tim Bray, one of the original XML pioneers, and started moving MCF over to an XML-based format. (XML itself was new at the time.) This project later became the Resource Description Framework (RDF). RDF is, as the World Wide Web Consortium (W3C) RDF Primer says, "a general-purpose language for representing information on the World Wide Web." It is specifically designed for the representation of metadata and the relationships between things. In its fullest form, it is the basis for the concept known as the Semantic Web, the W3C's version of a web of information that computers can understand.

This was in 1997, remember. XML, as a standard way to create data formats, was still in its infancy, and much of the Internet's attention was taken up by the increasingly frantic war between Microsoft and Netscape.

Microsoft had not ignored the HotSauce experience. With others, principally a company called Pointcast, they further developed MCF for the description of web sites and created the Channel Definition Format (CDF).

CDF is XML-based and can describe content ratings, scheduling, logos, and metadata about a site. It was introduced in Microsoft's Internet Explorer 4.0 and later into the Windows desktop itself, where it provided the backbone for what was then called Active Desktop.

By 1999, MCF was well and truly steeped in XML and becoming RDF, and the Microsoft/Netscape bickering was about to start again. Both companies were due to launch new versions of their browsers, and Netscape was being circled for a possible take-over by AOL.
So, Netscape's move was to launch a portal service, called "My Netscape Network", and with it RSS.

Short for RDF Site Summary, RSS allowed the portal to display headlines and URLs from other sites, all within the same page. A user could personalise their My Netscape page to contain the headlines from any site that interested them and had an RSS file available. It was basically, a web page-based version of everything HotSauce and CDF had become. It was a great success.
Hammersley, B. (2003)

I have included a post by Dan Libby (who represents the next step in the history, finally, at Netscape) on the early history to illustrate how new this all is, how it's changed and how we still have an exciting and open future ahead of us. It also has a few cautionary design tales:

The original My Netscape Network Vision:
We would create a platform and an RDF vocabulary for syndicating metadata about websites and aggregating them on My Netscape and ultimately in the web browser. Because we only retrieved metadata, the website authors would still receive user's click-throughs to view the full site, thus benefiting both the aggregator and the publisher. My Netscape would run an RDF database that stored all the content. Preferences akin to mail filters, would allow the user to filter only the data in which they are interested onto the page, from the entire pool of data. [...] Tools would be made available to simplify the process of creating these files, and to validate them, and life would be good.

What Actually Happened:
1) A decision was made that for the first implementation, we did not actually need a "real" RDF database, which did not even really exist at the time. Instead we could put the data in our existing store, and instead display data, one "channel" at a time. This made publishers happier anyway, because they would get their own window and logo. We could always do the "full" implementation later.
2) The original RDF/RSS spec was deemed "too complex" for the 'average user'.
3) We shipped the first implementation, sans tools. Basically, there was a spec for RSS 0.9, some samples, and a web-based validation tool. No further support was given for a while, and I was kept busy working on other projects.
4) At some point, it was decided that we needed to rev the RSS spec to allow things like per item descriptions, i18n support, ratings, and image widths and height. Due to artificial (in my view) time constraints, it was again decided to continue with the current storage solution, and I realised that we were *never* going to get around to the rest of the project as originally conceived. At the time, the primary users of RSS (Dave Winer the most vocal among them) were asking why it needed to be so complex and why it didn't have support for various features, e.g. update frequencies. We really had no good answer, given that we weren't using RDF for any useful purpose. Further, because RDF can be expressed in XML in multiple ways, I was uncomfortable publishing a DTD for RSS 0.9, since the DTD would claim that technically valid RDF/RSS data conforming to the RDF graph model was not valid RSS. Anyway, it didn't feel "clean". The compromise was to produce RSS 0.91, which could be validated with any validating XML parser, and which incorporated much of userland's vocabulary, thus removing most (I think) of Dave's major objections. I felt slightly bad about this, but given actual usage at the time, I felt it better suited the needs of its users: simplicity, correctness, and a larger vocabulary, without RDF baggage.
5) We shipped the thing in a very short time, meeting the time constraints, then spent a month or two fixing it all. :-) It was apparently not deemed "strategic", and thus was never given more than maintenance attention.
6) People on the net began creating all sorts of tools on their own, and publishing how-to articles, and all sorts of things, and using it in ways not envisioned by, err, some. And now we are here, debating it all over again. Fortunately, this time it is in an open forum.
groups.yahoo.com/group/syndication/message/586 (2003)

Simplicity won, richness lost.

Dave Winer, who runs the Scripting News weblog and who is a central, vocal proponent of simple RSS, comments:

Weblogs are often-updated sites that point to articles elsewhere on the web, often with comments, and to on-site articles. A weblog is kind of a continual tour, with a human guide who you get to know. There are many guides to choose from, each develops an audience, and there's also camaraderie and politics between the people who run weblogs, they point to each other, in all kinds of structures, graphs, loops, etc. Today, there are hundreds of thousands of weblog sites, and the market for tools for managing such sites is growing quickly.

My company, UserLand, makes two products for weblogs, Manila, which is a centralised server-based content management system; and Radio UserLand which provides easy and powerful weblogging from the desktop.

The first weblog was the first website, http://info.cern.ch/, the site built by Tim Berners-Lee at CERN. From this page TBL pointed to all the new sites as they came online. Luckily, the content of this site has been archived at the World Wide Web Consortium. (Thanks to Karl Dubost for the link.) NCSA's What's New page took the cursor for a while, then Netscape's What's New page was the big blog in the sky in 1993-96.

Then all hell broke loose. The Web exploded, and the weblog idea grew along with it. I did my first weblog in February 1996, as part of the 24 Hours of Democracy website. It helped glue the community together, along with a mail list that was hosted by AOL.

In April 1996 I started a news page for Frontier users, which became Scripting News on 4/1/97. Other early weblogs include Robot Wisdom, Tomalak's Realm and CamWorld.
newhome.weblogs.com/historyOfWeblogs 2003

But isn't it ironic: all this work has gone on, and all I care about is the fact that the current RSS standards (all of them) are pretty simple. It's a joy to be able to work on creating good systems and good software for something which can be useful for many, without the huge technical overhead you would face before you could even begin to add value if you had to deal with picture files, video files, 3D or even Microsoft Word files. Even email is not simple and pure any more. POP and SMTP, the two main protocols for receiving and sending email, have been subverted by AOL & Microsoft (Hotmail), whose services are only accessible through their own, secret protocols.


