From timbl  Mon Oct 28 14:34:12 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06863; Mon, 28 Oct 91 14:34:12 GMT+0100
Date: Mon, 28 Oct 91 14:34:12 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9110281334.AA06863@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-talk
Subject: test again!

If you get this, delete it. - Sorry!

From timbl  Mon Oct 28 16:33:14 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06989; Mon, 28 Oct 91 16:33:14 GMT+0100
Date: Mon, 28 Oct 91 16:33:14 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9110281533.AA06989@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-interest@nxoc01.cern.ch
Subject: WorldWideWeb mailing list:  Introduction

We have (at last!) started the www-interest mailing list. Your name  
is, for one reason or another, on it. The list is a list for  
announcements about the World Wide Web (W3) distributed information  
system, mainly about

	   o	New online information available

	   o	New W3 software releases

If you do not want to be on this list, please accept our apologies  
and mail listserv@info.cern.ch with the message body 

	
	delete www-interest

If others wish to subscribe to this list, they should mail  
listserv@info.cern.ch with the message body

	add www-interest

There is a similar list, called www-talk, for developers of W3  
software. Members of www-talk get www-interest automatically.

If you have any queries for a human response, mail  
www-interest-request@info.cern.ch.

	Tim BL
__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155





From steved%bullwinkle@relay.eu.net  Mon Oct 28 18:51:28 1991
Return-Path: <steved%bullwinkle@relay.eu.net>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA07169; Mon, 28 Oct 91 18:51:28 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA02040; Mon, 28 Oct 91 18:46:02 +0100
Received: from wupost.wustl.edu by mcsun.EU.net with SMTP;
	id AA09434 (5.65a/CWI-2.120); Mon, 28 Oct 1991 18:48:29 +0100
Received: by wupost.wustl.edu (5.65b/WUSTL-0.3) with UUCP 
	id AA20884; Mon, 28 Oct 91 11:45:35 -0600
Received: by bullwinkle.UUCP (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02000; Mon, 28 Oct 91 11:09:48 CST
Date: Mon, 28 Oct 91 11:09:48 CST
From: steved%bullwinkle@relay.eu.net (Steve Dieringer)
Message-Id: <9110281709.AA02000@bullwinkle.UUCP>
To: www-talk@nxoc01.cern.ch
Subject: add www-talk

add www-talk

Please note that my return address should be:
steved%bullwinkle.uucp@wupost.wustl.edu


From timbl  Tue Oct 29 10:03:11 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA07413; Tue, 29 Oct 91 10:03:11 GMT+0100
Date: Tue, 29 Oct 91 10:03:11 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9110290903.AA07413@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: connolly@pixel.convex.com, www-talk
Subject: Re: status. Re: X11 BROWSER for WWW 

Dan,

> I've made some tangible progress on the X11 browser, so I thought
> I'd let you know.
> ...
> This code is not in any shape to distribute, or even show anybody.
> But it works, and it's pretty speedy. That's enough to encourage me  
> to polish it off.

Sounds like great progress! The TCL sounds interesting -- where did  
you get it? 


> [If you want my stuff, you'll have to be C++ capable. I can't
> think in C any more. :-]

Don't worry - we can handle C++, although for the line mode browser  
we wanted portability into places where C++ could not reach. That's  
why the common code (in WWW/Implementation) is all in C. Believe me,  
after writing the NeXT browser in Objective-C it was a wrench to  
conclude that it would have to be deobjectified.

> If you could round up some info on exactly what I can expect to see  
> in an HTML file, and some idea of how you want it formatted [I have  
> the HTML doc and the LineMode browser, but if you've got time to
> give me a little more info...] I'll be ready to tackle that pretty  
> soon.

You ask for info on exactly what you can expect to find in an HTML  
file, but you've read the two HTML files about HTML.  What is missing  
from there?

Here is some discussion about the tags -- where it's not in  
http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html I have updated  
that document now.

Most of the tags are just style tags: this goes for the headings H1  
to H6, the lists UL and OL with list elements LI, the glossary DL  
with elements DT and DD.

<TITLE>..</TITLE> is designed to be used for putting in the top
banner of a window, or using as the window name. It also is what you
would use in a history list. It shouldn't be displayed in the text
itself, as usually there is an <H1> heading at the top of the text
anyway. A difference is that the title is designed to make sense out
of context, whereas the heading is within context. For example,
a title might be "Formatting Characters for Printf -- C reference
manual" whereas the heading may just be "Formatting characters".

The base address tag is not used, nor is highlighting HP1 etc.

Anchors are used!  The REL attribute is NOT used.

<ISINDEX> is sent by servers to indicate that they will accept a  
search given this document name plus keywords. It turns on a search  
panel when the document is the main window.  An even better  
implementation would have a keyword field at the bottom of the text  
window if the document is a searchable index.  That would make the  
document more self-contained as an item in the user's eyes, and  
reduce screen clutter.

<NEXTID> can be ignored by browsers, only needed for editors.

<XMP> and <LISTING> are used to indicate inserted literal text.
To make life easier for those writing documents (and because we don't  
have entities in the code yet) they are special in that EVERYTHING is  
literal text until the closing tag - so one can use XMP for giving
examples of HTML for example.  (We really need an escaping method -
the next parser will have simple entities like "&lt." for "<".)
Within XMP or LISTING, newlines are significant (and mean "new  
line"!)

<PLAINTEXT> is used to indicate that the rest of the file is in fact
just ASCII. It turns off SGML parsing completely. It's a fudge for
the moment, until we have the document format negotiation.
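
The literal-text rules above for XMP, LISTING and PLAINTEXT can be
sketched as follows. This is only an illustration, not the parser in
WWW/Implementation: the function name split_literal and the
("markup", ...) / ("literal", ...) labels are invented for the example,
and tag matching is assumed to be case-insensitive.

```python
import re

# Opening tags that switch the parser into literal mode.
_OPEN = re.compile(r"<(XMP|LISTING|PLAINTEXT)>", re.IGNORECASE)


def split_literal(source):
    """Split source into ("markup", text) and ("literal", text) pieces."""
    pieces = []
    pos = 0
    while pos < len(source):
        match = _OPEN.search(source, pos)
        if match is None:
            # No more literal sections: the remainder is ordinary markup.
            pieces.append(("markup", source[pos:]))
            break
        if match.start() > pos:
            pieces.append(("markup", source[pos:match.start()]))
        tag = match.group(1).upper()
        if tag == "PLAINTEXT":
            # The rest of the file is plain text; parsing stops for good.
            pieces.append(("literal", source[match.end():]))
            break
        # XMP and LISTING: everything is literal until the closing tag,
        # newlines included.
        close = re.compile("</" + tag + ">", re.IGNORECASE).search(
            source, match.end())
        end = close.start() if close else len(source)
        pieces.append(("literal", source[match.end():end]))
        pos = close.end() if close else len(source)
    return pieces
```

A browser would hand the "markup" pieces to its tag parser and print
the "literal" pieces unchanged, line breaks and all.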
______________________________________

	Structure of documents:

In writing a new generic parser, I wondered whether your text object  
will store the nested structure of a document. At the moment, the  
document is a linear sequence of styles: you can't have lists within  
lists, etc. Ideally, it would be able to handle this - although it's
more difficult for a human writer to handle when formatting the  
document. I would in fact prefer, instead of <H1>, <H2> etc for  
headings [those come from the AAP DTD] to have a nestable  
<SECTION>..</SECTION> element, and a generic <H>..</H> which at any  
level within the sections would produce the required level of  
heading.
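
As a rough sketch of the nestable <SECTION> idea, the heading level of
each generic <H> can be derived from how many sections enclose it. The
SECTION and H elements are only a proposal in this message, so the
function below (heading_levels, a name made up for the example) is
purely illustrative; it also assumes that a heading outside any section
counts as level 1.

```python
import re

# Matches <SECTION>, </SECTION>, <H> and </H>, but not <H1>, <HP1>, etc.
_TOKEN = re.compile(r"<(/?)(SECTION|H)>", re.IGNORECASE)


def heading_levels(source):
    """Return (level, text) for each <H>..</H>; level is the SECTION depth."""
    depth = 0
    headings = []
    heading_start = None
    for match in _TOKEN.finditer(source):
        closing = match.group(1) == "/"
        name = match.group(2).upper()
        if name == "SECTION":
            depth += -1 if closing else 1
        elif not closing:
            # Opening <H>: remember where the heading text begins.
            heading_start = match.end()
        elif heading_start is not None:
            # Closing </H>: the level comes from the current nesting depth.
            text = source[heading_start:match.start()].strip()
            headings.append((max(depth, 1), text))
            heading_start = None
    return headings
```

A browser could then map level 1 to its H1 style, level 2 to H2, and
so on, which is the flattening back into a sequence of styles.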

For a browser, it is quite satisfactory to flatten the structure back  
into a sequence of styles, but for an editor it isn't. Are you going  
to go for editing capability?

Tim

PS: Shall I put you on the www-talk list?

From timbl  Wed Oct 30 15:33:16 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08339; Wed, 30 Oct 91 15:33:16 GMT+0100
Date: Wed, 30 Oct 91 15:33:16 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9110301433.AA08339@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-interest
Subject: Telnet access to W3 information server


			TELNET ACCESS to W3

You can now telnet to our information server.


		Telnet to:      info.cern.ch
		User name:      www
		(no password)

You will be presented with the home page which is used at CERN on the
central machines. From there, you can follow links to whatever documents
and indexes we know about at CERN or elsewhere in the world of online
information. You will be using the line mode browser, which assumes
nothing about your terminal capabilities.

This trial service is provided for those who want to try out the  
software, or who need information and are away from home. If you use  
this service frequently, it is much more efficient and faster for you  
to install the browser locally.

You can of course get help, including installation instructions, by  
following the "Help" link from the home page.

__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155





From timbl  Thu Oct 31 11:49:15 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09130; Thu, 31 Oct 91 11:49:15 GMT+0100
Date: Thu, 31 Oct 91 11:49:15 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9110311049.AA09130@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Edward Vielmetti <emv@ox.com>
Subject: Re: Home page design, WAIS gateway bug, MSEN
Cc: www-talk

Ed, you asked:

>> Any chance that you could put the "home page" from CERN and some   
other sample good pages up for anonymous FTP? <<

Done for the CERN home page:  
//info.cern.ch/pub/WWWLineModeDefaults.tar.Z

> I'd be interested to hear any thoughts you have on what it takes to
> make a good home page.  I suppose you want to be sure that a user
> doesn't get so completely lost that they can't find their way out,  
> enough local information that people feel more or less at home.
> hm hm hm.

Yes, good home page design is an art -- like the cover of a magazine,
or a quick-reference card. Of course it depends on the readership.  
The CERN home page has to start with the CERN things to minimise the  
number of keystrokes/clicks for the largest number of users. At the
same time, it needs pointers for someone with a broader interest to  
rapidly find a wider topic, and it has to suggest to people what is  
behind it so that later they will use it again on another topic.
The competition for the first 24 lines is hot! I have thought of  
having a "Latest additions" link, so that people who think they know
the web can check for new bits.

There is also the question of whether to make the layout really open  
(lots of white space), with 5 well-explained links on each page, or  
to cram in as much as possible. I feel one should start with  
something very open and obvious, but then get more compact once the  
reader is into something he is interested in and has got the hang of  
the program. Having a fast scrollbar makes it much easier to cope with
lots of open text. People must have done their PhDs on this sort of  
thing...

I suspect one should have, for each site/organisation, a public home  
page for those from outside, as well as a private one for those who  
will understand terms differently. For example, a link to the CERN
phone book from outside could mention that the numbers need to be  
prefixed with +41(22)767!

Also, both pages should be linked to some list of other sites.  
Perhaps a tree of pages which emulate the domain/x500 naming scheme a  
little would be useful because people are used to browsing that way,  
and will be able to once x500 is part of the web. This is only one  
structure which is useful, though. A tree by subject a la Dewey  
decimal system would be another - hypertext would get over the tree  
restriction which limits Dewey's usefulness. In fact, making  
hypertext overviews and making indexes of third party data should be  
"value added services" which anyone - library, or company like  
yourselves, should be able to do on top of existing data. Making  
sense of the morass of data (as you have been doing for years) is a  
very valuable contribution to the world of knowledge. Such ordered  
overview or review information is likely to be much more widely read  
than the underlying documents.  The best reviews will be most quoted,  
and hence most read, so survival of the fittest will ensure that most  
people don't spend their time reading junk.

> By the way, it's possible to build a Sun 3 client with no problem
> at all - just make a "sun3" directory, copy in the Sun 4 makefile,
> and make.

Thanks - I don't have a sun3 to test on, but I'll make the directory
and copy the makefile: thanks!


> "Document address invalid or access not authorised" on
> 'http://info.cern.ch./hypertext/Products/WAIS/NewsGroupRelated.html'
> could you check on it?)

Oops .. [Long story: The default home page in the last release has a  
pointer to a file ...Products/WAIS/Sources.html which had just been  
renamed  ...Products/WAIS/Sources/Overview.html. When you read it,  
there was a soft link from the old to the new so you read the new  
file but with the client thinking it was at the old address. This  
worked until I put in the new relative link to your list. Then, the  
relative link was parsed relative to the old address, generating the  
bad address above]. It should be ok now.
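
The effect described above can be reproduced with any relative-address
resolver. The sketch below uses Python's standard urljoin as a
stand-in for the client's own parsing. The relative link is assumed to
have been written simply as "NewsGroupRelated.html", which the message
does not state, and the trailing dot on the host name is left out.

```python
from urllib.parse import urljoin

# Address the client believed it was reading (reached via the soft link).
OLD_ADDRESS = "http://info.cern.ch/hypertext/Products/WAIS/Sources.html"
# Address of the file that was actually served.
NEW_ADDRESS = "http://info.cern.ch/hypertext/Products/WAIS/Sources/Overview.html"
# Assumed form of the relative link inside the document.
RELATIVE_LINK = "NewsGroupRelated.html"


def resolve(base, link):
    """Resolve a relative link against the address of its document."""
    return urljoin(base, link)


if __name__ == "__main__":
    # Resolved against the old address, the link leaves the Sources/ directory.
    print(resolve(OLD_ADDRESS, RELATIVE_LINK))
    # Resolved against the new address, it stays next to Overview.html.
    print(resolve(NEW_ADDRESS, RELATIVE_LINK))
```

The same document therefore yields two different absolute addresses,
depending only on which name the client thinks it was fetched under.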


___________________________________________________________________

>> I figured out how to store a WAIS query.  For your "What is MSEN"
pointer,  something like
<A HREF=http://info.cern.ch:8001/quake.think.com:210/wais-discussion-archives?msen>
will work just fine. <<

Well done! .. I've linked "MSEN" in your list to your main document.

By the way, I want to make that address

	wais:/quake.think.com:210/wais-discussion-archives?msen

but first I have to put into the client a table of gateway addresses  
for protocols the client doesn't know himself.
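
A minimal sketch of such a gateway table follows. The table layout, the
function name route and the set of natively handled schemes are
assumptions made for the example; only the two address forms and the
gateway at info.cern.ch:8001 come from this message.

```python
# Protocols the client is assumed to speak itself (illustrative only).
KNOWN_SCHEMES = {"http", "file"}

# Gateway prefix for each protocol the client cannot handle directly.
GATEWAYS = {
    "wais": "http://info.cern.ch:8001/",
}


def route(address):
    """Return the address the client should actually fetch."""
    scheme, colon, rest = address.partition(":")
    if not colon or scheme in KNOWN_SCHEMES:
        # No scheme, or one the client handles natively: fetch as-is.
        return address
    gateway = GATEWAYS.get(scheme)
    if gateway is None:
        raise ValueError("no gateway known for protocol: " + scheme)
    # Hand the rest of the address to the gateway as its path.
    return gateway + rest.lstrip("/")
```

With this table, wais:/quake.think.com:210/wais-discussion-archives?msen
is rewritten to the http: form quoted above, while http: and file:
addresses pass through unchanged.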

>> I hacked the line mode client so that "RECALL" and "LIST" spit  
things out in a format that's ready to cut and paste into a source  
document; that was the easiest way to get documents of my own going  
quickly. <<

Ok...I wondered whether a command "Append a reference to this node in  
HTML to file xxx" would be useful. It would allow people to keep  
lists of interesting nodes in their own space. It's in the Line Mode  
bug list now.

________________________________

>> WAIS database names can include / in them, which
gums up your heuristics for figuring out how to parse them. <<


Yes -- that's true.  I should escape them or something...
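
One possible way to escape them, shown only as a sketch: percent-encode
the database name so that a "/" inside it can no longer be confused
with a path separator. The %-encoding scheme and the helper names
below are assumptions; the message does not say which escaping method
would be used.

```python
from urllib.parse import quote, unquote


def wais_path(host, port, database):
    """Build a gateway path, escaping every '/' in the database name."""
    return "/{}:{}/{}".format(host, port, quote(database, safe=""))


def split_wais_path(path):
    """Recover (host, port, database) from a path built by wais_path."""
    host_port, _, escaped = path.lstrip("/").partition("/")
    host, _, port = host_port.partition(":")
    return host, int(port), unquote(escaped)
```

Because the escaped name contains no "/" of its own, the first "/"
after host:port marks the start of the database name without any
guessing.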

Thanks for all your feedback, Ed.  MSEN sounds like something heading
in the right direction.  By the way, do your [prospective] clients
have workstations in general, or is it all MS-DOS? Do they dial in, or
have leased lines?

I wish I could have gone to the IETF to meet a few people in person,  
yourself included, but Robert Cailliau and I are going to HyperText
91 (Dec 15-18 in San Antonio TX), and that blows my US Travel quota.  
Will you be at HT91 by any chance?

Tim BL



From jkp@sauna.cs.hut.fi  Sun Nov  3 08:19:02 1991
Return-Path: <jkp@sauna.cs.hut.fi>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00811; Sun, 3 Nov 91 08:19:02 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA26085; Sun, 3 Nov 91 08:15:29 +0100
Received: by sauna.cs.hut.fi id AA25310
  (5.65c8/HUTCS-C-91-07 for www-talk@info.cern.ch); Sun, 3 Nov 1991 09:15:13 +0200
Date: Sun, 3 Nov 1991 09:15:13 +0200
Message-Id: <199111030715.AA25310@sauna.cs.hut.fi>
From: Jyrki Kuoppala <jkp@cs.hut.fi>
Sender: jkp@sauna.cs.hut.fi
To: www-talk@nxoc01.cern.ch
Subject: www server at tky.hut.fi
Return-Receipt-To: jkp@cs.hut.fi
Organization: Helsinki University of Technology, Finland.

OK, now there is quite a lot of stuff added into the www server at
otax.tky.hut.fi - though it's not a web, but a tree.  The html files
for directories are automatically created by a simple shell script
from the normal Unix-style directory tree, and the text files themselves
are normal ascii to which links are created in the html files.
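
The kind of index generation described above might look like the
following sketch. It is written in Python rather than shell, and it is
not the script used at otax.tky.hut.fi: the function name
directory_index and the "index.html" file name for subdirectories are
assumptions.

```python
import os


def directory_index(path, title):
    """Return an HTML page linking to every entry in one directory."""
    lines = [
        "<TITLE>{}</TITLE>".format(title),
        "<H1>{}</H1>".format(title),
        "<UL>",
    ]
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        # Subdirectories link to their own generated index page;
        # plain text files are linked to directly.
        target = name + "/index.html" if os.path.isdir(full) else name
        lines.append('<LI><A HREF="{0}">{1}</A>'.format(target, name))
    lines.append("</UL>")
    return "\n".join(lines) + "\n"
```

Running this once per directory and saving the result next to the
files gives a tree of pages that mirrors the directory tree.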

There's one problem: the otax files have tab characters in them and
the server or the line mode client seems to mostly convert them to
line breaks.

//Jyrki

From timbl  Fri Nov  8 10:00:33 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04556; Fri, 8 Nov 91 10:00:33 GMT+0100
Date: Fri, 8 Nov 91 10:00:33 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111080900.AA04556@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-talk
Subject: WWW and prospero



Begin forwarded message:

To: rusty@mail.cornell.edu
Cc: tbl@cernvax.cern.ch, bcn@june.cs.washington.edu
Subject: WWW and prospero
Date: Thu, 07 Nov 91 19:36:05 -0500
From: Edward Vielmetti <emv@ox.com>
X-Mts: smtp


   Does anyone know about WWW (World Wide Web) and Prospero, and how I can find
   out more information about them? Thanks!

Prospero is a remote file system.  If you have an archie client (like
the archie clients at ftp.cs.widener.edu:/pub/archie/) you'll be
using the Prospero protocol to send queries to the servers.  You can
pick up the server at june.cs.washington.edu:/pub/prospero.tar.Z, and
look at the documentation in june.cs.washington.edu:/pub/pfs/doc/.

WWW is an interesting hypertext system from CERN.  You can try it out
by telnetting to info.cern.ch (login: www) or by ftp'ing the clients
or servers from that site.  What's particularly neat is that you can
embed references in a WWW document which point to WAIS servers (as
well as to other WWW documents or files on anonymous FTP) - that makes
it quite straightforward to build on-line systems with a mix of
structured menus and searching stuff.

I'd compare WWW with "gopher" from U of Minnesota (see
boombox.micro.umn.edu:/pub/gopher/); both of them would be suitable
for building a campus-wide information system with.  WWW is much more
of a web with links sending you off hither and yon; selections on
gopher menus can set you talking to servers a long distance away, but
it seems from what I've looked at to be much more of a hierarchical
approach.  WWW also lets users design their own menus.  You can test
out gopher by telnetting to consultant.micro.umn.edu, login gopher.

Both WWW and gopher offer snappy user interfaces for NeXT machines.

   Aaron (Rusty) Lloyd			Cornell Information Technologies
   rusty@mail.cornell.edu			SCAdian, and proud of it!

As an aside, it would be really nice if WWW could be taught the
Prospero protocol like it knows WAIS....

-- 

Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
       MSEN, Inc. 628 Brooks Ann Arbor MI 48103 +1 313 741 1120



From timbl  Fri Nov  8 11:17:05 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04649; Fri, 8 Nov 91 11:17:05 GMT+0100
Date: Fri, 8 Nov 91 11:17:05 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111081017.AA04649@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-interest@nxoc01.cern.ch
Subject: WWW-WAIS Gateway

(Now it's been running for some time, I guess I should announce it!)


		World-Wide Web <-> WAIS Gateway Running

A gateway running on info.cern.ch provides access by any WWW browser  
to the world of information provided by "WAIS" servers. WAIS servers  
are full-text search servers using software from Thinking Machines  
Corporation. There's more information about WAIS and the gateway in
the web.

[By the way, if you have an old WWW default page which may not have  
links to everything of interest, you can pick up by ftp (or link to)  
a new one from file://info.cern.ch/pub/default.html]

HYPERTEXT GUIDE

You can find WAIS indexes by browsing a hypertext guide to WAIS  
(linked from our default page), and/or doing an index search on the  
WAIS index of indexes. 

The guide starts at  
http://info.cern.ch/hypertext/Products/WAIS/Sources/Overview.html
Here is a sample of what there is:


Biochemistry	 The EC enzyme database of Amos Bairoch, REBASE
		restriction enzymes, the annotation of the
		GenBank(R) DNA sequence database (Bacterial
		Division), the Peter Karps CompoundKB database of 981
		metabolic intermediate compounds, periodical
		references to journals in the area of molecular
		biology, BIOSCI mailing lists and newsgroup archives

Geography	 Asia Pacific region: Curriculum Resources & Course
		outlines;  India: Miscellaneous information

Humanities	 Discussion, Poetry

Meteorology	 The weather (around MIT)

Music		 MIDI interfacing, Song lyrics

Religion	 The Bible (King James version), The Holy Qur'an




Computing & Networking:


	AARNet	 Australian Academic and Research Network Resources
		 Guide

	Fidonet	 List of nodes

	Usenet	 FAQ, cookbook, science

	Internet RFCs, resource guide, etc etc

	(etc etc)


			By Organisation

 E.F.F.	 	Electronic Frontier Foundation: Documents,
		discussion

 N.S.F.	 	National Science Foundation: bulletins

 M.I.T.	 	Algorithms book: Bugs, exercises, suggestions for
		the book, 'Introduction to Algorithms' by Tom Cormen,
		Charles Leiserson, and Ron Rivest, all members of
		Theory of Computation Group, Laboratory for Computer
		Science.  Weather.

 University of North Carolina	 Phone book

 University of North Texas	 Documents

 Univ. Oslo	 		Publications bibliography



Mail me with any problems/questions/suggestions.
__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155





From jfg@bernd.cern.ch  Fri Nov  8 12:20:51 1991
Return-Path: <jfg@bernd.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04713; Fri, 8 Nov 91 12:20:51 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA18954; Fri, 8 Nov 91 12:13:07 +0100
Received: by bernd.cern.ch (AIX 3.1/UCB 5.61/4.03)
          id AA11283; Sat, 9 Nov 91 12:13:19 +0100
Date: Sat, 9 Nov 91 12:13:19 +0100
From: jfg@bernd.cern.ch (Jean-Francois Groff)
Message-Id: <9111091113.AA11283@bernd.cern.ch>
To: Edward Vielmetti <emv@ox.com>
Cc: bcn@june.cs.washington.edu, www-talk@nxoc01.cern.ch,
        rusty@mail.cornell.edu
Subject: Re: WWW and prospero
References: <9111080900.AA04556@ nxoc01.cern.ch >

> As an aside, it would be really nice if WWW could be taught the
> Prospero protocol like it knows WAIS....

This has a high priority in our wish-list, but we are very busy
preparing a new kernel for WWW (you can call it WOW if you can't
pronounce that !), which features multiple TYPED links from/to each
anchor, so you can write knowbots for instance, and fast (we hope)
format negotiation between clients and servers.

The availability of this will be announced on the www-interest mailing
list. To register on that, send a mail to listserv@info.cern.ch with
body text "add www-interest". Your mail address will be extracted from
the From: field.
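
For illustration, a list server could pick the command out of the body
and the subscriber's address out of the From: field roughly as
follows. This is a sketch using Python's standard email module, not the
listserv running at info.cern.ch; the function name parse_request is
made up, and only the "add" and "delete" commands mentioned on these
lists are recognised.

```python
from email import message_from_string
from email.utils import parseaddr


def parse_request(raw_message):
    """Return (command, list_name, sender_address), or None if no command."""
    message = message_from_string(raw_message)
    # parseaddr copes with both "addr (Name)" and "Name <addr>" forms.
    _, address = parseaddr(message.get("From", ""))
    for line in message.get_payload().splitlines():
        words = line.split()
        if len(words) == 2 and words[0].lower() in ("add", "delete"):
            return words[0].lower(), words[1], address
    return None
```

Lines in the body that are not commands, such as a note about a
preferred return address, are simply skipped.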

Sorry, no firm dates yet, but expect it by the end of the year.

----
	Jean-Francois Groff (jfg@cernvax.cern.ch)
	World-Wide Web project
	CERN, ECP division
	CH-1211 Geneva 23, Switzerland
	Phone: +41 22 767 3755
	Fax:   +41 22 767 7155
----

From timbl  Fri Nov  8 13:35:26 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04757; Fri, 8 Nov 91 13:35:26 GMT+0100
Date: Fri, 8 Nov 91 13:35:26 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111081235.AA04757@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: connolly@pixel.convex.com
Subject: Re: Motif browser status
Cc: kharris@pixel.convex.com, www-talk

Dan,

Thanks for your message.  Obviously you know what you are doing with  
X11 browsers - we are impressed by what you have done to date. I was  
interested to hear that you are working on AVS - I have had some  
contact with AVS people at UNC.

You make a good point that the world has been waiting for a good  
formatted text widget under Motif. One exists under NeXTStep, Robert  
Cailliau is just adapting one for the Mac for hypertext, but under  
Motif it has been lacking.  Of course, hundreds of people have  
written them: all the word processors have them in, and products like  
dynaText, etc. However, there is none in the public domain.

CERN, like Convex, has a copyright on all code, but we are doing our
best to release W3 code as widely as possible, and possibly overcome  
this limitation. Why?

The concept of the web is of universal readership. If you publish a  
document on the web, it is important that anyone who has access to it  
can read it and link to it. In order to make this possible, we don't  
need very new technology -- what we do need is

	1.	A common open naming/addressing format
	2.	Sufficiently powerful underlying protocols
	3.	Sufficiently powerful data formats
	4.	Some free implementations

Now we have defined the (1), which did not exist before. We have  
supplemented the (2), where some protocols do exist. We have added a  
little to (3) though we will use all existing and new formats. We  
have written some code.

You say your work would be of considerable value to Convex. Yes,
that is true. You must ask yourself whether it would be of more value
to Convex if kept private or released for general consumption. If you
release it,

  -	Convex gets the credit and a higher profile,
	  (as Thinking Machines has with WAIS indexers for example).

  -	Anyone in the world can read the information you supply
	  with the same tool as they use for other information.

  -	You get a lot of useful feedback from users on the network

  -	A lot of people would be able to profit from what you have
	 done

You have to compare this scenario with that if you keep the code  
private. You will be able to use it internally. Would Convex be able
to profit by selling it? If so, how many people would actually
buy it? Will the AVS project benefit from a closed private  
documentation scheme?

On these grounds alone, you may conclude that it is in Convex's  
interest to release the code. Still, you ask what we can "put on the  
table".  If it would make it easier to justify the release of code,  
we would be happy to make all CERN-developed W3 code officially  
available to Convex under a more or less formal joint project  
agreement. Note that we are producing a parallel set of parsers and  
access mechanisms for HTML, newsgroups, WAIS, prospero, etc. We have
gateways, and other browsers. The line-mode browser you know, the Mac  
one is coming along, we may have a full-screen character grid browser  
too. We are currently unifying the browser architecture so that all  
access mechanisms can be used by all browsers. I'm not sure that  
either of our sides would want to be contractually bound to produce  
or maintain anything - the agreement would be just as-is code sharing  
of what exists when it exists, no strings.

You ask about graphics. That cannot be our next priority, as we need  
to get the new architecture and general format negotiation worked out.
In many cases, we find that there are GIF/TIFF viewers on various  
platforms, and one can link in to them. We don't want to make a new  
graphics file format a la Mac/PICT, but we are interested in
conversion code. Have you heard of editable Postscript? That might be  
what you are looking for. (See  
http://info.cern.ch/hypertext/Standards/PostScript/IPF.html)

I don't know whether your company has a mechanism for allowing code  
to be released into the public domain (or General Public License). If  
it is politically impossible, then that's a pity.  (We do have a  
group of students in Finland working on an X implementation, and if  
that doesn't work out we could write it ourselves. It may also be  
that more than one implementation with a different style will be
interesting. Obviously it would be rather a duplication of effort,  
though we are under a lot of pressure from our management and users  
to put this at the top of the agenda.)

I hope I have clarified the W3 team's philosophy, and perhaps
convinced you to contribute, to our mutual (and the world's) benefit.

	Tim

PS: Yes, I think you ought to be on www-talk, Dan. I'll put you on.  
The traffic is not too high.
__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155








From emv@cato.aa.ox.com  Mon Nov 11 11:52:03 1991
Return-Path: <emv@cato.aa.ox.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00223; Mon, 11 Nov 91 11:52:03 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA04235; Mon, 11 Nov 91 11:20:43 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA07388; Mon, 11 Nov 91 08:18:22 +0100
Received: from cato.aa.ox.com by mcsun.EU.net with SMTP;
	id AA19550 (5.65a/CWI-2.123); Mon, 11 Nov 1991 08:22:42 +0100
Received: by cato.aa.ox.com (/\==/\ Smail3.1.22.1 #22.9)
	id <m0kgVyd-000Bt4C@cato.aa.ox.com>; Mon, 11 Nov 91 02:22 EST
Message-Id: <m0kgVyd-000Bt4C@cato.aa.ox.com>
To: www-interest@nxoc01.cern.ch
Cc: archive-index@cs.toronto.edu, emv@msen.com
Subject: some of the stuff on ftp.cs.toronto.edu:/pub/emv/ is in WWW format
Date: Mon, 11 Nov 91 02:22:35 -0500
From: Edward Vielmetti <emv@ox.com>
X-Mts: smtp

I'm slowly but surely converting the files on ftp.cs.toronto.edu:/pub/emv
to be in the WWW format.  Right now the stuff in news-archives.README is
referred to that way, and some of the rest of the things in news-archives too.

I'm going out on a limb a tiny bit and writing references to things that
none of the clients know how to deal with yet, in the expectation that
useful data will inspire code.  In particular, I have some references
that look like

<a name=1 href=aftp://anonymous@ftp.cs.toronto.edu:/pub/emv/news-archives.README>
</a>

This aftp: tag is new.  I'm not completely happy with the use of the
file: tag to refer to remote files, since it can lead to situations
where references are ambiguous depending on whether you're dealing with
a file on the local system or that same file accessed via anonymous FTP
on the local system.  Adding an aftp: tag should help that.  The format
//user@host:/filename/ is quite similar to that used by ange-ftp, so
these references are immediately quite usable by existing code.
There's also the hope that if the aftp: thing gets to be popular it'll
be easier to pick out references to files from usenet postings, distinguishing
them from references to ftp (the protocol or the program).

It's useful (even necessary) to include the anonymous@ bit; there are some
sites (lib.stat.cmu.edu and research.att.com) with two parallel
"anonymous FTP" trees that have different user names to get to them;
a reference to
	<a href=aftp://netlib@research.att.com:/> </a>
is quite different than
	<a href=aftp://anonymous@research.att.com:/> </a>
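
To make the proposed format concrete, here is a sketch of how a client
might take such a reference apart. The aftp: form is only a proposal in
this message and no client supports it yet, so parse_aftp is a
hypothetical helper; falling back to "anonymous" when no user is given
is an extra assumption of the example.

```python
def parse_aftp(reference):
    """Split aftp://user@host:/path into (user, host, path)."""
    scheme, separator, rest = reference.partition("://")
    if scheme != "aftp" or not separator:
        raise ValueError("not an aftp reference: " + reference)
    # The first ':' after the host separates it from the file name.
    location, colon, path = rest.partition(":")
    if not colon:
        raise ValueError("missing ':' before the file name: " + reference)
    user, at, host = location.rpartition("@")
    if not at:
        # Assumption: no user given means the usual anonymous login.
        user = "anonymous"
    return user, host, path
```

The two research.att.com references above come apart with the same
host and path but different user names, which is exactly the
distinction the user@ part is there to carry.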

comments etc welcomed.

at some point this archive is going to migrate back to ftp.msen.com,
but I'm waiting there on getting equipment a little more suitable to
the task.  

Tim, feel free to glue these into the web as best you see fit; I have
to go back and stick in all of the WAIS newsgroup mappings that
I collected before.
I'm also using
	<a href=wais://wais.domain.org:210/database?>
in anticipation of that tag being supported, it should be a matter of
a simple sed or perl script to convert those tags to their current
preferred format.
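
Such a conversion could be as small as the sketch below, given here in
Python rather than sed or perl. It assumes the currently preferred
format is the gateway form through info.cern.ch:8001 mentioned
elsewhere on this list, and that hrefs are written unquoted as in the
example above; the function name to_gateway_form is invented.

```python
import re

# Gateway prefix assumed to be the currently supported way to reach WAIS.
GATEWAY = "http://info.cern.ch:8001/"

# Matches the start of an unquoted wais:// href, in either letter case.
_WAIS_HREF = re.compile(r"(href=)wais://", re.IGNORECASE)


def to_gateway_form(html):
    """Rewrite every href=wais://... so that it goes through the gateway."""
    return _WAIS_HREF.sub(lambda match: match.group(1) + GATEWAY, html)
```

References that use other schemes, such as aftp: or http:, are left
untouched.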

--Ed

From jfg@bernd.cern.ch  Tue Nov 12 16:44:43 1991
Return-Path: <jfg@bernd.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA01728; Tue, 12 Nov 91 16:44:43 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA14518; Tue, 12 Nov 91 16:36:14 +0100
Received: by bernd.cern.ch (AIX 3.1/UCB 5.61/4.03)
          id AA05532; Tue, 12 Nov 91 16:36:46 -2300
Date: Tue, 12 Nov 91 16:36:46 -2300
From: jfg@bernd.cern.ch (Jean-Francois Groff)
Message-Id: <9111131536.AA05532@bernd.cern.ch>
To: Edward Vielmetti <emv@ox.com>
Cc: www-interest@nxoc01.cern.ch
Subject: Re: some of the stuff on ftp.cs.toronto.edu:/pub/emv/ is in WWW format
References: <m0kgVyd-000Bt4C@cato.aa.ox.com>

>>>>> On Mon, 11 Nov 91 02:22:35 -0500, Edward Vielmetti <emv@ox.com> said:

Ed> I'm slowly but surely converting the files on ftp.cs.toronto.edu:/pub/emv
Ed> to be in the WWW format.  Right now the stuff in news-archives.README is
Ed> referred to that way, and some of the rest of the things in news-archives too.

I just tried to read your news-archives.README with the line-mode
browser through the traditional file: access. First-minute comments :

- Currently, any file retrieved through the file: access, local or
remote, is considered a plain text file unless its name ends with
`.html'. As a consequence, the anchors that you have inserted in
news-archives.README are not interpreted by the browser, so they
cannot be jumped to, except by cutting the reference and pasting it to
another www command line. Moreover, the text is just echoed in its
original format, which sadly happens to be double-spaced (CR-LF ?).
The easy fix is to append `.html' to the name of any file that
contains HTML tags, but I understand that it will bother people who
look at your files without www. The upcoming format negotiation could
help with this, especially in the case of a dedicated www server that
could pass and possibly negotiate the document type. For anonymous
ftp, the browser should run simple heuristics to try and guess the
type of the file from its name.extension. We'll think about it.
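
A name-based heuristic of the kind described above might be sketched as
follows. The function name `guess_format` and the lookup table are
illustrative assumptions, not the line-mode browser's code; only the
`.html` rule comes from the message.

```python
import posixpath

# Hypothetical extension table. Only the `.html` rule is stated in the
# message; further entries would be added as heuristics are agreed.
KNOWN_FORMATS = {".html": "html"}


def guess_format(address: str) -> str:
    """Guess a document format from the file name at the end of an address."""
    _, extension = posixpath.splitext(address)
    # Anything unrecognised is treated as plain text, as the browser does today.
    return KNOWN_FORMATS.get(extension, "plaintext")
```

With this rule news-archives.README stays plain text, while
news-archives.README.html would be parsed as HTML.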

Ed> This aftp: tag is new.  I'm not completely happy with the use of
Ed> the file: tag to refer to remote files, since it can lead to
Ed> situations where references are ambiguous depending on whether
Ed> you're dealing with a file on the local system or that same file
Ed> accessed via anonymous FTP on the local system.  Adding an aftp:
Ed> tag should help that.

- We agree that the current syntax can be ambiguous, but we want to
keep references to local and remote files in the same format, because
the very notion of a `remote' file should disappear with wide-area
hypertext (remember the new WAN cliche: the network IS the computer).
A less philosophical reason for that is to avoid referring to a
particular retrieval protocol : the reference to the file should be
the same regardless of whether it is retrieved through anonymous ftp
or through the Andrew file system, for instance. Of course, we would
like to introduce X.500 naming in the (more or less) long term.

Ed> It's useful (even necessary) to include the anonymous@ bit; there are some
Ed> sites (lib.stat.cmu.edu and research.att.com) with two parallel
Ed> "anonymous FTP" trees that have different user names to get to them;
Ed> a reference to
Ed> 	<a href=aftp://netlib@research.att.com:/> </a>
Ed> is quite different than
Ed> 	<a href=aftp://anonymous@research.att.com:/> </a>

So we want to keep `file:' for both local and remote file, but we must
take into account your other suggestion : allowing for a different
user name. I suggest the following :

	* allow an optional `user@' part before a host name.
	* if the user is not specified, make it the current user name
	  if the host is the local machine, and `anonymous' otherwise.
	  (this avoids the ambiguity that you mentioned)

Examples :
	file://ftp.cs.toronto.edu/pub/emv/news-archives.README.html
	file://netlib@research.att.com/
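
The defaulting rule for the user name could be sketched roughly as
below. The function names and the way the local host and current user
are passed in are assumptions made for illustration; the rule itself
is the one proposed above.

```python
from typing import Optional, Tuple


def split_user_host(authority: str) -> Tuple[Optional[str], str]:
    """Split an optional `user@` prefix off the host part of an address."""
    user, sep, host = authority.rpartition("@")
    return (user if sep else None), host


def effective_user(authority: str, local_host: str, current_user: str) -> str:
    """Apply the proposed default: the current user locally, `anonymous` elsewhere."""
    user, host = split_user_host(authority)
    if user:
        return user                 # an explicit user@ always wins
    if host == local_host:
        return current_user         # local file: act as the person browsing
    return "anonymous"              # remote file: anonymous FTP
```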

Ed> The format //user@host:/filename/ is quite similar to that used by
Ed> ange-ftp, so these references are immediately quite usable by
Ed> existing code.

- Currently, a colon after the host name is used to specify an alternate
TCP port number, but a good browser should ignore it if no number is
present. In this way, www can be compatible with ange-ftp syntax.
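
A tolerant host parser along these lines might look like the sketch
below. `split_host_port` is a hypothetical helper, not part of any
existing browser, and treating a non-numeric port as absent is an
assumption that goes slightly beyond what is said above.

```python
from typing import Optional, Tuple


def split_host_port(hostport: str,
                    default_port: Optional[int] = None) -> Tuple[str, Optional[int]]:
    """Split `host[:port]`, ignoring a colon that is not followed by a number."""
    host, _, port = hostport.partition(":")
    if port and all(c in "0123456789" for c in port):
        return host, int(port)
    # Bare colon (ange-ftp style) or no colon at all: keep the default port.
    return host, default_port
```

So `ftp.cs.toronto.edu:` and `ftp.cs.toronto.edu` name the same server,
while `wais.domain.org:210` selects port 210.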

- Your examples make me think of another feature we should add for the
browsers to support them : the ability to display a directory as a
list of references, with maybe the README file (if any) prepended as
introductory text. Currently, on your reference to
	file://pit-manager.mit.edu/pub/usenet/
the browser would try to `get' the directory through ftp and fail. So
I'll add this to the wish-list for the `file:' access method :

	* if the address ends with a `/', try `ls' instead of `get'.
	* try to get an appropriate README file. Try those in order :
	  README.html, *README*.html, README, *README*, *readme*
	* Display that file if found, then build a list of references
	  for all the files contained in the directory.
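
The wish-list above can be sketched as follows. The function names are
hypothetical, the directory listing is assumed to arrive as a plain
list of file names, and picking the alphabetically first name when
several match a pattern is an assumption the wish-list does not settle.

```python
from fnmatch import fnmatchcase

# Patterns in the order given in the wish-list.
README_PATTERNS = ["README.html", "*README*.html", "README", "*README*", "*readme*"]


def wants_listing(address: str) -> bool:
    """An address ending in `/` is fetched with `ls` rather than `get`."""
    return address.endswith("/")


def pick_readme(names):
    """Return the first file name matching the patterns in priority order."""
    for pattern in README_PATTERNS:
        matches = sorted(n for n in names if fnmatchcase(n, pattern))
        if matches:
            return matches[0]
    return None


def directory_references(address: str, names):
    """Build one anchor per file in the directory."""
    return [f"<a href={address}{name}> {name} </a>" for name in sorted(names)]
```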

Note that if you supply both a README.html and a traditional README,
you won't have to apologize about `all those funky angle brackets' !

- From your news-archives.README :

  blah blah blah. Check out
  <a href=aftp://anonymous@pit-manager.mit.edu:/pub/usenet/> </a>
  for lots more information.

With the line-mode browser, this will look fine :

  blah blah blah. Check out [1] for lots more information.

But with any mouse-driven browser (NeXT, X-Windows, emacs, Mac), the
anchor should sit on a piece of text that will serve as a button. With
your current example, your reader would only see :

  blah blah blah. Check out for lots more information.

with possibly a tiny highlighted space between `out' and `for'. Some
human-readable description of what the anchor points to will do fine.
For instance :

  blah blah blah. Check out the
  <a href=file://pit-manager.mit.edu/pub/usenet/> MIT usenet archives
  </a> for lots more information.

would yield

  Check out the MIT usenet archives[1] for lots more information.

or a highlighted `MIT usenet archives' on a mouse-driven browser.
Before that in your README, it would be nice to have an anchor
associated with the `List of periodic informational postings' and to
the archive that you mention. Same for the `news.answers' group (the
`news:' access is implemented in the new architecture. Use this simple
syntax : `<a href=news:news.answers> news.answers </a>'.)
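
The line-mode rendering of anchors described above can be approximated
with the sketch below. It is not the browser's code: the regular
expression only handles simple unquoted `href=` values like those in
these examples, and collapsing all whitespace is a simplification.

```python
import re

# Matches <a ... href=ADDRESS ...>label</a> with an unquoted address.
ANCHOR = re.compile(r"<a\b[^>]*?\bhref=([^\s>]+)[^>]*>(.*?)</a>",
                    re.IGNORECASE | re.DOTALL)


def render_line_mode(html: str):
    """Replace each anchor by its label followed by a numbered reference."""
    references = []

    def numbered(match):
        references.append(match.group(1))
        label = " ".join(match.group(2).split())
        return f"{label}[{len(references)}]"

    text = ANCHOR.sub(numbered, html)
    return " ".join(text.split()), references
```

An anchor with no label renders as a bare `[1]`, which reads fine in
line mode but leaves nothing to click on in a mouse-driven browser.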

- As an aside, the `name=' part of the anchor tag is not necessary in
your context : it is needed if someone wants to make a link TO that
particular anchor, not to the whole document.

Ed> I'm also using
Ed> 	<a href=wais://wais.domain.org:210/database?>
Ed> in anticipation of that tag being supported, it should be a matter of
Ed> a simple sed or perl script to convert those tags to their current
Ed> preferred format.

- Agreed. OK for the `wais:' access.

Thank you for all your suggestions. Please continue to provide
feedback as you write more html. We're looking forward to reading your
data seamlessly and pave the way for other ftp site managers.

--- Jean-Francois

From emv@shelley.aa.ox.com  Wed Nov 13 01:11:01 1991
Return-Path: <emv@shelley.aa.ox.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02121; Wed, 13 Nov 91 01:11:01 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA03747; Wed, 13 Nov 91 01:02:22 +0100
Received: by shelley.aa.ox.com (/\==/\ Smail3.1.22.1 #22.9)
	id <m0kh855-000Ds7C@shelley.aa.ox.com>; Tue, 12 Nov 91 19:03 EST
Message-Id: <m0kh855-000Ds7C@shelley.aa.ox.com>
To: jfg@bernd.cern.ch (Jean-Francois Groff)
Cc: www-interest@nxoc01.cern.ch
Subject: Re: some of the stuff on ftp.cs.toronto.edu:/pub/emv/ is in WWW format 
In-Reply-To: Your message of Tue, 12 Nov 91 16:36:46.
             <9111131536.AA05532@bernd.cern.ch> 
Date: Tue, 12 Nov 91 19:03:47 -0500
From: Edward Vielmetti <emv@ox.com>
X-Mts: smtp

>> I just tried to read your news-archives.README with the line-mode
>> browser through the traditional file: access. First-minute comments :

Thanks for the comments.  I realize this is just a first pass for some
of this -- I'm hand editing the files for now, but before too long I
really want to start generating stuff more automatically, best to get
the formats down pat before writing code.

>> The easy fix is to append `.html' to the name of any file that
>> contains HTML tags, but I understand that it will bother people who
>> look at your files without www. 

I'm expecting to generate two (or three, or n) different files
eventually from an SGML source; one will be relatively flat ASCII that
people can read real easily, another will be nice pretty postscript
suitable for paper, and the third the HTML for the browser.  I'm
pretty sure that the available SGML tools (either now or within the
year) will make this reasonable to do, one way or the other.

>> - We agree that the current syntax can be ambiguous, but we want to
>> keep references to local and remote files in the same format, because
>> the very notion of a `remote' file should disappear with wide-area
>> hypertext (remember the new WAN cliche: the network IS the computer).

I guess the only problem here is that the frame of reference (or the
top level directory) may change depending on your access mode;
anonymous FTP shows that tendency, and AFS seems to as well.  I've
abandoned the aftp: bit as I rewrite things, they're just file: now.

>> - Currently, a colon after the host name is used to specify an alternate
>> TCP port number, but a good browser should ignore it if no number is
>> present. In this way, www can be compatible with ange-ftp syntax.

Thanks.  I think it's important -- ange-ftp users include me, and
since I don't have a real super WWW browser other than line mode I
need to be sure that I don't have to rewrite stuff.  I don't think it
would be too hard to cons up a similar setup to 

>> I'll add this to the wish-list for the `file:' access method :
>> 
>> 	* if the address ends with a `/', try `ls' instead of `get'.
>> 	* try to get an appropriate README file. Try those in order :
>> 	  README.html, *README*.html, README, *README*, *readme*
>> 	* Display that file if found, then build a list of references
>> 	  for all the files contained in the directory.

There's work going in the IETF Anonymous FTP working group (headed up
by Alan Emtage and Peter Deutsch of archie fame) to work on improving
access to anonymous FTP areas.  A standard for directory description
is sorely lacking, and I think (cross fingers) that an SGML approach
like WWW would have as good a chance as any to get acceptance.  That's
especially true, *if* it can be generated with minimal or no effort by a
site admin.  I'm inclined to call the file
	archie.html
just to steal their good name :-) and make it clear that the file is
designed to be scooped up and processed by other things (future
archies, WWW, WAIS, other hypertext browsers, other indexes).

A first pass would be to take a big archive that you're familiar with
and that is already reasonably well indexed (say the index files from
one of the NeXT archives, or maybe simtel20, or something like that)
and convert the indexes into WWW format.  

>> With the line-mode browser, this will look fine :
>> 
>>   blah blah blah. Check out [1] for lots more information.

Fixed (more or less) in the stuff that I'm going back over.  I don't
have a formatter just yet that will display things as they will show
on-screen, & there are style and design conventions involved which I'd
really rather steal from someone than do myself.  A style guide for
html (and a dtd, if you can manage one, so that these things can be
munged with sgml tools) would be great to have.

>> Thank you for all your suggestions. Please continue to provide
>> feedback as you write more html. We're looking forward to read your
>> data seamlessly and pave the way for other ftp site managers.

Happy to be of help, thanks for the comments.  It's a big enough job
to try to map out what's out there on the net, I'd just as soon let
someone else write the nice GUI so I don't have to.

-- 
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
       MSEN, Inc. 628 Brooks Ann Arbor MI 48103 +1 313 741 1120



From emv@crane.aa.ox.com  Wed Nov 13 05:18:38 1991
Return-Path: <emv@crane.aa.ox.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02197; Wed, 13 Nov 91 05:18:38 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA09904; Wed, 13 Nov 91 05:15:00 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA08967; Wed, 13 Nov 91 05:09:56 +0100
Received: by crane.aa.ox.com (/\==/\ Smail3.1.22.1 #22.9)
	id <m0khBzh-00081pC@crane.aa.ox.com>; Tue, 12 Nov 91 23:14 EST
Message-Id: <m0khBzh-00081pC@crane.aa.ox.com>
To: www-interest@nxoc01.cern.ch
Subject: references in the web to paper documents.
Date: Tue, 12 Nov 91 23:14:28 -0500
From: Edward Vielmetti <emv@ox.com>
X-Mts: smtp

I will be using the format

<a href=isbn:0-13-484080-1> Carl Malamud's "Stacks" </a>

to handle references to books.  The hope (such as it is) is that
a browser will be able to take the isbn magic cookie and feed it
into a library on-line catalog and get a meaningful result back.

If there has been an SGML coding proposed or in use for MARC format
records that would be the appropriate way to return the results.
I don't have MARC details on-line, but that's OK since most library
on-line catalogs don't yet give you access to raw cards.

Until there's an isbn-to-www gateway they're still quite useful
as absolute reference markers, easy to get the full cataloging
information that way.
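
Even without a gateway, a browser could at least check that an isbn:
reference is well formed. The sketch below uses the standard ISBN
check digit (ten characters, weights 10 down to 1, sum divisible by
11); the function names are hypothetical and nothing here talks to a
library catalog.

```python
from typing import Optional

DIGITS = "0123456789"


def isbn_is_valid(isbn: str) -> bool:
    """Check the length and check digit of a ten-character ISBN."""
    chars = [c for c in isbn if c != "-"]
    if len(chars) != 10:
        return False
    total = 0
    for weight, c in zip(range(10, 0, -1), chars):
        if c in DIGITS:
            value = int(c)
        elif c in "Xx" and weight == 1:
            value = 10              # X is only legal as the check digit
        else:
            return False
        total += weight * value
    return total % 11 == 0


def parse_isbn_href(href: str) -> Optional[str]:
    """Return the bare ISBN from an `isbn:` reference, or None if malformed."""
    scheme, sep, rest = href.partition(":")
    if not sep or scheme.lower() != "isbn" or not isbn_is_valid(rest):
        return None
    return rest.replace("-", "").upper()
```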

Similar treatment is expected for issn (serials) numbers.  In some
distant far-off future electronic serials and electronic documents
will get card catalog entries for them if they're suitably permanent
and distinctive to warrant them.  Until then there are plenty of books
out there that I'd like to have pointers to.

Bonus points if you can deliver fully formed hypertext to the
desktop based on the isbn number :-)

--Ed

From emv@heifetz.msen.com  Wed Nov 20 10:30:27 1991
Return-Path: <emv@heifetz.msen.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08265; Wed, 20 Nov 91 10:30:27 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA11177; Wed, 20 Nov 91 10:26:56 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA01645; Wed, 20 Nov 91 10:21:09 +0100
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0kjoAF-000HftC@heifetz.msen.com>; Wed, 20 Nov 91 04:24 EST
Message-Id: <m0kjoAF-000HftC@heifetz.msen.com>
To: www-talk@nxoc01.cern.ch
Cc: archie-maint@cc.mcgill.ca, prospero@isi.edu
Subject: prototype of www-prospero-archie interface
Date: Wed, 20 Nov 91 04:24:09 -0500
From: Edward Vielmetti <emv@msen.com>

When I say prototype, I mean just a little teeny tiny idea
turned into a few lines of code.

This is a piece of perl that is a gateway between archie and
WWW.  It should be set up to run as a server under inetd.  It
takes incoming requests of the form
	GET /nic.funet.fi/exact?wais
and returns HTML formatted archie results back.  It depends on the
C language Prospero archie client that you can get from (e.g.)
ftp.cs.widener.edu.

My HTML is really bad but you get the idea, it should look somewhat
reasonable and give you something you can click on.

With a little bit of work any dbm file looks like it can be turned
into a WWW queryable server along the same lines.

Take note that you are inviting truly perverse packet transport,
with a query from one site resulting in hundreds of packets being
shuttled all around the world ....

--Ed

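#!/usr/local/bin/perl

# gateway from www to archie
# this is the "brute force" kind of approach; a tidier solution
# would speak the prospero protocols directly.

while (<>) {
	s/\r?\n$//;	# drop the line terminator sent by the client
	# request looks like: GET /<archie host>/<search type>?<query>
	if (m,^GET /(.*)/(.*)\?(.*)$,) {
		$archie = $1;
		$type = $2;
		$query = $3;
	} else {
		exit 0; # XXX 
	}
#	print "$archie $type $query \n";
	# 'exact' selects an exact match; anything else a substring search
	if ($type eq 'exact') {
		$arg = "-e";
	} else {
		$arg = "-s";
	}
	print "<title>archie $type search for $query on $archie </title>\n";
	# list-form pipe open, so the query is never handed to a shell
	open(ARCHIE, "-|", "archie", "-l", "-t", $arg, "-h", $archie, $query)
		|| exit 1;
	@result = <ARCHIE>;
	close(ARCHIE);
	foreach (@result) {
		# each line of output is expected to hold: time size host file
		($time, $size, $host, $file) = split;
		# \@ keeps perl from interpolating an array here
		print "<a href=file://anonymous\@$host:$file>\n";
		print "$_</a>\n";
	}
}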


From timbl  Thu Nov 21 17:51:35 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA11536; Thu, 21 Nov 91 17:51:35 GMT+0100
Date: Thu, 21 Nov 91 17:51:35 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111211651.AA11536@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Anders Gillner <awg@sunic.sunet.se>
Subject: Re: Internet-gopher , WWW, WAIS, etc
Cc: NIR, www-talk

Anders,

>> systems within systems within systems!

> [expletive deleted] mess !!. I have written to Joyce R. and said  
that we need some kind of structure and a worldwide cooperation about  
datastructure.

There are two areas -- organizing the data structure itself, and  
coordinating addressing/protocols/formats. Both are in embryonic  
stages at the moment so a few concurrent ideas must be useful to the  
world. Both can also, to a certain extent, be resolved by a "survival  
of the fittest" principle (as Brewster argues in [1]): Those reviews,  
overviews and indexes which have the best coverage, signal to noise  
ratio and good links will be read most, and quoted most. It would  
save time, though, if all the projects pooled resources a bit more.  
Some unaligned funding would help of course...

 Various people (CC'd on this mail) are talking about putting  
together a mailing list about resolving technical issues. Personally,  
I don't mind where the list is -- it sounds like a good idea. The  
problem seems to be working out the overlap with lists such as  
www-talk, wais-talk, archie-people, the anon. ftp IETF WG., various  
public lists [2], etc etc.  Perhaps somebody could try to make a  
"state of the nation(?)" list of who's doing what now.

	Tim

__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155


References:

[1]Brewster Kahle on WAIS concepts (much applies to other systems  
too)
 www file://quake.think.com//pub/wais/doc/wais-concepts.txt
[2] List of some lists involved in NIR:
www http://info.cern.ch:8001/wais.cic.net:210/lists network  
information retrieval



From timbl  Mon Nov 25 09:48:57 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA13665; Mon, 25 Nov 91 09:48:57 GMT+0100
Date: Mon, 25 Nov 91 09:48:57 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111250848.AA13665@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: connolly@pixel.convex.com
Subject: Re: X/motif browser status
Cc: www-talk

Dan,

> The AVS help system is all but finished.

> I went back tonight and tried browsing WWW files.
> My html2rtf converter is still a little rusty, but
> other than that, it works well.

Thanks for the perl script. It prompted us to bring perl up on some machines
we didn't previously have it on.

> Features I've implemented since I last wrote you:
>
>	* multiple fonts (with menu options for changing them)
>	* colored text
>	* full color raster images
>
> I've been considering the idea of building this thing on a Sparc
> and sending you the binary for evaluation. I _might_ have time
> before the end of the year. Would you have time to look at it?
> Is it worth bothering, considering we don't have an agreement
> about the source?

Certainly we would have time, and we'd be interested to compare the user  
interfaces.  It's a pity you can't release the source, but to see the look and  
feel would be interesting all the same.

> I used to keep up on the hypertext newsgroups, WWW, and WAIS mailing
> lists, but for about two months now, I've been too busy. The AVS project
> is winding down. I should be able to start thinking about using WWW or
> WAIS technology (or both) some time soon.

Good. We are still (with many distractions) working toward getting the new browser  
out. That'll be able to have WAIS and WWW in the same package, fairly seamless  
[Have you tried browsing through the WAIS gateway with your WWW browser?]

Keep up the good work

Tim

From timbl  Thu Nov 28 08:57:45 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16501; Thu, 28 Nov 91 08:57:45 GMT+0100
Date: Thu, 28 Nov 91 08:57:45 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111280757.AA16501@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-talk
Subject: Document identifiers



[from clifford lynch via brewster kahle]

The Coalition for Networked Information
Architectures & Standards Working Group

Workshop on ID and Reference Structures
for Networked Information

There is an increasingly urgent need to develop working
standards for referencing networked information objects.
This has a wide range of applications, including links from
MARC records to source material, references from courseware
to published material in electronic form, networked
hypertext pointers, and digital document IDs of the sort
used in the Wide Area Information Server (WAIS) system.
Many projects underway today need these types of
identifiers, and a number of efforts have developed ad-hoc
solutions so that they can progress. Unfortunately, the
proliferation of these ad-hoc solutions is a major barrier
to interoperability.

Responding to this need, the Coalition for Networked
Information's Architectures and Standards Working Group is
initiating an effort to develop such a working standard, or
agreement. One outcome of this work may be a draft
specification that is forwarded to standards-making bodies
such as the National Information Standards Organization for
consideration as the basis of an actual standard. In
addition, the resulting specification may be submitted to
the Internet Engineering Task Force for consideration as a
draft Request for Comment (RFC).

I propose the following process to reach agreement. I am
distributing this announcement, which includes a number of
assumptions towards such a specification; redistribution is
encouraged. Discussion can be carried out electronically on
the new LISTSERV mailing list that has been set up for the
Architectures and Standards Working Group, which you can
subscribe to by sending a mail message in the form

SUB CNI-ARCH yourname

to

LISTSERV@UCCVMA.BITNET

Barring the unlikely event that rapid and full agreement on
the specification is reached through electronic discussion,
CNI will sponsor a one-day invitational meeting in early
November (date and place to be determined). If you have a
strong interest in this topic and feel you should attend the
meeting, contact me either by electronic mail
(CALUR@UCCMVSA.BITNET or CALUR@UCCMVSA.UCOP.EDU) or by
telephone (510) 987-0522 to have your name added to the
invitation list.

Aspects of the problem that need to be addressed include
those below, which I have listed along with some assumptions
(all subject to question) to provide a starting point for
our discussions. I do not claim that this list is complete;
look for areas overlooked as well as react to those
mentioned. Many people have contributed ideas that appear in
the list below, but I must make special note of the
contributions of Brewster Kahle of Thinking Machines and his
excellent document "Document Identifiers, or International
Standard Book Numbers for the Electronic Age" (5/9/90).

1. The need for identifiers, as distinct from location
information. This is best handled by a number (much like an
ISSN or ISBN), but the system must accommodate multiple
number-assigning agencies. Thus, the identifier is proposed
as <numbering-authority>,<identifier> where numbering
authorities are registered.

2. The pointers must be representable as an ASCII string to
facilitate inclusion in a wide range of material, including
documents and electronic mail.

3. Location information must support multiple Locations for
the document, including the "location of record" and one or
more redistribution centers, local caches, etc. The means of
specifying a location should be sufficiently general to span
at least the set of networks covered under the Internet
Domain Naming system (DNS).

4. Objects may be retrieved by a variety of access
mechanisms from servers, including FTP, LISTSERV, Z39.50,
and perhaps FTAM and SQL-based database access, as well as
requests for paper copies. The location information should
be sufficiently general to include information about these
different types of access techniques, and extensible to
include new access methods that may develop in future.

5. Perhaps the location identifier should include some
information about the format and size of the object; on the
other hand, perhaps it should not. Discussion?

6. It should be possible to further qualify a reference to a
"sublocation" within an object (which would have meaning
only to the server that houses it). This is needed, for
example, for hypertext-type links.  Such a sublocation might
be the 25th paragraph of a text, for a hypertext-type
pointer.

7. Indirection should be supported. In other words, one
should be able to format the location as the name of a
server that can be passed the identifier and which would
return location information. The protocol mechanism(s) for
doing this need to be specified as well.

8. While full rights and permissions data would seem to be
outside the scope of such a pointer, it might be useful to
include at least some basic information. This might be an
indication that the object is not copyrighted and can be
freely distributed, that it is copyrighted but can be freely
distributed, that it can be redistributed for noncommercial
use, or that restrictions apply to redistribution. Also, it
might make sense to include a pointer of some sort (an
e-mail address? a host address?) for further information
about rights.

9. Perhaps there might be some type of checksum that can be
calculated on the retrieved object to ensure that the
pointer and the object have not gotten out of synch?
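
As one possible reading of points 1, 2, 3 and 7 above, a pointer could
be modelled as in the sketch below. Every name in it (the classes, the
`resolvers` table, the example authority) is a hypothetical
illustration, not part of any proposed specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Location:
    access: str               # access mechanism, e.g. "ftp" or "z39.50"
    address: str              # host and path in that mechanism's own form
    of_record: bool = False   # True for the "location of record"


@dataclass
class DocumentPointer:
    authority: str            # registered numbering authority (point 1)
    identifier: str
    locations: List[Location] = field(default_factory=list)

    def as_ascii(self) -> str:
        return f"{self.authority},{self.identifier}"


def parse_identifier(text: str) -> DocumentPointer:
    """Parse `<numbering-authority>,<identifier>`; must be plain ASCII (point 2)."""
    text.encode("ascii")      # raises UnicodeEncodeError, a kind of ValueError
    authority, sep, identifier = text.partition(",")
    if not (sep and authority and identifier):
        raise ValueError(f"expected <authority>,<identifier>: {text!r}")
    return DocumentPointer(authority.strip(), identifier.strip())


def locate(pointer: DocumentPointer,
           resolvers: Dict[str, Callable[[str], List[Location]]]) -> List[Location]:
    """Return known locations, asking the authority's server if none are given (point 7)."""
    resolver = resolvers.get(pointer.authority)
    found = pointer.locations or (resolver(pointer.identifier) if resolver else [])
    # Stable sort: the location of record first, then caches and mirrors (point 3).
    return sorted(found, key=lambda loc: not loc.of_record)
```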



From timbl  Thu Nov 28 11:32:02 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16716; Thu, 28 Nov 91 11:32:02 GMT+0100
Date: Thu, 28 Nov 91 11:32:02 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111281032.AA16716@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-talk@nxoc01.cern.ch
Subject: misc. architecture notes



Begin forwarded message:

To: timbl@nxoc01.cern.ch
Subject: misc. architecture notes
Date: Wed, 27 Nov 91 15:06:02 CST
From: connolly@pixel.convex.com

[Any minute now, my ride to Kansas City for the holiday
 will arrive. In the mean time, here are some ideas.]

WAIS

It's beginning to look like you should try to fit WWW inside
WAIS, rather than the other way around. You need to talk with
those guys about format negotiation and document representation,
and both groups need to combine WAIS docid's and WWW anchor
addresses.

In other words, I think the WWW browser should be a WAIS client.
But come to think of it, there's no reason a browser can't be a WAIS
client, a HTTP client, an FTP client, and an ARCHIE client all at
the same time.

For example, I used to compile WWW support into my browser. Lately,
I changed my mind. Now I compile a separate program that supports
WWW access. I invoke
	htaccess HTML_ADDRESS
and the stdout of that process is the HTML content of the node.
I pipe that through html2rtf.pl, and display the output. The user clicks
on anchors, and the whole process repeats.

I could, however, use waisq, or an archie client, or an nntp client, or
an ftp client in place of htaccess, write a few more foo2rtf converters,
and support all this stuff. Hmmm... lots to think about.
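
The arrangement described above amounts to a table of access programs
and converters keyed by the kind of address. A rough sketch follows;
`htaccess` and `html2rtf.pl` are the programs named in the message,
while the `news` entry (`nntpfetch`, `news2rtf`) is a made-up
placeholder showing where another access method would plug in.

```python
import subprocess

# (access program, converter to RTF) for each address scheme.
DEFAULT_PIPELINE = ("htaccess", "html2rtf.pl")
PIPELINES = {
    "news": ("nntpfetch", "news2rtf"),   # hypothetical names, illustration only
}


def plan_pipeline(address: str):
    """Return the two command lines used to fetch and convert an address."""
    scheme, sep, _ = address.partition(":")
    fetcher, converter = PIPELINES.get(scheme if sep else "", DEFAULT_PIPELINE)
    return [fetcher, address], [converter]


def fetch_as_rtf(address: str) -> bytes:
    """Run the access program and pipe its stdout through the converter."""
    fetch_cmd, convert_cmd = plan_pipeline(address)
    fetch = subprocess.Popen(fetch_cmd, stdout=subprocess.PIPE)
    convert = subprocess.Popen(convert_cmd, stdin=fetch.stdout,
                               stdout=subprocess.PIPE)
    fetch.stdout.close()      # let the fetcher see a closed pipe if the converter exits
    rtf, _ = convert.communicate()
    fetch.wait()
    return rtf
```

The browser itself then only needs to display RTF and hand the address
of a clicked anchor back to `fetch_as_rtf`.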

TEXT OBJECT

I've been reading some of the design notes in your web, and I
was particularly interested in your ideas for a portable text
object. My software uses many of these concepts. I gave up
editing capabilities to simplify the design and make it doable
in two months.

I think you would be crazy to try to do the text object without C++.
Perhaps you could provide a C interface and a sample implementation
in C that doesn't have all the features. But for WYSIWYG displays,
the problem is just too complex to maintain in C.

You should take a close look at TMLib. Some of the implementation
needs rework, but the architecture fits your needs pretty well.
I'm not using any of that code, but I'm using lots of their ideas,
e.g. the model-format-view architecture.

HTML

You need a DTD. Have you seen the SGMLS tools? They parse SGML and
write a line-oriented representation as output. This would be ideal
for format negotiation. You could support plaintext and certainly RTF,
and probably make stabs at TROFF, TeX, and perhaps PostScript.

Have you considered how to embed links in other formats? Please let
me know how you decide to do it in RTF. My idea is to translate:
	<A HREF=foo>text</A>
		to
	{\field{\fldinst HREF=foo}{\fldrslt text}}
[for implementation reasons, I'm currently putting the \fldinst group
 after the \fldrslt group, but that's a minor detail.]

The resulting files still work when loaded into MS Word, though if you
saved them again I doubt the HREF would still be there.
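
The translation proposed above can be sketched with a regular
expression, as below. This is an illustration, not the html2rtf
converter: it handles only simple unquoted `HREF=` anchors, emits the
\fldinst group first as in the proposal, and the escaping of RTF's
special characters is an added assumption.

```python
import re

ANCHOR = re.compile(r"<A\s+HREF=([^\s>]+)\s*>(.*?)</A>", re.IGNORECASE | re.DOTALL)


def rtf_escape(text: str) -> str:
    """Escape the characters that are special in RTF: backslash and braces."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def anchors_to_rtf_fields(html: str) -> str:
    """Rewrite each anchor as an RTF field carrying the HREF as its instruction."""
    def as_field(match):
        href, text = match.group(1), match.group(2)
        return "{\\field{\\fldinst HREF=%s}{\\fldrslt %s}}" % (
            rtf_escape(href), rtf_escape(text))

    return ANCHOR.sub(as_field, html)
```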

[my ride is here. more later]

Dan



From jfg@bernd.cern.ch  Mon Dec  2 10:07:44 1991
Return-Path: <jfg@bernd.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA19808; Mon, 2 Dec 91 10:07:44 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA25300; Mon, 2 Dec 91 09:58:00 +0100
Received: by bernd.cern.ch (AIX 3.1/UCB 5.61/4.03)
          id AA14185; Mon, 2 Dec 91 10:08:04 -2300
Date: Mon, 2 Dec 91 10:08:04 -2300
From: jfg@bernd.cern.ch (Jean-Francois Groff)
Message-Id: <9112030908.AA14185@bernd.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: forwarded message from connolly@pixel.convex.com

WWW folks may like to comment on this, posted to wais-talk and
cni-arch... Sorry if you've already read it there !

-- Jean-Francois

------- Start of forwarded message -------

From: connolly@pixel.convex.com
To: wais-talk@Think.COM
Cc: cni-arch@uccvma.BITNET
Subject: Re: Document identifiers 
Date: Mon, 02 Dec 91 01:32:36 CST


>The Coalition for Networked Information
>Architectures & Standards Working Group
>
I don't like the direction this technology is headed.

What is the desired functionality of these identifiers?

If you want an identifier that uniquely identifies a file,
why not use a checksum, such as returned by the unix
sum command?

Let's see how a checksum solves these issues, and then see
what functionality I'd like to see instead.
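
For reference, a content-derived identifier of this kind can be
sketched as below. The output of `sum` differs between systems; this
assumes the 16-bit rotating checksum of the BSD variant, and the
function names are hypothetical.

```python
def bsd_sum(data: bytes) -> int:
    """16-bit rotating checksum in the style of the BSD `sum` command."""
    checksum = 0
    for byte in data:
        # Rotate right by one bit within 16 bits, then add the next byte.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def checksum_identifier(data: bytes) -> str:
    """Render the checksum as a short ASCII string usable in mail or documents."""
    return f"{bsd_sum(data):05d}"
```

With only 16 bits there are 65536 possible values, so distinct files
will collide; a real scheme would need a much longer checksum.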

>1. The need for identifiers, as distinct from location
>information. This is best handled by a number (much like an
>ISSN or ISBN), but the system must accommodate multiple
>number-assigning agencies. Thus, the identifier is proposed
>as <numbering-authority>,<identifier> where numbering
>authorities are registered.
>
There's no location info in a checksum. Done deal.

>2. The pointers must be representable as an ASCII string to
>facilitate inclusion in a wide range of material, including
>documents and electronic mail.
>
Check.

>3. Location information must support multiple Locations for
>the document, including the "location of record" and one or
>more redistribution centers, local caches, etc. The means of
>specifying a location should be sufficiently general to span
>at least the set of networks covered under the Internet
>Domain Naming system (DNS).
>
Ah! Now we want to be able to get location info out of the
identifier. Checksums don't help. Well, in fact, they help
no more or less than <numbering authority>-<id> helps, unless
a numbering authority implies a location. I'm not clear on
this at all.

>4. Objects may be retrieved by a variety of access
>mechanisms from servers, including FTP, LISTSERV, Z39.50,
>and perhaps FTAM and SQL-based database access, as well as
>requests for paper copies. The location information should
>be sufficiently general to include information about these
>different types of access techniques, and extensible to
>include new access methods that may develop in future.
>
Hmmm... now it looks like the doc id should tell how to
get the document... but not exactly. What we're really looking
for is some client software that interprets these numbers
and queries servers. Checksums look as good as anything again.

>5. Perhaps the location identifier should include some
>information about the format and size of the object; on the
>other hand, perhaps it should not. Discussion?
>
Checksums do not contain type/size info. If that's what we want,
the checksum idea is no good.

>6. It should be possible to further qualify a reference to a
>"sublocation" within an object (which would have meaning
>only to the server that houses it). This is needed, for
>example, for hypertext-type links.  Such a sublocation might
>be the 25th paragraph of a text, for a hypertext-type
>pointer.
>
Now we raise the question: just what does a document identifier
identify? Until this item, it appeared that a document was
a file. Now it's not so clear. Perhaps a document should be anything
from a single character to a paragraph to a file to a chapter to
a book to an encyclopedia to a library. That would be a good trick.
Is that what we're after?

>7. Indirection should be supported. In other words, one
>should be able to format the location as the name of a
>server that can be passed the identifier and which would
>return location information. The protocol mechanism(s) for
>doing this need to be specified as well.
>
Ah. Now the objectives of the location info become more clear.
Sounds to me like the location is a TCP connection, or enough
information on how to establish one.

>8. While full rights and permissions data would seem to be
>outside the scope of such a pointer, it might be useful to
>include at least some basic information. This might be an
>indication that the object is not copyrighted and can be
>freely distributed, that it is copyrighted but can be freely
>distributed, that it can be redistributed for noncommercial
>use, or that restrictions apply to redistribution. Also, it
>might make sense to include a pointer of some sort (an
>e-mail address? a host address?) for further information
>about rights.
>
Ack! This stuff seems totally orthogonal to the rest of the
stuff, but in practice, this looks like a crucial issue.
I don't have any good ideas here.

>9. Perhaps there might be some type of checksum that can be
>calculated on the retrieved object to ensure that the
>pointer and the object have not gotten out of synch?
>
This is what sparked the checksum idea.


My response to all this:

I don't think we need [yet another] document identifier format.
If you want location info, use an internet address; if you want
data integrity, use a checksum; if you want format, we are lacking
a standard here; if you want copyright info, ditto;

What we need is some nifty client software to glue all the parts
together. I guess there is some room for standardization, but please:
	LET'S LEVERAGE EXISTING SYSTEMS!

Where these systems are robust, I think we should support them. I'd
also like to see support for ad-hoc document identifiers. Here's
an example to clarify:

I'm browsing some email, netnews, or a README file from somewhere.
I see a reference to more info:

	A full discussion of the BLURF protocol is available via
	anonymous FTP from frob.mit.edu as blurf-proto.tex
	in the directory /pub/protos.

I select some or all of that text, and I click one of the buttons
in my document retrieval tool:

	make ftp id --	extract the relevant information and display
			a well-formed identifier acceptable to some
			existing FTP client (I've heard of something
			called ange FTP. Another idea is to make
			a shell script that would do the retrieval:
ftp frob.mit.edu
cd /pub/protos
get blurf-proto.tex
			)

	make wais id --	get enough info to make a WAIS doc ID
			[scrap this unless it stabilizes]
	make WWW id --	same thing for World Wide Web HTTP addresses.
	make NNTP id --	same thing for USENET news message id's.
	make LISTSERV id --	you get the idea
			Rather than making up a new format, these id's
			are instructions to EXISTING clients to retrieve
			a document.

	verify id --	connect to the necessary server(s) and verify
			that the id references an existing document.
			Append to the id a "verification date," which
			is the last time a server acknowledged the
			existence of the document.

	get id info --	connect to the necessary server(s) and get about
			1K of miscellaneous info: document size in bytes,
			date of last modification, available formats,
			short summary, etc.

	retrieve raw --	connect and retrieve the document in whatever
			format is convenient to the server, e.g.
			a compressed tar archive of C and troff sources.

	retrieve text --	connect and retrieve the document as
			plain text [defined, e.g. as the body of an 
			RFC-822 mail message]

	retrieve... --	the user or the supporting client software
			specifies the supported information formats,
			(compression schemes, archiving formats,
			image file formats, typesetting languages)
			the client and the server hash over their options,
			[perhaps with user intervention]
			and the server sends the most desirable version
			of the document it has available.
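
The "make ftp id" button above could be sketched roughly as follows. This is only an illustration: the regular expression is hypothetical and recognises nothing beyond the phrasing of the BLURF example, and make_ftp_id is an invented name. It turns the selected text into the three commands of the shell script shown in the list.

```python
import re

REFERENCE = """A full discussion of the BLURF protocol is available via
anonymous FTP from frob.mit.edu as blurf-proto.tex
in the directory /pub/protos."""

# Hypothetical pattern: it only understands the wording of this one example.
PATTERN = re.compile(
    r"FTP\s+from\s+(?P<host>[\w.-]+)\s+as\s+(?P<file>\S+)\s+"
    r"in\s+the\s+directory\s+(?P<dir>\S+?)\.?\s*$",
    re.IGNORECASE,
)


def make_ftp_id(text):
    """Extract host, directory and file name; return ftp commands or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return [
        "ftp " + match["host"],
        "cd " + match["dir"],
        "get " + match["file"],
    ]


for command in make_ftp_id(REFERENCE):
    print(command)
```

A real tool would need many such patterns, or user help, since references in mail and news are free-form.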

If we add a few buttons, we begin to encompass the scope of many existing
systems:

	expand --	change the doc id to reference the "document"
			containing it. In the ftp example, rather than
			"get blurf-proto.tex," it would have "ls."
			Click again and get "cd ..; ls."
			Obviously, this operation depends on the access
			mechanism. For WAIS documents, the expansion of
			a document is the source that contains it.

	select --	narrow the document to some of its parts. For a
			text file, select some of the characters/paragraphs;
			for a WAIS source, select some of the documents.
			For a WWW node, select a neighboring node. For
			a directory, select some files.

I guess my point is, let's think about how folks are going to use this
document referencing technology, and let's see how well existing systems
meet these needs.

I guess some groups have come to the conclusion that the existing systems
don't cut it. I'm beginning to agree.

I guess we'd all agree that we should decide how we're going to use these
doc id's and let that drive the design of the format. i.e. Let's decide
on the methods of this object before we decide on its representation.

[an idea: for syntax, the WAIS folks chose LISP. What about using
something akin to RFC-822 syntax? I think it works well: define a bunch
of standard headers; require some, allow some, disregard others; allow
free-form text in the body. examples:

ISBN: 0-13-590126-X
	or
MESSAGE-ID: usenet-thing
	or
FTP-HOST:  frob.mit.edu
USER:	anonymous
	or
WAIS-PORT:	8001@think.com


This would allow us to leverage all the email technology out there, plus 
the emerging multi-part mail format.
(and it would allow me to use PERL on these beasties! :-)
]
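
A small sketch of the RFC-822 idea, using Python's standard email package rather than PERL. The set of known header names is taken from the examples above; treating every other header as "disregarded" is an assumption, and parse_doc_id is a hypothetical name.

```python
from email.parser import HeaderParser

# Header names from the examples in the message; any other header is disregarded.
KNOWN = {"ISBN", "MESSAGE-ID", "FTP-HOST", "USER", "WAIS-PORT"}


def parse_doc_id(text):
    """Split an RFC-822 style identifier into known headers and free-form body."""
    msg = HeaderParser().parsestr(text)
    fields = {}
    for name, value in msg.items():
        if name.upper() in KNOWN:
            fields[name.upper()] = value.strip()
    return fields, msg.get_payload()


SAMPLE = (
    "FTP-HOST:  frob.mit.edu\n"
    "USER:\tanonymous\n"
    "X-NOTE: a header this sketch does not know\n"
    "\n"
    "free-form text about the document\n"
)

fields, body = parse_doc_id(SAMPLE)
print(fields)
print(body.strip())
```

Because the format is ordinary mail syntax, any existing mail-handling library can read it without a new parser.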

Another thing I hope folks are keeping in mind: I don't think any one
client can meet the information-retrieval needs of everybody. We need
to support multiple platforms, for one thing. But I hope other folks are
considering using multiple clients at the same time! I'd like to use
one slick X-windows front end to the whole ball of wax, in some ways like
emacs does for programming, and in some ways like the mac GUI does for
office-productivity applications. But I'm going to be using POST mail
servers, NNTP servers, WAIS servers, FTP servers, etc, and I don't
expect one client to do it all. The crucial trick is to make all this
intuitive and interactive, i.e. to support hypertext browsing, fulltext
retrieval, USENET news reading, and maybe email correspondence, all in
one environment. Let's get started!

Dan

------- End of forwarded message -------

From connolly@pixel.convex.com  Thu Dec  5 19:25:22 1991
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA03918; Thu, 5 Dec 91 19:25:22 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA01254; Thu, 5 Dec 91 19:14:08 +0100
Received: from pixel.convex.com by convex.convex.com (5.61/1.35)
	id AA08473; Thu, 5 Dec 91 12:19:25 -0600
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA29899; Thu, 5 Dec 91 12:16:12 -0600
Message-Id: <9112051816.AA29899@pixel.convex.com>
To: wais-talk@think.com
Cc: tcl@allspice.berkeley.edu, www-talk@nxoc01.cern.ch
Subject: documents, files, types, and access methods
Date: Thu, 05 Dec 91 12:16:11 CST
From: connolly@pixel.convex.com

Someone mentioned that WAIS should obviate the need for FTP. I disagree.
I think that the WAIS protocol is good for finding documents, but not
necessarily for transferring or displaying them.

There are two scenarios that WAIS is good for:

A. The database is built for wais. For example, DowQuest. That database is
stored so that it can be efficiently accessed and delivered through WAIS.

In this case, it makes sense to transfer the contents of the documents through
WAIS and to use the nifty chunking ideas.

B. The database is built for system X, and somebody sicked waisindex on it.
This is currently, by far the most common case. Look at all the USENET
archives, biology databases, library catalogs, etc. that weren't designed for
use with WAIS, but they work pretty well.

In this case, it makes more sense to me to transfer and/or present the
documents using the clients that the database was designed for. The WAIS server
should send enough information to retrieve and/or display the document using the
other client.

Example: the archie database. As a user, I want to query the archie
database using WAIS's fulltext and relevance feedback queries, but I want to
retrieve the documents with FTP, and I may want to "present" them with
uncompress and tar, or lpr, or ghostscript, etc.

Example: USENET news. I want to query using WAIS, but read it with my
news reader.

Example: my mail box. Query with wais, display with Xmh, Elm, mh, emacs, etc.

Retrieving the whole document with WAIS and saving it to a file is no good in
this day and age of client-server computing. The WAIS client may be on a
machine with no disk space to spare. And I may want to use the file on a
different host.

So we see that the WAIS client needs to hand off documents to other clients.
This raises the question: what information should the WAIS search client pass
to the retrieval/display slave clients, and how?

The CNI-ARCH folks are discussing a standard for document identifiers. I
think this is definitely one of the things that WAIS should pass, but it's
not the only thing.

I'm beginning to look at documents sort of like records in a relational
database. The WAIS client should negotiate with the slave client what fields
they have or are interested in. An obvious representation for these records
is the RFC-822 mail message format.

Example: the archie database.

I use my xwais client to query archie.src on "vgrind." My xwais client gets a
list of docids from the WAIS server. These docids contain at least the score
and the CNI-ARCH style docid, which in this case would be enough info to
construct a prospero file handle [I'm not sure there is such a thing as a
prospero file handle, but play along anyway...].

I play gui-games with xwais until I get the list of documents that I like.
Then, using some mechanism like the X selection mechanism or drag-and-drop
(combined with SMTP, perhaps), I select a document and give it to my xftp
application. The xwais client and the xftp client agreed earlier that they
would send messages like:

From:		xwais@x.server.host
To:		xftp@x.server.host
CNI-ARCH-ID:	<12345@prospero:quiche.cs.mcgill.ca>
SIZE-IN-BYTES:	120034
FTP-HOST:	export.lcs.mit.edu
FTP-USER:	anonymous
FTP-CD:		pub/util
FTP-GET:	vgrind.tar.Z

blah blah blah about vgrind, perhaps explaining what query found this file,
or perhaps some stuff from the README in vgrind.tar.Z
.

I have already played gui-games with xftp to tell it where to put the files
it retrieves. When it gets this message, it does the HOST, USER, CD, and GET
commands, and presto! I've got my document.
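
What the receiving side might do with such a message can be sketched as below. This is an illustration only: the mapping from the FTP-* headers to the command words open, user, cd and get is an assumption, ftp_commands is a hypothetical name, and nothing here contacts a server.

```python
from email import message_from_string

MESSAGE = """\
From: xwais@x.server.host
To: xftp@x.server.host
CNI-ARCH-ID: <12345@prospero:quiche.cs.mcgill.ca>
SIZE-IN-BYTES: 120034
FTP-HOST: export.lcs.mit.edu
FTP-USER: anonymous
FTP-CD: pub/util
FTP-GET: vgrind.tar.Z

blah blah blah about vgrind
"""

# Assumed mapping from message headers to ftp command words, in execution order.
STEPS = [("FTP-HOST", "open"), ("FTP-USER", "user"),
         ("FTP-CD", "cd"), ("FTP-GET", "get")]


def ftp_commands(raw):
    """Turn the headers of an xwais-to-xftp message into a list of ftp commands."""
    msg = message_from_string(raw)
    commands = []
    for header, verb in STEPS:
        value = msg[header]              # header lookup is case-insensitive
        if value is None:
            raise ValueError("message lacks " + header)
        commands.append(verb + " " + value)
    return commands


for line in ftp_commands(MESSAGE):
    print(line)
```

Headers the receiver does not need, such as SIZE-IN-BYTES, are simply ignored, which is what lets each pair of tools agree on its own subset of fields.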

I think if we had a suite of these gui tools talking SMTP to each other, they
could get a lot of work done. More examples:

To:	xtar@x.server.host
fopen:	/home/connolly/vgrind.tar
	or perhaps
popen:	zcat /home/connolly/vgrind.tar.Z

	xtar has a gui for selecting a place to extract the archive

To:	xlpr@x.server.host
fopen:	/home/connolly/vgrind-2.1/manual.ps
	or
popen:	zcat /home/connolly/vgrind-2.1/manual.ps.Z |

	xlpr selects destination printer, copies, etc.


Most tools fit in naturally. The $PAGER and $EDITOR, and perhaps $SHELL tools
could be MUCH more powerful if they could interoperate this way. [Has anybody
used mx and tx from John Osterhout(sp?) ? Those and the Tk toolkit allow X
applications to send commands back and forth.]

For example, the World-Wide-Web browser would fit the role of $PAGER in this
environment. It would receive messages to display WWW nodes, containing their
HTTP address (or NNTP, FTP, etc.). It would then display the node and allow
the user to scroll around and choose anchors etc. It could handle most
anchors by itself, but it might want to let the user select a region of text
and send it to the WAIS client.

I don't think there's an $EDITOR that fits very well, though emacs is always a
contender, and you have to have vi.
[I think the mouse support in emacs needs a LOT of work, but I probably
haven't seen the latest and greatest stuff.]

I'm not sure how $SHELL fits into all this but, for example, folks send shell
commands in mail messages to each other all the time. You could just select
the shell command in your mail $PAGER, and drag it to your $SHELL x-client
for invocation.

I hope I get time to try to implement a couple of these ideas. Then we can
all see whether they're worth pursuing.

Dan

From emv@shelley.aa.ox.com  Fri Dec  6 01:33:36 1991
Return-Path: <emv@shelley.aa.ox.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04159; Fri, 6 Dec 91 01:33:36 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA20299; Fri, 6 Dec 91 01:22:17 +0100
Received: by shelley.aa.ox.com (/\==/\ Smail3.1.22.1 #22.9)
	id <m0kpTDB-000Ds7C@shelley.aa.ox.com>; Thu, 5 Dec 91 19:14 EST
Message-Id: <m0kpTDB-000Ds7C@shelley.aa.ox.com>
To: connolly@pixel.convex.com
Cc: wais-talk@think.com, tcl@allspice.berkeley.edu, www-talk@nxoc01.cern.ch
Subject: Re: documents, files, types, and access methods 
In-Reply-To: Your message of Thu, 05 Dec 91 12:16:11 -0600.
             <9112051816.AA29899@pixel.convex.com> 
Date: Thu, 05 Dec 91 19:14:37 -0500
From: Edward Vielmetti <emv@ox.com>
X-Mts: smtp

data, data, data, data, data.

if you have good ideas about document structure and ways to send
messages around that have magic cookies in them, that's good.  but in
order to convince anyone to do anything substantial in terms of
software development you need to provide data.

get 100 entries describing 100 things that are useful, add enough
structure that a motivated party can pull your database apart and
create something new from it, and people will start to write code.
(honest.)  produce another 10 entries a month for a year and more
people will write code or bend their existing code to work with your
system.

Don't wait for an all-singing, all-dancing standard before you start
to collect information.  If you gather enough stuff and organize it
well, other people will do the work of bringing it up to what is
considered standard (if and when that happens).  You do need to be
thorough in making sure that whatever you do is consistent and regular
enough to be worth retrofitting.

-- 
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
       MSEN, Inc. 628 Brooks Ann Arbor MI 48103 +1 313 741 1120


From rusty@groan.berkeley.edu  Fri Dec  6 02:18:12 1991
Return-Path: <rusty@groan.berkeley.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04188; Fri, 6 Dec 91 02:18:12 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA22015; Fri, 6 Dec 91 02:06:59 +0100
Received: by groan.Berkeley.EDU (5.57/1.41)
	id AA26777; Thu, 5 Dec 91 17:13:22 -0800
Date: Thu, 5 Dec 91 17:13:22 -0800
From: rusty@groan.berkeley.edu (Rusty Wright)
Message-Id: <9112060113.AA26777@groan.Berkeley.EDU>
To: emv@ox.com
Cc: connolly@pixel.convex.com, wais-talk@think.com, tcl@allspice.berkeley.edu,
        www-talk@nxoc01.cern.ch
In-Reply-To: <m0kpTDB-000Ds7C@shelley.aa.ox.com> "emv@ox.com"
Subject: documents, files, types, and access methods 

I don't understand what this all has to do with tcl.

From vanandel@rsf.atd.ucar.edu  Fri Dec  6 17:52:35 1991
Return-Path: <vanandel@rsf.atd.ucar.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04815; Fri, 6 Dec 91 17:52:35 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA18576; Fri, 6 Dec 91 16:38:59 +0100
Received: from keel.atd.ucar.edu by ncar.ucar.EDU (5.65/ NCAR Central Post Office 04/10/90)
	id AA19557; Fri, 6 Dec 91 08:44:17 MST
Message-Id: <9112061544.AA04345@rsf.atd.ucar.EDU>
Received: from wabbit4.atd.ucar.edu by rsf.atd.ucar.EDU (5.65/ NCAR Mail Server 04/10/90)
	id AA04345; Fri, 6 Dec 91 08:44:01 MST
To: wais-talk@think.com, www-talk@nxoc01.cern.ch
Cc: tcl@allspice.berkeley.edu
Subject: What's "wais" and "wwv" ?
In-Reply-To: Your message of Thu, 05 Dec 91 19:14:37 -0500.
             <m0kpTDB-000Ds7C@shelley.aa.ox.com> 
Date: Fri, 06 Dec 91 08:43:57 -0700
From: vanandel@rsf.atd.ucar.edu


Some of us on the tcl mailing list feel like we came in on the middle
of a discussion and don't know the context.  Since there are apparently
mailing lists for 'wais' and 'wwv', could someone give a general description
of what "wais" and "wwv" are?

Thanks much!

	Joe VanAndel  		Internet:vanandel@ncar.ucar.edu
	NCAR / RSF  			
	P.O Box 3000		Fax:	 303-497-2044
	Boulder, CO 80307-3000	Voice:	 303-497-2071 

From welch@parc.xerox.com  Fri Dec  6 18:56:17 1991
Return-Path: <welch@parc.xerox.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04914; Fri, 6 Dec 91 18:56:17 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA00186; Fri, 6 Dec 91 18:45:04 +0100
Received: from corvina.parc.xerox.com ([13.2.116.10]) by alpha.xerox.com with SMTP id <11798>; Fri, 6 Dec 1991 09:48:42 PST
Received: by corvina.parc.xerox.com id <36899>; Fri, 6 Dec 1991 09:46:21 -0800
Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.corvina.parc.xerox.com.sun4.40
          via MS.5.6.corvina.parc.xerox.com.sun4_40;
          Fri,  6 Dec 1991 09:46:19 -0800 (PST)
Message-Id: <IdDvRfgB0bE_87aBRm@corvina.parc.xerox.com>
Date: 	Fri, 6 Dec 1991 09:46:19 PST
Sender: Brent Welch <welch@parc.xerox.com>
From: Brent Welch <welch@parc.xerox.com>
To: connolly@pixel.convex.com
Subject: Re: documents, files, types, and access methods
Cc: wais-talk@think.com, www-talk@nxoc01.cern.ch, tcl@allspice.berkeley.edu
In-Reply-To: <9112051816.AA29899@pixel.convex.com>
References: <9112051816.AA29899@pixel.convex.com>

The model that TCL has is that each tool has an embedded TCL
interpreter, and tools can issue commands to each other in the TCL
language.  TCL is designed to be simple for simple things, and it is
fully programmable.  TCL provides basic language features, and the
application that embeds an interpreter can define new commands, either
as C procedures or as TCL command procedures (scripts).  For example,
the Tk X toolkit defines a number of TCL commands to create widgets, so
it is possible to write window programs with a script.  Since each
window (ideally) has a TCL interpreter behind it, you can control your
tools by sending around TCL commands.  This is a more powerful
alternative to using mail header formats for messages. You can send
whole programs, not just commands.  The way I use this currently is to
couple a control panel with a shell window.  The control panel is put
together as a TCL/Tk script that uses the Tk toolkit to display buttons,
etc.  Clicking on buttons in the control panel can cause messages to be
sent to the terminal emulator (tx).  One very useful command passes a
string along to the shell running in the terminal emulator.  In this way
I can create buttons that run commonly used programs.  Other commands
control the terminal emulator itself, such as its size and placement on
the screen, the message in its status line, etc.  The whole model of a
bunch of tools that have a common language and can fire off commands to
each other is very powerful.
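
The shape of this model can be shown with a toy sketch. It is not TCL: the Tool class, its command table and the shell and status commands are all hypothetical stand-ins, and a real embedded interpreter offers variables, control flow and procedures that this does not.

```python
import shlex


class Tool:
    """A tool with its own small command table (a toy stand-in for an
    embedded interpreter)."""

    def __init__(self, name):
        self.name = name
        self.commands = {}

    def define(self, verb, handler):
        # The embedding application registers the commands it wants to expose.
        self.commands[verb] = handler

    def send(self, script):
        """Run a multi-line script: one command per line, words split shell-style."""
        results = []
        for line in script.splitlines():
            words = shlex.split(line)
            if not words:
                continue
            results.append(self.commands[words[0]](*words[1:]))
        return results


# Hypothetical terminal emulator exposing two commands.
typed = []
status = []
terminal = Tool("tx")
terminal.define("shell", lambda text: typed.append(text) or "sent")
terminal.define("status", lambda text: status.append(text) or "ok")

# A control-panel button sends a two-line script in a single message.
print(terminal.send('shell "ls -l"\nstatus "listing files"'))
print(typed, status)
```

The point of the sketch is that the sender ships a script rather than a fixed record of fields, so new behaviour needs no new message format.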

	Brent Welch

From timbl  Mon Dec  9 15:33:46 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06290; Mon, 9 Dec 91 15:33:46 GMT+0100
Date: Mon, 9 Dec 91 15:33:46 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9112091433.AA06290@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: wei@xcf.berkeley.edu (Pei Y. Wei)
Subject: Re: SGML/HTML docs, X Browser
Cc: www-talk

Pei,

I have added you to www-talk as requested.

> I'm now browsing through parts of the WWW distribution, and I'm seeing
> lots of potential in it. It seems that the only browser is for the NeXT,
> a platform most people don't have access to, which is a shame.
>
> I'm now seriously considering writing an X11 browser for HTML files
> by extending a program I've been working on (called VIOLA, a program
> somewhat like HyperCard)...

Ok.. sounds like a good idea.  Dan Connolly (Convex Inc) has put
together a W3 browser for X but could not release the code.  A group  
of students in Finland were also going to do this for a project -- I  
don't know the status of that work.  Anyone who makes a good X11 W3  
browser will be very popular.

Now we have just got a new architecture for the browser code, with a
generic (simple!) SGML parser, and basically all the browser code
common (networking, name resolution, parsing) between different
browsers. The new line mode browser is under test - it has NNTP
access to news built in as well as HTTP and FTP access to indexes and
files.

> I'm wondering if you could give me some pointers to the standards
> SGML and HTML (which seems to be the HyperText extension of SGML?).

SGML is very general. HTML is a specific application of the SGML  
basic syntax applied to hypertext documents with simple structure.  
The HTML tags used are in our documentation. (If you browse to our  
test document, it has a link to its own source which you can take as  
an example.) Type

www http://info.cern.ch/hypertext/WWW/Test/test_source.html

and follow the first link to see the source.  Our code therefore has  
a simple generic SGML parser engine which handles nested tags and  
feeds a HTML parser which has hypertext-specific code in it.  That  
feeds a stream of style-changes and text and anchor start/end points  
into a hypertext object which is what we don't have under X.
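
The pipeline described above (tag parser, then hypertext-specific handling, then a stream of style changes, text and anchor boundaries) can be imitated in a few lines. This sketch uses Python's html.parser as a stand-in for the generic SGML engine; the event names and the EventStream class are hypothetical and are not taken from the W3 code.

```python
from html.parser import HTMLParser


class EventStream(HTMLParser):
    """Turn markup into a flat list of events for a display object to consume."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchor start carries the link target.
            self.events.append(("anchor-start", dict(attrs).get("href")))
        else:
            self.events.append(("style", tag))

    def handle_endtag(self, tag):
        if tag == "a":
            self.events.append(("anchor-end", None))
        else:
            self.events.append(("style-end", tag))

    def handle_data(self, data):
        if data.strip():                 # skip whitespace between tags
            self.events.append(("text", data.strip()))


parser = EventStream()
parser.feed("<H1>PNN</H1>\n<UL>\n"
            "<LI><A HREF=index.node>Index to Information in PNN</A>\n</UL>\n")
parser.close()
for event in parser.events:
    print(event)
```

A platform-specific hypertext object then only has to consume this event list, which is the part that would need writing for X.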

> Any more relevant documentation on SGML/HTML, tips, and whatever you
> think may help me in my task, would be gratefully accepted.

I could make up a tar file of our alpha-test code, including the HTML  
SGML parser. Any pointers to SGML I have are in the web - not much  
public stuff. Two books are "SGML Handbook" by Charles Goldfarb, and  
"Practical SGML" by Eric van Herwijnen.

> Pei Y. Wei (wei@scam.Berkeley.EDU)
> Experimental Computing Facility,
> University of California @ Berkeley

Thanks for your interest, welcome to the list.

	Tim



__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155






From timbl  Mon Dec  9 17:31:23 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06470; Mon, 9 Dec 91 17:31:23 GMT+0100
Date: Mon, 9 Dec 91 17:31:23 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9112091631.AA06470@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Daniel J. Oberst <oberst@hitam.princeton.edu>
Subject: Re: WWW implementation of PNN
Cc: www-talk, Howard@pucc.princeton.edu, fuchs@tsar.princeton.edu


> From: Daniel J. Oberst <oberst@hitam.princeton.edu>
> Cc: Howard@pucc.princeton.edu, fuchs@tsar.princeton.edu
> 

> Tim,
> 

> Our group has been working with WWW and looking at implementing our
> locally-developed PNN (Princeton News Network) Campus Wide
> Information Service.  Howard Strauss, who designed PNN, has written
> an EXEC that takes the PNN "menu" files and converts them into WWW
> html documents.  We're working on setting up the files and finding a
> place for them so that they can be accessed from WWW.  One nice
> by-product is that we will now have a line-mode access to the
> information in PNN, something that had been requested before (PNN
> only works in 3270 mode on our 3090, VT100 or curses mode on our unix
> implementation, and with Hypercard on a Mac).
- - - - -
> Parenthetically, I thought I had seen something about a 3270
> implementation of WWW, but can't seem to locate it (*even* with
> WWW!). Is there a 3270 implementation different from the line-mode
> version?

   No, it's just the line mode version, with a few clear screen
   calls. As the v1.0 line mode browser can scroll backward, we
   should be able to link the PF keys to the browser's commands
   while in WWW. There's no "move-cursor-hit-enter" functionality.
 - - - - - -
> I assume that the best way to make these available is via HTTP?
> The PNN files here are NFS exported to all machines on campus, so our
> first implementation just used local file hypertext pointers.  I
> assume to make these available, we'd need to use the http tags on the
> file references. Is this what you have done at CERN with your WWW?
> We'll try and get an http server up and running on a test machine and
> try it out.

    Great.  If you have a set of html files, an HTTP server (The
    distributed daemon) gives faster access than using anonymous FTP.
    The documentation about setting up the daemon should be complete
    on the web -- if there are holes in it just let me know.
    We use the daemon (httpd) to publish our own documentation on W3,
    but the documentation coming from VM is actually transcribed
    into html on the fly by a custom server. [The regular daemon
    runs in a service machine, calls an EXEC which leaves the
    html on the stack for the daemon to read off. I  could send you
    C code if you like]. This has the advantage that it takes the
    data right from the source -- it's only stored in one place --
    but the disadvantage for me that a server has to be maintained on
    the strange IBM machine ;-).

> We'll keep you posted on our progress.

   Thanks.

> One side benefit could be that other sites running the PNN code
> could use the same tools to make their info WWW-able!!

    How many other sites are running PNN? It would be great to see
    some more CWISs coming onto the web.


> FYI - I've included below a sample file and conversion that was
> created.

    Looks good. [You could chop the leading spaces on the heading, as
    w3 will center it anyway. Also, the line
    "Move the cursor to any topic below and press Enter"
    is a little misleading for www users - you might want to
    "hand-craft" just the top level menu page to remove that,
    and possibly put links to other CWISs, etc.]


> Daniel J. Oberst
> Director, Advanced Technology and Applications
> Computing and Information Technology
> Princeton University
> 116 Prospect Avenue
> Princeton, NJ 08544-2089

    Tim Berners-Lee                       timbl@info.cern.ch
    World Wide Web project                (NeXTMail is ok)	
    CERN                                  Tel: +41(22)767 3755
    1211 Geneva 23, Switzerland           Fax: +41(22)767 7155





__________________________________________________________________
- - - - -
FILE: pnn.mainmenu

@T@          PNN - The Princeton News Network
@M@            Click ONCE on any topic below
@M@
@T@   Move the cursor to any topic below and press Enter
@T@
help dialog*@D@HELP
@A@
index node@N@Index to Information in PNN
new node@N@What's New on PNN
@A@
about inter@I@About PNN
events inter@I@Calendars and Events
org inter@I@Campus Organizations
gphone dialog*@E@Campus Phone Book@GETPHONE
service inter@I@Campus Services & Facilities
computr inter@I@Computing Resources
course inter@I@Curriculum & Course Information
@*@                                Column 2 starts here
@C@
@E@Dial-a-Fortune@FORTUNE
facstaff inter@I@Faculty & Staff Activities
library inter@I@Library Information
misc inter@I@Potpourri
police inter@I@Safety Information
student inter@I@Student & Grad Student Activities
transpor inter@I@Travel & Visitor Information
uemploy inter@I@University Employment Information
policy inter@I@University Policies & Procedures
whoswho inter@I@University Who's Who
weather node@N@Weather Forecast
----------------------------
SCREEN: PNN main menu:

           PNN - The Princeton News Network
    Move the cursor to any topic below and press Enter

HELP                                   Dial-a-Fortune
                                       Faculty & Staff Activities
Index to Information in PNN            Library Information
What's New on PNN                      Potpourri
                                       Safety Information
About PNN                              Student&GradStudent Activities
Calendars and Events                   Travel & Visitor Information
Campus Organizations                   University Employment Info
Campus Phone Book                      University Policies&Procedures
Campus Services & Facilities           University Who's Who
Computing Resources                    Weather Forecast
Curriculum & Course Information        Exit PNN

----------------------------
FILE: pnn.html

<H1>          PNN - The Princeton News Network</H1>
<H2>   Move the cursor to any topic below and press Enter</H2>
<H3></H3>
<UL>
<LI><A HREF=help.dialogww>HELP</A>
<LI><A HREF=index.node>Index to Information in PNN</A>
<LI><A HREF=new.node>What's New on PNN</A>
<LI><A HREF=about.html>About PNN</A>
<LI><A HREF=events.html>Calendars and Events</A>
<LI><A HREF=org.html>Campus Organizations</A>
<LI><A HREF=service.html>Campus Services & Facilities</A>
<LI><A HREF=computr.html>Computing Resources</A>
<LI><A HREF=course.html>Curriculum & Course Information</A>
<LI><A HREF=facstaff.html>Faculty & Staff Activities</A>
<LI><A HREF=library.html>Library Information</A>
<LI><A HREF=misc.html>Potpourri</A>
<LI><A HREF=police.html>Safety Information</A>
<LI><A HREF=student.html>Student & Grad Student Activities</A>
<LI><A HREF=transpor.html>Travel & Visitor Information</A>
<LI><A HREF=uemploy.html>University Employment Information</A>
<LI><A HREF=policy.html>University Policies & Procedures</A>
<LI><A HREF=whoswho.html>University Who's Who</A>
<LI><A HREF=weather.node>Weather Forecast</A>
</UL>
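
A rough sketch of the kind of conversion shown above. This is not Howard Strauss's EXEC: the rules are inferred only from the sample (an @N@ entry links to name.node, an @I@ entry to name.html), every other line type is skipped, and menu_to_html is a hypothetical name. Unlike the sample output, it escapes "&" in titles.

```python
from html import escape

# Inferred from the sample only: entry code -> suffix of the link target.
SUFFIX = {"N": ".node", "I": ".html"}


def menu_to_html(menu_text):
    """Convert '<name> <type>@<code>@<title>' menu lines into an HTML list."""
    items = []
    for line in menu_text.splitlines():
        parts = line.split("@")
        # Skip headings, spacers, dialogs and anything else not understood.
        if len(parts) != 3 or parts[1] not in SUFFIX or not parts[0].strip():
            continue
        name = parts[0].split()[0]
        title = escape(parts[2], quote=False)
        items.append("<LI><A HREF=%s%s>%s</A>" % (name, SUFFIX[parts[1]], title))
    return "\n".join(["<UL>"] + items + ["</UL>"])


MENU = """@T@          PNN - The Princeton News Network
index node@N@Index to Information in PNN
@A@
about inter@I@About PNN
service inter@I@Campus Services & Facilities
@E@Dial-a-Fortune@FORTUNE
"""

print(menu_to_html(MENU))
```

Headings and dialog entries would need their own rules, which the sample alone does not pin down.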



From timbl  Fri Dec 13 17:55:53 1991
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA11835; Fri, 13 Dec 91 17:55:53 GMT+0100
Date: Fri, 13 Dec 91 17:55:53 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9112131655.AA11835@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-interest, www-talk
Subject: WWW to SPIRES on SLACVM - Experimental
Cc: pfkeb@kaon.slac.stanford.edu (Paul Kunz)

There is an experimental W3 server for the SPIRES High energy Physics preprint  
database, thanks to Terry Hung, Paul Kunz and Louise Addis of SLAC.  It's only just  
been put up, so don't expect perfection.   With the w3 line mode browser, follow a  
link to it from our home page, then type for example

	K FIND AUTHOR KUNZ

the "FIND" is necessary at the moment, though it may change later.

	- Tim

Paul Kunz wrote a few days ago:-

   "The SLAC Library maintainer of SPIRES databases, Louise Addis, is absolutely  
delighted.   She will ask for a permanent VM service machine and finish off the  
polishing.   Things are really moving now."

   "By the way, we certainly have the impression that accessing SPIRES from www on  
a UNIX machine is faster than using a terminal logged into SLACVM.   Even a real  
3278 terminal is not as fast.   Actually, accessing CERNVM FIND via www seems  
faster than logging into cernvm and doing the same command as well."


From wei@xcf.berkeley.edu  Fri Dec 13 20:53:09 1991
Return-Path: <wei@xcf.berkeley.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA11940; Fri, 13 Dec 91 20:53:09 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA13707; Fri, 13 Dec 91 20:47:33 +0100
Received: by swindle.berkeley.edu (5.57/Ultrix3.0-C)
	id AA25342; Fri, 13 Dec 91 11:45:05 -0800
Date: Fri, 13 Dec 91 11:45:05 -0800
From: wei@xcf.berkeley.edu (Pei Y. Wei)
Message-Id: <9112131945.AA25342@swindle.berkeley.edu>
To: www-talk@nxoc01.cern.ch
Subject: X Browser

Forgot to CC...

>To: timbl@nxoc01.cern.ch
>Subject: X Browser

Hi, Tim.

Thanks for most helpful information for my research on SGML.

>Now we have just got a new architecture for the browser code, with a
>generic (simple!) SGML parser, and basically all the browser code
>common (networking, name resolution, parsing) between different
>browsers. The new line mode browser is under test - it has NNTP
>access to news built in as well as HTTP and FTP access to indexes and
>files.
>I could make up a tar file of our alpha-test code, including the HTML
>SGML parser.

Yes, I'm very interested in using that code, and do the testing...
Regarding the X browser, I was able to rig up an X11 W3 browser by 
using viola as the front end to www (I only had to make very few and 
minor changes to www.c). It's not very sophisticated at this point
(a one nite hack...), and does not much more than display the output 
of www in a scrollable text field, highlite the reference numbers for
visibility, make references and commands (Back, Help...) clickable 
or keyable, and has a few buttons corresponding to the www commands.

One thing I'd like to do soon, if I have time, is to teach the parser
about viola object descriptions, and basically embed viola objects
(GUIs & programmability) into html files.

Thanks.

-Pei

From timbl  Thu Jan  9 12:34:24 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08666; Thu, 9 Jan 92 12:34:24 GMT+0100
Date: Thu, 9 Jan 92 12:34:24 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9201091134.AA08666@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Mark Alexander Davis-Craig <mad@merit.edu>
Subject: Re: Is there a paper which describes the www protocol?
Cc: www-talk



> From: Mark Alexander Davis-Craig <mad@merit.edu>
> 

> I was looking through the web and found information on servers and
> clients.  I saw mention in the "History" section about wanting to
> develop a good protocol for information exchange, but haven't seen a
> paper specifically about the www protocol.  Is there one?  If not,
> could you describe it in some detail?

You are right that the protocol documentation was not as good as it could
have been. I have improved it. To save you browsing through the web for it,
I append to this message the information as plain text.

> I ask because we at the University of Michigan are evaluating www,
> wais, and gopher for campus-wide information delivery.
>

I have no need to tell you what our suggestion would be!  The W3 architecture
will give you (almost) everything you can get from WAIS and Gopher rolled into one.
The trick is that almost anything is representable by hypertext links and index searches. The  
Gopher menus and plain text, for example, are both special cases of hypertext.  As it is more  
work to do the job for hypertext in general, we do not yet have software to cover as many  
platforms as Gopher, for example. However, when we do, the W3 system will be more flexible.   
Running a W3 server on top of a WAIS or Gopher world in fact makes these worlds subsets of the W3  
web. The reverse is not possible because the WAIS and Gopher information models are not flexible  
enough to encompass the W3 model.

That said, if you want an indexer we can only recommend the wais code (or NeXT code) and we do  
not yet supply (as Gopher does) an off-the-shelf index server for either of those indexes. It  
is easy to do, however, with our generic server code.

Please keep me informed of your thinking, whether you plan to go W3 or Gopher.  If we can help  
you set up a demonstration system, then mail me.
 

>Thanks in advance.
>  -----------------------------------------------------------------
>  Mark Davis-Craig, Merit/MichNet Technical Support Consultant
>  mad@merit.edu        mad@merit.bitnet        (313)-936-2110


Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155

_________________________________ protocol notes follow ___________


                                         The HTTP Protocol As Implemented In W3


                          HTTP AS IMPLEMENTED IN WWW
                                       

   This document defines the Hypertext Transfer protocol  (HTTP) as currently
   implemented by the WorldWideWeb initiative software. This is a subset of the
   proposed full HTTP protocol.  No client profile information is transferred
   with the query. Future HTTP protocols will be back-compatible with this
   protocol.
   

    The protocol  uses the normal internet-style telnet protocol style on a
   TCP-IP link. The following describes how a client acquires a (hypertext)
   document from an HTTP server, given an HTTP document address .
   

Connection

   The client makes a TCP-IP connection to the host using the domain name or IP
   number , and the port number  given in the address.
   

    During development, the default HTTP TCP port number is 2784 -- this will
   change when an official port number is allocated.
   

    The server accepts the connection.
   

    Note: HTTP currently runs over TCP, but could run over any
   connection-oriented service.   The interpretation of the protocol below in
   the case of a sequenced packet service (such as DECnet(TM) or ISO TP4) is
   that the request should be one TPDU, but the response may be many.
   

Request

   The client sends a document request consisting of a line of ASCII characters
   terminated by a CR LF (carriage return, line feed) pair. A well-behaved
   server will not require the carriage return character.
   

    This request consists of the word "GET", a space, and the document address,
   omitting the "http:", host and port parts when they are the coordinates just
   used to make the connection. (If a gateway is being used, then a full
   document address may be given specifying a different naming scheme).
   

    The search functionality of the protocol lies in the ability of the
   addressing syntax to describe a search on a named index .
   

    A search should only be requested by a client when the index document
   itself has been described as an index using the ISINDEX tag.
   

Response

   The response to a simple GET request is a message in hypertext mark-up
   language ( HTML ). This is a byte stream of ASCII characters.
   

    Lines shall be delimited by an optional carriage return followed by a
   mandatory line feed character. The client should not assume that the
   carriage return will be present.  Lines may be of any length. Well-behaved
   servers should restrict line length to 80 characters excluding the CR LF
   pair.
   

    The format of the message is HTML - that is, a trimmed SGML document. Note
   that this format allows for menus and hit lists to be returned as hypertext.
   It also allows for plain ASCII text to be returned following the  PLAINTEXT
   tag .
   

    The message is terminated by  the closing of the connection by the server.
   

    Well-behaved clients will read the entire document as fast as possible. The
   client shall not wait for user action (output paging for example) before
   reading the whole of the document.  The server may impose a timeout of the
   order of 15 seconds on inactivity.
   

    Error responses are supplied in human readable text in HTML syntax. There
   is no way to distinguish an error response from a satisfactory response
   except for the content of the text.
   

Disconnection

   The TCP-IP connection is broken by the server when the whole document has
   been transferred.
   

    The client may abort the transfer by breaking the connection before this,
   in which case the server will not record any error condition.
   

    Requests are idempotent .  The server need not store any information about
   the request after disconnection.
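
   The exchange described above can be sketched in a few lines of Python.
   This is only an illustration under stated assumptions, not the W3 library
   code: the function names build_request, split_lines and fetch are
   hypothetical, the 30 second client timeout is an arbitrary choice, and
   2784 is the development port quoted in these notes.

```python
import socket

DEFAULT_PORT = 2784  # development default from these notes; expected to change


def build_request(path):
    """Return the one-line request: the word GET, a space, the address, CR LF."""
    return ("GET " + path + "\r\n").encode("ascii")


def split_lines(body):
    """Split a response on LF, dropping the optional CR before each LF."""
    text = body.decode("ascii", errors="replace")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final LF terminates the last line, it does not start one
    return [line[:-1] if line.endswith("\r") else line for line in lines]


def fetch(host, path, port=DEFAULT_PORT, timeout=30.0):
    """Send one request and read, without pausing, until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the end of the document is the disconnection
                break
            chunks.append(chunk)
    return split_lines(b"".join(chunks))
```

   Since an error response is itself HTML text, a caller of fetch cannot
   tell success from failure except by looking at the returned lines.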
   

    _________________________________________________________________
   

                                                                         Tim BL
                                                                               

                               W3 NAMING SCHEMES
                                       

   (See also: a discussion of design issues involved , BNF syntax , W3
   background)
   

    The format of a hypertext name consists of the name of the naming
   sub-scheme to be used, then a name in a format particular to that subscheme,
   then an optional anchor identifier within the document. For example, the
   format for all internet-based access methods is:
   

      scheme : // host.domain:port / path / path  # anchor
   

    A suffix # anchor id allows one to refer to a particular anchor within a
   document.
   

    A suffix ? followed by words separated by + signs  allows one to search an
   index (see details ).
   

    References from one document to another with a similar name may be
   abbreviated to a relative name . This imposes certain restrictions on the
   way that the "path" is represented.
   

    A special format is used to represent a search on an index . See also: the
   full BNF description , about escaping illegal characters .
   

Examples


         file://cernvax.cern.ch/usr/lib/WWW/defaut.html#123

   This is a fully qualified file name, referring to a document in the file
   name space of the given internet node, and an imaginary anchor 123 within
   it.
   


         #greg

   This refers to anchor "greg" in the same document as that in which the name
   appears.
   

Naming sub-schemes

   Different schemes usually use different protocols on the network. The format
   of the address after the scheme name is a function of the particular scheme.
   In practice, all internet-based schemes have a common format for the node
   name and port.   Schemes currently defined are as follows, with links to
   more details.
   

  file                    Access is provided to files, using whatever means the
                         browser and/or gateways have to reach files on obscure
                         machines.
                         

  news                    Access is provided to news articles, and newsgroups,
                         normally using the NNTP protocol.
                         

  http                    Access is provided to any other information using the
                         HTTP search and retrieve protocol . The internal
                         addressing of the information system is mapped onto a
                         W3 path.
                         

  telnet                  Access is provided by an interactive telnet session.
                         This is provided ONLY as an interface to other
                         existing online systems which cannot or have not been
                         mapped onto the W3 space.
                         

  gopher                  Access is provided using the "gopher" protocol. The
                         gopher protocol is similar to HTTP but uses separate
                         concepts of menus and text files rather than
                         hypertext.
                         

   Other schemes we foresee are wais and x500.  Systems (such as WAIS) which
   are not currently accessed directly by W3 servers may be accessed through
   gateways, in which case the document address is encoded within the http
   address of the document in the gateway.  Browsers which do not have the
   ability to use certain protocols may (in principle) be configured to
   automatically use certain gateways for certain addressing schemes.
   

    This will allow, for example, simple PC-based clients to follow links
   through X500 name servers.
   

                                RELATIVE NAMING
                                       

   The address of a hypertext document is normally given within the context of
   another hypertext document. Where the addresses of the two documents are
   similar, this allows only the difference between the two names to be given,
   saving space. An example is the address of the destination of a hypertext
   link , which is specified relative to the source document address.
   

    (A further practical advantage is that a group of documents may be
   transmitted without internal changes, or accessed using more than one
   address.)
   

    In the WWW address format , the rules for relative naming are:
   

       If the "scheme" parts  are different, the whole absolute address must be
          given. Otherwise, the scheme is omitted, and:
          

       If the "host" and/or "port" parts are different, the host name and
          all the rest of the address must be given. The host name may be given
          using internet hostname conventions, ie domains may be omitted where
          different. This is not very well defined:  one tends to assume that
          if any dot is present, then the full domain name is being given, up
          to the root (.) domain, while if there are no dots, the domain is the
          same as that of the hostname part of the base address.
          

       If the access and host parts are the same, then the path may be given
          with the unix convention, including the use of ".." to indicate
          deletion of a path element. Within the path:
          

       If a leading slash is present, the path is absolute. Otherwise:
          

       The last part of the path of the base address (e.g. the filename of the
          current document) is removed, and the given relative address appended
          in its place.
          

       Within the result, all occurrences of "xxx/.." are recursively removed,
          where xxx is one path element (directory).
          

   The use of the slash "/" and double dot ".." in this case must be respected
   by all servers. If necessary, this may mean converting their local
   representations in order that these characters should not appear within path
   elements (see "escaping").
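
   The rules above can be sketched as follows. This is a minimal
   illustration, not the browser code: the names collapse and resolve are
   hypothetical, a relative address beginning with "//" is assumed to be the
   way a different host is written, and anchors, searches and the loosely
   defined domain completion are deliberately not handled.

```python
import re


def collapse(path):
    """Remove every "xxx/.." pair, where xxx is one path element."""
    kept = []
    elements = path.split("/")
    for i, element in enumerate(elements):
        if element == ".." and kept and kept[-1] not in ("", ".."):
            kept.pop()
            if i == len(elements) - 1:
                kept.append("")  # "xxx/.." at the very end leaves the directory
        else:
            kept.append(element)
    return "/".join(kept)


def resolve(base, relative):
    """Resolve a relative address against an absolute scheme://host/path base."""
    scheme, rest = base.split("://", 1)
    hostport, _, base_path = rest.partition("/")
    base_path = "/" + base_path
    if re.match(r"[A-Za-z]+:", relative):  # scheme present: already absolute
        return relative
    if relative.startswith("//"):          # host and the rest are given
        return scheme + ":" + relative
    if relative.startswith("/"):           # leading slash: absolute path
        path = relative
    else:                                  # replace the last part of the base path
        path = base_path.rsplit("/", 1)[0] + "/" + relative
    return scheme + "://" + hostport + collapse(path)
```

   For example, resolve("file://cernvax.cern.ch/usr/lib/WWW/defaut.html",
   "fred.html") gives file://cernvax.cern.ch/usr/lib/WWW/fred.html.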
   

                          ADDRESS FOR AN INDEX SEARCH
                                       

   If a given hypertext node is an index, or the server has an index associated
   with it, then a search may be done on that index by suffixing the name of
   the index with a list of keywords, after a question mark:
   


        address_of_index ? keywordlist

   The address of the index is a normal hypertext address. In the keywordlist,
   multiple keywords are separated by plus signs (+) .  (See BNF syntax
   description .)  The resulting string still does not contain any spaces. It
   may be considered to be the hypertext address of a document which is the
   result of making the keyword search on the index. Normally, if the search
   was successful, the document returned will contain anchors leading to other
   documents which match the selection criteria.
   

    The search method, and the logical and lexical functions, weights, etc
   applied to the keywords will depend on the index address.  One actual index
   may have several hypertext addresses,  which when searched on will behave in
   different ways. For example, one may allow a search on author-given keywords
   only, while another may be a full text search.  These things particular to
   an index should be described in the hypertext page for the index node itself
   (or in linked documents). For example, a server may allow specific boolean
   search combinations to be represented by the words "and", "or" and "not".
   

Example:


                http://cernvm/FIND/?sgml+cms

   indicates the result of performing a search for keywords "sgml" and "cms" on
   the index http://cernvm/FIND/.
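
   Forming and taking apart such a search address might look like the
   sketch below. The function names search_address and split_search are
   hypothetical, and rejecting keywords that contain spaces, "+", "?" or "#"
   is an assumption made here to keep the result free of ambiguity.

```python
def search_address(index_address, keywords):
    """Suffix an index address with "?" and the keywords joined by "+"."""
    if not keywords:
        raise ValueError("at least one keyword is needed")
    for word in keywords:
        if not word or any(ch.isspace() or ch in "+?#" for ch in word):
            raise ValueError("keyword cannot be used as is: %r" % (word,))
    return index_address + "?" + "+".join(keywords)


def split_search(address):
    """Return (index address, list of keywords) for a search address."""
    index, question_mark, words = address.partition("?")
    return index, (words.split("+") if question_mark else [])
```

   The result of search_address contains no spaces, so it can be used
   wherever an ordinary document address is expected.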
   

                                HTTP ADDRESSING
                                       

   With an access code of http:,  a protocol introduced for  the WWW initiative
   is used to acquire data from a server. This is the "Hypertext Transfer
   protocol", HTTP , a simple search and retrieve (S and R) protocol.
   

    The syntax of an http address is, with [] indicating optional parts (see
   BNF description ),
   


        http : // hostname [ : port ] / path [ ? searchwords ]

   for example, the following are valid addresses:
   


        http://info.cern.ch/hypertext/WWW/TheProject.html

        http://crnvmc.cern.ch/FIND?sgml+examples

   HTTP addresses conform to the WWW conventions,  including the possibility of
   using the search format . The significance of the items in the path part of
   the document name is completely up to the server. Different paths may be
   used to select different databases, different views of the same database,
   etc.
   

  hostname                This is the name of the server in internet form. A
                         numeric form (e.g. 128.141.201.74) may be used, but the
                         domain name form (e.g. info.cern.ch) is preferred. The
                         hostname is mandatory.
                         

  port                    This is a numeric port number. If a non-numeric
                         string is used, it must be a defined service name.
                         Note that as there is no central repository for
                         service names (they are defined locally for each
                         host), a service name is NOT an appropriate way to
                         specify a port number for a hypertext address. If the
                         port number is omitted the preceding colon must also
                         be omitted. In this case, port number 2784 is assumed
                         [This may change!].
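
   As an illustration of the syntax above, an http address can be split
   into its parts as sketched here. The name parse_http_address and the
   regular expression are assumptions of this sketch; only numeric ports are
   accepted, in line with the advice against service names, and 2784 is
   filled in when the port is omitted.

```python
import re

DEFAULT_HTTP_PORT = 2784  # development default from these notes; may change

HTTP_ADDRESS = re.compile(
    r"http://(?P<host>[^/:?#]+)"   # hostname or numeric form, mandatory
    r"(?::(?P<port>\d+))?"         # colon only together with a port number
    r"(?P<path>/[^?#]*)?"          # path, meaning is up to the server
    r"(?:\?(?P<search>[^#]*))?$"   # optional search words
)


def parse_http_address(address):
    """Return (host, port, path, search words) for an http address."""
    match = HTTP_ADDRESS.match(address)
    if match is None:
        raise ValueError("not a valid http address: %r" % (address,))
    port = int(match.group("port")) if match.group("port") else DEFAULT_HTTP_PORT
    words = match.group("search").split("+") if match.group("search") else []
    return match.group("host"), port, match.group("path") or "", words
```

   A colon with no port after it, as in http://info.cern.ch:/, is rejected
   by this sketch because the colon must be omitted along with the port.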
                         

  See also: WWW addressing in general , HTTP protocol .
                         

   _________________________________________________________________
   

                                                                         Tim BL
                                                                               

                             W3 ADDRESSES OF FILES
                                       

   The format of a hypertext reference to a file is an extension of the unix
   naming system. The full explicit format is:
   

       file :  //  node /  directories /  name
   

    The actual protocols used by the client depend on the implementation of the
   browser and the environment. Typically, the browser will check to see
   whether the node is the local node,  or a node for which files are available
   mounted in some form of distributed file system.  If neither of these is
   the case, then the browser may try rpc, anonymous FTP or other protocols.
   

Examples


         file://cernvax.cern.ch/usr/lib/WWW/defaut.html

   This is a fully qualified file name.
   


         fred.html

   This relative name, used within a file, will refer to a file on the same
   node and in the same directory as that file, but with the name fred.html.
   

Improvements : Directory access

   The final file name should be optional. If the address ends with a '/', the
   browser should retrieve the contents of the specified directory and generate
   a page of virtual hypertext pointing to its contents. In addition, it could
   display an information file contained in that directory, if any is present.
   Suggested file names to search for in order : README.html, *README*.html,
   README, *README*, *readme*.
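
   The suggested search order could be applied to a directory listing as
   sketched below. The name pick_info_file is hypothetical, and taking the
   alphabetically first name when several match one pattern is an assumption
   of this sketch; the proposal above does not say how to break such ties.

```python
from fnmatch import fnmatchcase

# Patterns in the order suggested above, most specific first.
INFO_FILE_PATTERNS = ["README.html", "*README*.html", "README", "*README*", "*readme*"]


def pick_info_file(names):
    """Return the information file to display for a directory, or None."""
    for pattern in INFO_FILE_PATTERNS:
        matches = sorted(name for name in names if fnmatchcase(name, pattern))
        if matches:
            return matches[0]
    return None
```

   Matching is case-sensitive here, so "readme.txt" is found only by the
   last pattern.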
   

   

   

                        HYPERTEXT ADDRESS FOR NET NEWS
                                       

   The format of a hypertext reference to information in the internet/usenet
   news system can take any of the following forms:
   

  news: newsgroup         This refers to a list of articles currently available
                         in the given newsgroup. The newsgroup is a series of
                         alphanumeric characters and dots.
                         

  news:*                  This refers to a list of valid newsgroups.
                         

  news: message_id        This refers to a given article explicitly. The
                         message_id is optionally surrounded by angle brackets,
                         and must contain an @ sign.
                         

  

                         

   Possible extensions to this are more generous wildcarding for the list of
   newsgroups. It takes too long to load the whole list, and it would be more
   useful to be able to browse through a set of newsgroups.
   

    There is no way of referring to "unread" articles. Keeping track of this is
   the job of the browser.
   

Examples


         news:<12345678@cernvax.cern.ch>

         news:12345678@cernvax.cern.ch

   These addresses both refer to the same (imaginary!) article by its unique
   message-id.
   



         news:comp.sys.next.announce

   This refers to a list of articles in the newsgroup comp.sys.next.announce.
   The list is, of course, a list of references to articles by message-id.
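
   The three forms can be told apart with a few string tests, as in this
   sketch. The name classify_news_address and the labels it returns are
   hypothetical; the "@" sign is used to recognise a message-id because the
   description above requires one.

```python
def classify_news_address(address):
    """Return ("groups", None), ("group", name) or ("article", message_id)."""
    prefix = "news:"
    if not address.startswith(prefix):
        raise ValueError("not a news address: %r" % (address,))
    rest = address[len(prefix):]
    if rest == "*":
        return ("groups", None)       # the list of valid newsgroups
    if "@" in rest:                   # a message-id must contain an @ sign
        if rest.startswith("<") and rest.endswith(">"):
            rest = rest[1:-1]         # the angle brackets are optional
        return ("article", rest)
    return ("group", rest)            # the articles currently in one group
```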
   

                               TELNET ADDRESSING
                                       

   A telnet address is a special case of a W3 address.
   

    When a telnet address is used, information can only be retrieved using an
   interactive telnet session. This has the disadvantage that information
   cannot be indexed, searched, etc automatically, nor can it be gatewayed into
   other systems.  The telnet addressing form is used to allow a pointer to
   information systems such as library information systems which have not been
   gatewayed into the web properly yet.
   

    The syntax is, with [] indicating optional parts (see full BNF)
   


        telnet : / /  [ user @ ] host  [ : port ]

   There should be no spaces. For example, the following are valid telnet
   addresses:
   


        telnet://www@info.cern.ch:23

        telnet://www@info.cern.ch

        telnet://info.cern.ch

  user                   is the optional name of the user to be used for login.
                         If the username  is omitted, then so must be the "@"
                         sign. This is equivalent to the argument used with the
                         -l option on the ucb telnet command. When the username
                         is omitted, some access servers will prompt for a
                         username and password.
                         

  host                   This is the name of the server in internet form. A
                         numeric form (e.g. 128.141.201.74) may be used, but the
                         domain name form (e.g.  info.cern.ch) is preferred.
                         The host is mandatory.
                         

  port                   This is a numeric port number. If a non-numeric string
                         is used, it must be a defined service name. Note that
                         as there is no central repository for service names
                         (they are defined locally for each host),  a service
                         name is NOT an appropriate way to specify a port
                         number for a hypertext address. If the port number is
                         omitted the preceding colon must also be omitted. In
                         this case, port number 23 is assumed.
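
   A telnet address can be taken apart along the same lines. In this sketch
   the name parse_telnet_address and the regular expression are assumptions;
   the user is returned as None when omitted, and an "@" with no username
   before it is rejected, as the description above requires.

```python
import re

TELNET_ADDRESS = re.compile(
    r"telnet://(?:(?P<user>[^@/:]+)@)?"  # optional user, always with its "@"
    r"(?P<host>[^@/:]+)"                 # host, mandatory
    r"(?::(?P<port>\d+))?$"              # optional numeric port
)


def parse_telnet_address(address):
    """Return (user or None, host, port) for a telnet address."""
    match = TELNET_ADDRESS.match(address)
    if match is None:
        raise ValueError("not a valid telnet address: %r" % (address,))
    return match.group("user"), match.group("host"), int(match.group("port") or 23)
```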
                         

   _________________________________________________________________
   

                                                                         Tim BL
                                                                               

                               GOPHER ADDRESSING
                                       

   Gopher addresses indicate that the gopher protocol should be used to access
   the information.  The Gopher protocol is a simple internet protocol similar
   to HTTP . It allows the transfer of menus or plain text files.  (HTTP
   expresses both menus and plain text files as special cases of hypertext
   files). See the gopher protocol notes .
   

    The syntax is, with [] indicating optional parts (see BNF )
   


        gopher:// hostname [: port ] [/gtype/ [selector] ] [ ? search ]

   There should be no spaces. For example, the following are valid addresses:
   


        gopher://gopher.micro.umn.edu:70

        gopher://gopher.micro.umn.edu:70/1/

        gopher://gopher.micro.umn.edu:70

   The W3 address for a gopher item may be derived from the fields of a gopher
   menu line which has the format
   

  host                    This is the name of the server in internet form. A
                         numeric form (e.g. 128.141.201.74) may be used, but the
                         domain name form (e.g. info.cern.ch) is preferred. The
                         hostname is mandatory.
                         

  port                    This is a numeric port number. If a non-numeric
                         string is used, it must be a defined service name.
                         Note that as there is no central repository for
                         service names (they are defined locally for each
                         host), a service name is NOT an appropriate way to
                         specify a port number for a hypertext address. If the
                         port number is omitted the preceding colon must also
                         be omitted. In this case, port number 70 is assumed.
                         

  gtype                   This is a gopher item type number, a (hopefully
                         printable!) ASCII character.  Currently these types
                         are all ASCII decimal digit characters. Character "0"
                         (hex 30)  signifies a plain text file. Character "1"
                         signifies a Menu.  Character "7" signifies a
                         searchable index.  Character "8" should not be used in
                         a W3 address: use telnet addressing instead.  In
                         general W3 terms, the type is the first part of the
                         path. The rest of the path is the gopher selector
                         string. The type field is a hint to the client as to
                         how to represent the anchor, and how to follow it.
                         

  selector                This is the string to be sent to the gopher server to
                         identify the information required.
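
   The way the type and selector sit in the path can be illustrated as
   follows. This is a sketch only: the name parse_gopher_address is
   hypothetical, the type is returned as None when the address gives none
   (these notes do not state a default), and type "8" is refused because
   telnet addressing is to be used instead.

```python
def parse_gopher_address(address):
    """Return (host, port, gtype or None, selector, search words)."""
    prefix = "gopher://"
    if not address.startswith(prefix):
        raise ValueError("not a gopher address: %r" % (address,))
    rest, _, search = address[len(prefix):].partition("?")
    hostport, _, path = rest.partition("/")
    host, _, port = hostport.partition(":")
    if not host:
        raise ValueError("the hostname is mandatory")
    # The type is the first part of the path; the rest is the selector.
    gtype, _, selector = path.partition("/")
    if gtype == "8":
        raise ValueError("use telnet addressing for gopher type 8")
    words = search.split("+") if search else []
    return host, int(port) if port else 70, gtype or None, selector, words
```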
                         

   _________________________________________________________________
   

                                                                         Tim BL
                                                                               

                          ESCAPING ILLEGAL CHARACTERS
                                       

   The W3 address syntax allows a path to contain most printable ASCII
   characters, but some, inevitably used for punctuation, are excluded. W3
   addresses are sometimes used to represent addresses in some other space.
   This happens when an HTTP server, for example, uses file names as its
   document names, or when addresses from some other protocol (Gopher, WAIS,
   etc) are mapped into the W3 web.
   

    In these cases, a convention is normally used to map illegal characters in
   these "foreign" names onto the allowed set.
   

    In the case of an HTTP server,  any mapping may be used.
   

    A suitable convention is that a percent sign (%) followed by two
   hexadecimal digits (0-9 or a-f)  stands for the single character with ASCII
   hexadecimal code represented by those two digits (Most significant digit
   first).
   

    A percent sign itself must therefore be represented by %25, as 25 hex is
   the ASCII code for "%".
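
   The percent convention might be implemented as in this sketch. The names
   escape and unescape are hypothetical, and the SAFE set (letters, digits,
   "." and "_") is only an illustrative choice of allowed characters; a real
   server would pick its own. Upper-case hexadecimal digits are accepted on
   decoding as a convenience, although the convention above names only a-f.

```python
import string

# Illustrative assumption: characters passed through unchanged.
SAFE = set(string.ascii_letters + string.digits + "._")


def escape(name):
    """Map a foreign name onto the allowed set using %xx escapes."""
    out = []
    for ch in name:
        if ord(ch) > 127:
            raise ValueError("only ASCII is covered by this convention")
        # "%" is not in SAFE, so it always becomes %25.
        out.append(ch if ch in SAFE else "%" + format(ord(ch), "02x"))
    return "".join(out)


def unescape(text):
    """Reverse escape(): replace each %xx by the character with that code."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "%":
            digits = text[i + 1:i + 3]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("bad escape at position %d" % i)
            out.append(chr(int(digits, 16)))  # most significant digit first
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```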
   

    _________________________________________________________________
   

                                                                         Tim BL
                                                                               

                            W3 ADDRESS SYNTAX: BNF
                                       

   This is a BNF-like description of the W3 addressing syntax . We use a
   vertical line "|" to indicate alternatives, and [brackets] to indicate
   optional parts.   Spaces are representational only: no spaces are actually
   allowed within a W3 address. Single letters stand for single letters. All
   words of more than one letter below are entities described elsewhere in the
   syntax description.  (Entity names are here linked to their definitions,
   probably making this unreadable with the line mode browser.)
   

    An absolute address specified in a link is an anchoraddress . The address
   which is passed to a server is a docaddress .
   

  anchoraddress           docaddress [ # anchor ]
                         

  docaddress              httpaddress | fileaddress | newsaddress |
                         telnetaddress | gopheraddress
                         

  httpaddress             h t t p :   / / hostport  [  / path ] [ ? search ]
                         

  fileaddress             f i l e : / / host / path
                         

  newsaddress            n e w s : groupart
                         

  groupart               * | group | article
                         

  group                  ialpha [ . group ]
                         

  article                xalphas @ host
                         

  telnetaddress           t e l n e t : / / [ user @ ] hostport
                         

  gopheraddress           g o p h e r : / / hostport  [/ gtype  [ / selector ]
                         ] [ ? search ]
                         

  hostport                host [ : port ]
                         

  host                    hostname | hostnumber
                         

  hostname                ialpha [  .  hostname ]
                         

  hostnumber              digits . digits . digits . digits
                         

  port                    digits
                         

  selector                path
                         

  path                    void |  xalphas  [  / path ]
                         

  search                  xalphas [ + search ]
                         

  user                    xalphas
                         

  anchor                  xalphas
                         

  gtype                   xalpha
                         

  xalpha                  alpha | $ | _ | @ | ! | % | ^ | & | * | ( | ) | . |
                         digit
                         

  xalphas                 xalpha [ xalphas ]
                         

  ialpha                 alpha [ xalphas ]
                         

  alpha                   a | b | c | d | e | f | g | h | i | j | k | l | m | n
                         | o | p | q | r | s | t | u | v | w | x | y | z | A |
                         B | C | D | E | F | G | H | I | J | K | L | M | N  | O
                         | P | Q | R | S | T | U | V | W | X | Y | Z
                         

  digit                   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
                         

  digits                  digit [ digits ]
                         

  alphanum                alpha | digit
                         

  alphanums               alphanum [ alphanums ]
                         

  void
                         

  See also: General description of this syntax, Escaping conventions.
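
  As an informal illustration of the grammar above, here is a minimal Python
  sketch that checks an http anchoraddress and splits it into its parts. It
  is not taken from any W3 code: the function name and the example addresses
  are made up for the illustration, only the httpaddress alternative of
  docaddress is covered, and "&" is assumed to be the character missing from
  the empty alternative in the xalpha rule.

```python
import re

# xalpha as listed in the grammar. ASSUMPTION: "&" fills the empty
# alternative between "^" and "*" in the xalpha rule.
XALPHA = r"[A-Za-z0-9$_@!%^&*().]"
XALPHAS = XALPHA + "+"

# hostname is ialpha [ . hostname ]; since "." is itself an xalpha, a leading
# letter followed by xalphas covers the recursion in this sketch.
HOSTNAME = r"[A-Za-z]" + XALPHA + "*"
HOSTNUMBER = r"\d+\.\d+\.\d+\.\d+"
HOST = "(?:" + HOSTNUMBER + "|" + HOSTNAME + ")"

# path is void | xalphas [ / path ]
PATH = "(?:" + XALPHAS + "(?:/" + XALPHAS + ")*/?)?"

# search is xalphas [ + search ]
SEARCH = XALPHAS + r"(?:\+" + XALPHAS + ")*"

ANCHORADDRESS = re.compile(
    "http://(?P<host>" + HOST + ")"
    r"(?::(?P<port>\d+))?"
    "(?:/(?P<path>" + PATH + "))?"
    r"(?:\?(?P<search>" + SEARCH + "))?"
    "(?:#(?P<anchor>" + XALPHAS + "))?"
)


def parse_http_anchoraddress(address):
    """Return the parts of an http anchoraddress as a dict, or None."""
    match = ANCHORADDRESS.fullmatch(address)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["search"] is not None:
        # Keywords in a search are separated by plus signs.
        parts["search"] = parts["search"].split("+")
    return parts
```

  For example, parse_http_anchoraddress("http://info.cern.ch:80/index?phone+book#top")
  yields host "info.cern.ch", port "80", path "index", search keywords
  ["phone", "book"] and anchor "top", while an address containing a space is
  rejected.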
                         

   _________________________________________________________________
   

                                                                         Tim BL
                                                                               

                                     HTML
                                       

   The WWW system uses marked-up text to represent a hypertext document for
   transmission over the network. The hypertext mark-up language is an SGML
   format. This defines the basic syntax used. The particular language, the set
   of tags and the rules about their use, and their significance is not part of
   the SGML standard. There being no standard on this, we have adopted a set
   which seems sensible. We call them HTML -- hypertext markup language. HTML
   is not an alternative to SGML, it is a particular format within the SGML
   rules (an SGML "DTD"). HTML parsers should ignore tags which they do not
   understand, and ignore attributes which they do not understand of tags which
   they do understand.
   

    See also:
   

  The tags                A list of the tags used in HTML with their
                         significance.
                         

  Example                 A file containing a variety of tags used for test
                         purposes.
                         

Default text

   Unless otherwise defined by tags, text is transmitted as a stream of lines.
   The division of the stream of characters into lines is arbitrary, and only
   made in order to allow the text to be passed through systems which can only
   handle text with a limited line length. The recommended line length for
   transmission is 80 characters. The division into lines has no significance
   (except in the case of example sections and PLAINTEXT) apart from
   indicating a word end. Line breaks between tags have no significance.
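
   A rough Python sketch of what this means for a reader of default text:
   since a line break only marks a word end, the lines can be joined and
   re-wrapped to any width without changing the content. The function name is
   invented for the illustration, and the sketch does not handle example
   sections or PLAINTEXT, where line breaks are significant.

```python
import textwrap


def reflow(default_text, width=80):
    """Treat every line break as a plain word separator and re-wrap."""
    words = default_text.split()          # line breaks count as word ends
    return textwrap.fill(" ".join(words), width=width)
```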
   

                                   HTML TAGS
                                       

   This is a list of tags used in the HTML language.  Each tag starts with a
   tag opener (a less than sign) and ends with a tag closer (a greater than
   sign).  Many tags have corresponding closing tags, which are identical
   except for a slash after the tag opener (for example, the TITLE tag).
   

    Some tags take parameters, called attributes. The attributes are given
   after the tag, separated by spaces. Certain attributes have an effect simply
   by their presence, others are followed by an equals sign and a value. (See
   the Anchor tag, for example). The names of tags and attributes are not case
   sensitive: they may be in lower, upper, or mixed case with exactly the same
   meaning.  (In this document they are generally represented in upper case.)
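
   The two rules stated so far (tag and attribute names are not case
   sensitive, and unknown tags or attributes are ignored) can be sketched in
   Python with the standard html.parser module, which already lower-cases tag
   and attribute names. This is only an illustration, not how any W3 parser
   is written: the table of known tags is a small made-up subset, and BLINK
   and COLOR in the usage example are invented names standing in for "a tag
   and an attribute the parser does not understand".

```python
from html.parser import HTMLParser

# Hypothetical subset of known tags, each with the attributes it understands.
KNOWN_TAGS = {
    "title": set(),
    "a": {"name", "href", "type"},
    "p": set(),
    "h1": set(),
}


class TolerantParser(HTMLParser):
    """Record known tags only, keeping only the attributes they understand."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lower-cased.
        if tag not in KNOWN_TAGS:
            return                        # unknown tag: ignore it
        kept = {name: value for name, value in attrs if name in KNOWN_TAGS[tag]}
        self.seen.append((tag, kept))     # unknown attributes are dropped


parser = TolerantParser()
parser.feed("<TITLE>Test</TITLE><BLINK>x</BLINK><A HREF=doc.html COLOR=red>link</A>")
parser.close()
```

   After this runs, parser.seen holds the TITLE tag and the A tag with its
   HREF attribute only; the unknown tag and the unknown attribute have been
   skipped.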
   

    Currently HTML documents are transmitted without the normal SGML framing
   tags, but if these are included parsers will ignore them.
   

Title

   The title of a document is given between title tags:
   


        <TITLE> ... </TITLE>

   The text between the opening and the closing tags is a title for the
   hypertext node. There should only be one title in any node. It should
   identify the content of the node in a fairly wide context, and should
   ideally fit on one line.
   

    The title is not strictly part of the text of the document, but is an
   attribute of the node. It may not contain anchors, paragraph marks, or
   highlighting. The title may be used to identify the node in a history list,
   to label the window displaying the node, etc. It is not normally displayed
   in the text of a document itself. Contrast titles with headings.
   

Next ID

   This tag takes a  single attribute which is the number of the next
   document-wide numeric identifier to be allocated (not good SGML). Note that
   when modifying a document,  old anchor ids should not be reused, as there
   may be references stored elsewhere which point to them.  This is read and
   generated by hypertext editors. Human writers of HTML usually use mnemonic
   alpha identifiers.  Browser software may ignore this tag. Example of use:
   


        <NEXTID 27>

Base Address

   Anchors specify addresses of other documents, in a form relative to the
   address of the current document. Normally, the address of a document is
   known to the browser because it was used to access the document. However, if
   a document is mailed, or is somehow visible with more than one address (for
   example, via its filename and also via its library name server catalogue
   number), then the browser needs to know the base address in order to
   correctly deduce external document addresses.
   

    The format of this tag is not yet specified.
   

Anchors

   The format of an anchor is as follows:
   


        <A NAME=xxx HREF=XXX> ... </A>

   The text between the opening tag and the closing tag is either the start or
   destination (or both) of a link. Attributes of the anchor tag are as
   follows.
   

  HREF                    If the HREF attribute is present, the anchor is
                         sensitive text: the start of a link. If the reader
                         selects this text, he should be presented with
                         another document whose network address is defined by
                         the value of the HREF attribute. The format of the
                         network address is specified elsewhere. This allows
                         for the form HREF=#identifier to refer to another
                         anchor in the same document. If the anchor is in
                         another document, the attribute is a relative name,
                         relative to the document's address (or specified base
                         address if any).
                         

  NAME                    The attribute NAME allows the anchor to be the
                         destination of a link. The value of the parameter is
                         that part of a hypertext address which follows the
                         hash sign.
                         

  TYPE                    An attribute TYPE may give the relationship described
                         by the hypertext link. The type is expressed by a
                         string for extensibility.  Strings for types with
                         particular semantics will be registered by the W3
                         team. The default relationship if none other is given
                         is void.
                         

   All attributes are optional, although one of NAME and HREF is necessary for
   the anchor to be useful.
   

IsIndex

   This tag informs the reader that the document is an index document. As well
   as reading it, the reader may use a keyword search.
   

    Format:
   


        <ISINDEX>

   The node may be queried with a keyword search by suffixing the node address
   with a question mark, followed by a list of keywords separated by plus
   signs. See the network address format.
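
   As a small illustration of the query form just described, the following
   Python sketch builds the address a browser would request for a keyword
   search. The function name and the example node address are assumptions
   made for the illustration; escaping of special characters inside keywords
   is deliberately left out.

```python
def search_address(node_address, keywords):
    """Suffix a node address with "?" and plus-separated keywords."""
    if not keywords:
        raise ValueError("a keyword search needs at least one keyword")
    for word in keywords:
        # Spaces are not allowed in a W3 address, and "+" is the separator.
        if not word or " " in word or "+" in word:
            raise ValueError("bad keyword: %r" % (word,))
    return node_address + "?" + "+".join(keywords)
```

   With a hypothetical index node, search_address("http://info.cern.ch/phonebook",
   ["tim", "lee"]) gives "http://info.cern.ch/phonebook?tim+lee".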
   

Plaintext

   This tag indicates that all following text is to be taken literally, up to
   the end of the file.  Plain text is designed to be represented in the same
   way as example XMP text, with fixed-width characters and significant line
   breaks. Format:
   


                <PLAINTEXT>

   This tag allows the rest of a file to be read efficiently without parsing.
   Its presence is an optimisation. There is no closing tag.
   

Example sections

   These styles allow text of fixed-width characters to be embedded absolutely
   as is into the document. The format is:
   


        <LISTING>

                ...

        </LISTING>

   The text between these tags is to be portrayed in a fixed width font, so
   that any formatting done by character spacing on successive lines will be
   maintained. Between the opening and closing tags:
   

       The text may contain any ISO Latin printable characters, including the
          tag opener, so long as it does not contain the closing tag in full.
          

       Line boundaries are significant, and are to be interpreted as a move to
          the start of a new line.
          

       The ASCII Horizontal Tab (HT) character should be interpreted as the
          smallest positive nonzero number of spaces which will leave the
          number of characters so far on the line as a multiple of 8. Its use
          is not recommended however.
          

   The LISTING tag is portrayed so that at least 132 characters will fit on a
   line.  The XMP tag is portrayed in a font so that at least 80 characters
   will fit on a line but is otherwise identical to LISTING. The examples of
   markup are here given using the XMP tag.
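
   The Horizontal Tab rule above can be written out as a short Python sketch.
   The function name is made up for the illustration; the rule itself (move
   to the next multiple of 8 columns, always by at least one space) is the
   one stated in the text.

```python
def expand_tabs(line, tab_stop=8):
    """Replace each HT by 1..tab_stop spaces, up to the next tab stop."""
    out = []
    column = 0                            # characters so far on the line
    for ch in line:
        if ch == "\t":
            # Smallest positive number of spaces reaching a multiple of 8.
            spaces = tab_stop - (column % tab_stop)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

   For a single line this gives the same result as Python's built-in
   str.expandtabs(8); a tab at column 0 or column 8 becomes eight spaces, and
   a tab after two characters becomes six.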
   

Paragraph

   This tag indicates a new paragraph. The exact representation of this
   (indentation,  leading, etc) is not defined here, and may be a function of
   other tags, style sheets etc. The format is simply
   


        <P>

   (In SGML terms, paragraph elements are transmitted in minimised form).
   

Headings

   Several levels (at least six) of heading are supported. Note that a
   hypertext document tends to need fewer levels of heading than a normal
   document whose only structure is given by the nesting of headings. H1 is the
   highest level of heading, and is recommended for the start of a hypertext
   node.   It is suggested that the first heading be one suitable for a reader
   who is already browsing in related information, in contrast to the title tag
   which should identify the node in a wider context.
   


        <H1>, <H2>, <H3>, <H4>, <H5>, <H6>

   These tags are kept as defined in the CERN SGML guide. Their definition is
   completely historical, deriving from the AAP tag set.  A difference is that
   HTML documents allow headings to be terminated by  closing tags:
   


        <H2>Second level heading</h2>

Highlighting

   The highlighted phrase tags may occur in normal text, and may be nested. For
   each opening tag there must follow a corresponding closing tag. NOT
   CURRENTLY USED.
   



        <HP1>...</HP1>   <HP2>... </HP2> etc.

Glossaries


   A glossary (or definition list) is a list of paragraphs each of which has a
   short title alongside it.  Apart from glossaries, this format is useful for
   presenting a set of named elements to the reader. The format is as follows:
   



        <DL>

        <DT>Term<DD>definition paragraph

        <DT>Term2<DD>Definition of term2

        </DL>

Lists


   A list is a sequence of paragraphs, each of which is preceded by a special
   mark or sequence number. The format is:
   



        <UL>

        <LI> list element

        <LI> another list element ...

        </UL>

   The opening list tag (UL for an unordered list, OL for an ordered one) must
   be immediately followed by the first list element. The representation of the
   list is not defined here, but a bulleted list for unordered lists,  and a
   sequence of numbered paragraphs for an ordered list would be quite
   appropriate.
   

    "OL" IS NOT CURRENTLY USED



From timbl  Fri Jan 24 16:07:49 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12579; Fri, 24 Jan 92 16:07:49 GMT+0100
Date: Fri, 24 Jan 92 16:07:49 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9201241507.AA12579@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-interest@nxoc01.cern.ch, www-talk@nxoc01.cern.ch
Subject: WorldWideWeb news: New software includes Gopher, News, Telnet access


           World Wide Web:          WHAT'S NEW IN '92
                                       

   Here's the latest (that we know) about W3, the hypertext information
   system. The High-Energy Physics world got
   its first official announcement of W3 in the CERN computer newsletter
   released at Christmas, with an introductory article. However, there are
   already many users of W3 outside HEP!
   

New browser

   The new year starts with a release (version 1.1 - our first official
   "version1" release) of the line mode browser.  This has protocol code in for
   a wealth of new information, with:
   

       o  Direct access to internet news groups
          

       o  Direct access to "gopher" campus-wide information systems etc.
          (Gopher is a system similar to W3 but using a web of menus and
          plain text files rather than hypertext.  It is all readable as
          hypertext using W3.)
          

       o  Browsing of remote directories using FTP. Before, files could be
          read - now you can browse around as well. Any FTP site becomes a W3
          information source.
          

       o  Links directly to telnet (and rlogin) sites.  This allows
          hypertexts to point to online communications facilities which don't
          have servers.
          

       o  Extensibility using gateways - you can configure www to use specific
          gateways for any access protocol which might turn up in the future
          which it can't handle directly.
          

   The user interface is slightly improved,  and  you can save a document to a
   file, pipe it, or print it (under unix).
   

   The browser can be picked up by anonymous FTP in the usual way,
   including source and binaries for several platforms.
   

   Those who have built other hypertext systems (such as Hyperbole and Viola)
   on top of the www browser will immediately gain access to all this newly
   accessible information.
   

W3 at SLAC

   Hot on the heels of the announcement of the W3 server for the "SPIRES"
   High-Energy Physics preprint database at the Stanford Linear ACcelerator lab
   comes news from Paul Kunz that the line mode browser is installed on all
   unix systems at SLAC. Happy browsing, folks.
   

Browsing on VM/CMS

   The IBM mainframe at CERN now has a copy of the w3 browser (v0.14) running
   in line mode. We are considering ways to make it more full-screen in the VM
   style.  The v1.1 browser is under test and may be installed by the time
   you get this. Other VM sites mail me for details.
   

Browsing under X with Viola

   A version of www running in the "Viola" hypertext system looks neat - I just
   saw it running on an apollo and on a decstation. We shall release it soon
   with the coming new version of viola.
   

Conferences

   The W3 demonstration had an enthusiastic reception at "HyperText'91" in
   San Antonio, Texas.
   

   Jean-Francois Groff presented W3 live at the Software Engineering
   and AI for High-Energy Physics workshop in La Londe, France.  We've also
   been asked to demonstrate, as well as present a paper, at the Joint European
   Networking Conference (JENC3) in Innsbruck, Austria, in May.

As usual, details by telnetting info.cern.ch (no username or password),
then selecting the "WorldWideWeb" link, software by anonymous FTP
from node info.cern.ch.
__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web project                (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155







From timbl  Fri Jan 24 17:12:46 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12783; Fri, 24 Jan 92 17:12:46 GMT+0100
Date: Fri, 24 Jan 92 17:12:46 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9201241612.AA12783@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: wei@xcf.berkeley.edu (Pei Y. Wei)
Subject: Viola - WWW interface
Cc: www-talk@nxoc01.cern.ch

Pei,

ViolaWWW works great!  It has impressed us here. It's fine on decstation and apollo  
displays (The crashing HP display was an HP X server problem.)   A strange thing is  
that it seems to be so fast -- a search in the CERN phone book seems instantaneous
(when done at CERN).  Perhaps the line mode browser's critical path is the
time taken for the terminal emulator to display the characters, and Viola is  
faster.

We're going to have to standardize (have a competition for) the WWW icon.
I like the web, though some may be arachnophobic. We use a globe on the NeXT.
I wondered about some combination of an open book and a globe...

The viola changes aren't in www v1.1, but are in the master code for 1.2
(or 1.1a) already.

Now for details about the marking of fields for viola:

>To answer your questions:
>
>> Do I correctly assume that anything which is displayed on the screen
>> surrounded by SI..SO characters can be clicked on and will then be
>> sent to www?
>
>That's correct, for the old version of Viola. This scheme, however, has
>changed in the new version... (see below).
>
> Here are two important things Viola needs to interface to www. Both might
> be effected by -viola:
>
>1) A consistent and hopefully unique prompt ending. It should always look
>the same ("::: " or whatever is predictable).

    Are you sure it isn't safer to use a non-printable character? That would
    be guaranteed unique.

> 2) Instead of the old (and limiting) SI..SO scheme, the new scheme is to use
> the \h()\e() combination thusly:
>
>	See also CERN copyright[\h(2)\e(2)].
>
> The text within \h(...) is displayed highlighted, and text within \e(...)
> is ``embedded'' into the text field for retrieval (when clicked) by Viola
> scripts.

To separate the anchor text (highlighted bit) and the meaning is a good idea.
In that case we would highlight the whole phrase: in the underlying mark-up,
the beginning of the anchor is marked as well as the end, although that is not
apparent with the line-mode browser normally. Using your example, this would come  
out like

	See also \h(CERN copyright)\e(2).

without any [numbers] displayed at all.

Again, the use of printable brackets () is a bit dangerous, as in fact
there will be cases in which unmatched brackets appear within the highlighted bit.
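
To make the field-marking scheme concrete, here is a rough Python sketch of
how a front-end might pick \h(...)\e(...) fields out of a line of browser
output. It is only an illustration under stated assumptions: the function
names are invented, and it is not how Viola or www actually does it. It
treats the sequence ")\e(" as the boundary between the highlighted text and
the embedded reference, so a stray bracket inside the highlighted bit is
tolerated, but highlighted text that itself contained ")\e(" would still be
misread, which is the kind of danger printable delimiters bring.

```python
import re

# One field: \h(highlighted text)\e(embedded reference)
FIELD = re.compile(r"\\h\((.*?)\)\\e\((.*?)\)")


def fields(line):
    """Return (highlighted text, embedded reference) pairs found in a line."""
    return FIELD.findall(line)


def displayed(line):
    """Return the line as the reader sees it, with the markers removed."""
    return FIELD.sub(lambda match: match.group(1), line)
```

For the example line, fields(r"See also \h(CERN copyright)\e(2).") gives one
pair, ("CERN copyright", "2"), and displayed() of the same line gives
"See also CERN copyright." with no [numbers] shown.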


-------------


> I'm very interested in producing a usable WWW front-end (the current
> ViolaWWW was a one-nite hack to prove feasibility). I am also thinking
> of incorporating the HTTP code into Viola. Though I have to look at the
> code more...
>

If you want to look at the code, see whether your text widget can implement the
"Object Building" methods in HText.h.  These are the calls made by the www
communications code and parsers to build the hypertext.  If you can provide those,
it should be easy.  You just call one of the routines in HTAccess.h to
load a document by name, and it calls you back to create and build the document.


>> More questions:
>> Did you need any mods to viola, or is everything you needed put into
>> the "ht" stack?  In other words, was the viola directory a standard
>> release which we can replace with new versions when they come out?
>
> No modification was made to Viola in order to front-end the modified
> www.

Great stuff.


> [...] No problem! Give me a few days, and I may have the new ViolaWWW ready.
> It will have a lot of improvements such as multiple fonts and geometry
> management.

If you really do multiple fonts, that would be great!  In that case
you will probably want to combine the www access code with viola itself
in order to get the style changes as a hypertext document is parsed.
That would be really neat.  It would remove all this marking of fields with
funny characters, as well.

>>      Tim 

>   -Pei
  Tim


From dkrieger%monty@rand.org  Fri Jan 24 19:09:34 1992
Return-Path: <dkrieger%monty@rand.org>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA13051; Fri, 24 Jan 92 19:09:34 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA19386; Fri, 24 Jan 92 19:03:37 +0100
Received: from monty.rand.org by rand.org; Fri, 24 Jan 92 09:48:06 -0800
Received: from thoreau by monty; Fri, 24 Jan 92 09:48:03 PST
Received: from localhost by thoreau (4.1/SMI-4.1)
	id AA12826; Fri, 24 Jan 92 09:48:02 PST
Message-Id: <9201241748.AA12826@thoreau>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: www-interest@nxoc01.cern.ch, www-talk@nxoc01.cern.ch
Subject: Re: WorldWideWeb news: New software includes Gopher, News, Telnet access 
In-Reply-To: Your message of Fri, 24 Jan 92 16:07:49 N.
             <9201241507.AA12579@ nxoc01.cern.ch > 
Date: Fri, 24 Jan 92 09:48:01 PST
From: David Krieger <dkrieger%monty@rand.org>

WHOEVER IS IN CHARGE OF THESE MAILING LISTS -- PLEASE REMOVE ME.
		dkrieger%monty@rand.org


From timbl  Tue Feb  4 08:44:25 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA24645; Tue, 4 Feb 92 08:44:25 GMT+0100
Date: Tue, 4 Feb 92 08:44:25 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202040744.AA24645@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: emv@cic.net
Subject: Re: using WWW to follow gopher links 
Cc: www-talk@nxoc01.cern.ch, gopher@boombox.micro.umn.edu,
        wais-talk@quake.think.com

Ed,

All good stuff -- the world is coming together.

What do you think is the most useful www option for tracing what's out there?
I have two suggestions - one is a -list option (or something) which makes
www return only a list of related documents, one on each line.
Another is one which will recursively run down a tree. The
trouble with the latter is telling it where to stop. Depth isn't really good enough
as probably you also want to constrain it to only gopher files, for example.
Perhaps the most flexible would be just the first option, with a perl etc script
around it to be flexible. I'd like to see for example lists of all telnet sites
referenced by gopher or www links, a wais server for www documents and gopher
nodes.  My guess is that one index could handle the lot so long as one trimmed
off the few places where people have gatewayed in the entire ftp world, etc.
Then I'd like to see a www server for that index so that one could jump straight to
the document wherever it came from.... I have to write an article today, maybe
tomorrow I'll put in www -list.

KUTGW
	Tim

[PS: I assume you meant -p rather than -np in the www command. Perhaps we
should put in -np if it is more intuitive than -p for no paging.
I'll look at the CR problem.]

__________________original message follows
Tim,

Some more results of wais/www/gopher collaboration.

I have a new WAIS server running at wais.cic.net, called
"midwest-weather".  It's fed by loading in a bunch of weather reports
from a gopher at Minnesota every hour.  That system gets them from the
"weather underground" at Michigan using some hairy expect scripts, I
figured it'd be easier to get things out of gopher instead.

The script looks like:

WEATHER=gopher://mermaid.micro.umn.edu:150/00/Weather
www -n -np ${WEATHER}/Indiana/Fort%20Wayne | sed -e 's/.$//' > fort-wayne.in
www -n -np ${WEATHER}/Indiana/Indianapolis | sed -e 's/.$//' > indianapolis.in
www -n -np ${WEATHER}/Indiana/South%20Bend | sed -e 's/.$//' > south-bend.in
[...]

For some reason the gopher files are coming out of www with extra ^M's
on the end, as if they were DOS files; so the sed thing gets rid of them.
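
One caveat with the sed expression: 's/.$//' removes the last character of
every non-empty line whether or not it is a ^M. A minimal Python sketch of a
stricter filter, which drops only a trailing carriage return, might look like
this (the function name is made up for the illustration):

```python
def strip_cr(line):
    """Drop one trailing carriage return (^M), and nothing else."""
    if line.endswith("\r"):
        return line[:-1]
    return line
```

A line that already lacks the ^M is then left intact instead of losing its
final character.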

I don't see a way to do this with just one invocation of www, so
instead it runs once for each file.

Neither gopher nor WWW have the notion of a "recursive directory
listing", either some complete overview of the structure of the system
or some skeleton outline.  (I realize it's arbitrarily hard to do so
since any link could point off anywhere else.)  That makes it tougher
to do an archie-style catalog.  I think it wouldn't be that hard to
build a tree-walker for gopher that prints out a list of the
directories on every system that it can find and also the text of all
of the stuff that's in the ".about" directories.  At the very least
I'm doing some of that by hand now (just a script like the one above)
& waising it so I have some clue what all is out there.  *not* a
replacement for the per-site indexes, but a cross-section.

--Ed


From jfg@bernd.cern.ch  Tue Feb  4 10:14:38 1992
Return-Path: <jfg@bernd.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA24750; Tue, 4 Feb 92 10:14:38 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA08874; Tue, 4 Feb 92 10:08:04 +0100
Received: by bernd.cern.ch (AIX 3.1/UCB 5.61/4.03)
          id AA12890; Tue, 4 Feb 92 10:07:25 -2300
Date: Tue, 4 Feb 92 10:07:25 -2300
From: jfg@bernd.cern.ch (Jean-Francois Groff)
Message-Id: <9202050907.AA12890@bernd.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: forwarded message from emv@cic.net

------- Start of forwarded message -------
Received: from cernvax.cern.ch by bernd.cern.ch (AIX 3.1/UCB 5.61/4.03)
          id AA13075; Tue, 4 Feb 92 00:59:29 -2300
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA02837; Tue, 4 Feb 92 00:59:58 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA24681; Tue, 4 Feb 92 00:59:56 +0100
Received: from nic.cic.net by quake.think.com (4.1/SMI-4.0)
	id AA02414; Mon, 3 Feb 92 15:42:59 PST
Received: by nic.cic.net (4.1/SMI-4.1)
	id AA09725; Mon, 3 Feb 92 18:41:34 EST
Message-Id: <9202032341.AA09725@nic.cic.net>
In-Reply-To: Your message of "Mon, 20 Jan 92 10:19:51 +0100."
             <9201200919.AA05201@ nxoc01.cern.ch > 
From: emv@cic.net
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: jfg@cernvax.cern.ch, gopher@boombox.micro.umn.edu,
        wais-talk@quake.think.com
Subject: Re: using WWW to follow gopher links 
Date: Mon, 03 Feb 92 18:41:32 -0500

Tim,

Some more results of wais/www/gopher collaboration.

I have a new WAIS server running at wais.cic.net, called
"midwest-weather".  It's fed by loading in a bunch of weather reports
from a gopher at Minnesota every hour.  That system gets them from the
"weather underground" at Michigan using some hairy expect scripts, I
figured it'd be easier to get things out of gopher instead.

The script looks like:

WEATHER=gopher://mermaid.micro.umn.edu:150/00/Weather
www -n -np ${WEATHER}/Indiana/Fort%20Wayne | sed -e 's/.$//' > fort-wayne.in
www -n -np ${WEATHER}/Indiana/Indianapolis | sed -e 's/.$//' > indianapolis.in
www -n -np ${WEATHER}/Indiana/South%20Bend | sed -e 's/.$//' > south-bend.in
[...]

For some reason the gopher files are coming out of www with extra ^M's
on the end, as if they were DOS files; so the sed thing gets rid of them.

I don't see a way to do this with just one invocation of www, so
instead it runs once for each file.

Neither gopher nor WWW have the notion of a "recursive directory
listing", either some complete overview of the structure of the system
or some skeleton outline.  (I realize it's arbitrarily hard to do so
since any link could point off anywhere else.)  That makes it tougher
to do an archie-style catalog.  I think it wouldn't be that hard to
build a tree-walker for gopher that prints out a list of the
directories on every system that it can find and also the text of all
of the stuff that's in the ".about" directories.  At the very least
I'm doing some of that by hand now (just a script like the one above)
& waising it so I have some clue what all is out there.  *not* a 
replacement for the per-site indexes, but a cross-section.

- --Ed



------- End of forwarded message -------

From ojala@dolphin.funet.fi  Tue Feb  4 20:29:53 1992
Return-Path: <ojala@dolphin.funet.fi>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25924; Tue, 4 Feb 92 20:29:53 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA02208; Tue, 4 Feb 92 20:23:20 +0100
Received: by dolphin.funet.fi id AA12182
  (5.65c/IDA-1.4.3 for www-talk@nxoc01.cern.ch); Tue, 4 Feb 1992 21:23:28 +0200
Date: Tue, 4 Feb 1992 21:23:28 +0200
Message-Id: <199202041923.AA12182@dolphin.funet.fi>
From: Petri Ojala <ojala@funet.fi>
Sender: ojala@dolphin.funet.fi
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: www-talk@nxoc01.cern.ch
In-Reply-To: <9202040744.AA24645@ nxoc01.cern.ch >
Subject: Re: using WWW to follow gopher links 


I would prefer the latter suggestion plus some additions.  For example in
an X environment it would be nice to be able to get an overview of the
"hypertext network" around the current location.  A sort of scan to the
next level following all the links at the current document.  (Maybe
with the possibility to see more than one level, with the client keeping book
of not following one document more than once.)

The other feature would be a fully recursive list of documents (or
references).  However this list would not follow all the links but
only those marked to be followed in the anchor specification.  As this
is quite a dangerous feature it should be avoided but still possible,
for example in Gopher environments.  Of course the client could in
any case keep book of the documents shown and not follow the same page
more than once.
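
A rough Python sketch of such a bounded scan, purely as an illustration: it
walks outward from a starting document level by level, stops at a given
depth, keeps book of documents already listed, and follows only links
accepted by a filter (for instance "gopher addresses only"). All names here
are assumptions made for the sketch; in particular links_of stands for
whatever routine returns the addresses a document links to, which is not
specified anywhere in this thread.

```python
from collections import deque


def list_documents(start, links_of, max_depth, follow=lambda address: True):
    """List documents reachable from start within max_depth links."""
    seen = {start}                        # never list a document twice
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        address, depth = queue.popleft()
        if depth == max_depth:
            continue                      # depth limit: do not go further
        for target in links_of(address):
            if target in seen or not follow(target):
                continue
            seen.add(target)
            order.append(target)
            queue.append((target, depth + 1))
    return order


# Toy web with a cycle (a -> b -> a), standing in for real documents.
web = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["e"], "e": []}
nearby = list_documents("a", lambda address: web.get(address, []), max_depth=2)
```

With the toy web above, nearby comes out as a, b, c, d: the cycle back to
"a" is not followed a second time, and "e" lies beyond the depth limit.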

Petri

From timbl  Fri Feb  7 16:40:18 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00647; Fri, 7 Feb 92 16:40:18 GMT+0100
Date: Fri, 7 Feb 92 16:40:18 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202071540.AA00647@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: www-talk@nxoc01.cern.ch
Subject: W3 server at NIKHEF;  Scripst as w3 servers


From W.vanLeeuwen@nikhef.nl (Willem van Leeuwen):

I installed the server on our recently installed network sun (nic.nikhef.nl).
With a very crude change in HTRetrieve it is possible to activate
a script in case of a keyword search:

/*		Handle a Retrieve request from a WWW client	HTRetrieve.c
...
extern int HTWriteASCII(int soc, char * s);	/* In HTDaemon.c */

	char *command;
	char string[80];
	char *ip;

/*		Read a file
...
    if (keywords) {
/*
	if (TRACE) fprintf(logfile,"HTHandle: can't perform search %s\n",
		arg);
	HTWriteASCII(soc,
	    "Sorry, this server does not perform searches.\n");
*/
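	ip = string;
	command = "/user/a03/bin/WWWsh ";
	/* Build "WWWsh <arg>?<keywords>", leaving room for the '?' and the NUL */
	for (;*command!='\0';) *ip++ = *command++;
	for (;*arg!='\0' && ip < string+sizeof(string)-2;) *ip++ = *arg++;
	*ip++ = '?';
	for (;*keywords!='\0' && ip < string+sizeof(string)-1;) *ip++ = *keywords++;
	*ip = '\0';		/* terminate the string before calling system() */
	system (string);
	return fd;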
    }
    

    StrAllocCopy(arg2, arg);
...

With the following script the finger information is sent to the client:

	name=`echo $@ | awk -F? '{print $2}'`
	echo '<plaintext>'
	/usr/ucb/finger $name@nikhefh

This script only serves to demonstrate that this solution works.

Our default file is //nic.nikhef.nl./user/a03/www/default/default.html
but we are still experimenting so it is too early to hook nikhef into WWW.
I'll let you know when we have more useful information.

Best regards,
Willem van Leeuwen


From timbl  Fri Feb  7 17:58:24 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00861; Fri, 7 Feb 92 17:58:24 GMT+0100
Date: Fri, 7 Feb 92 17:58:24 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202071658.AA00861@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: emv@cic.net
Subject: Re: gopher can read www links right now!
Cc: www-talk



Begin forwarded message:

To: timbl@nxoc01.cern.ch
Subject: gopher can read www links right now!
Date: Fri, 07 Feb 92 09:15:54 -0500
From: emv@cic.net

> ho ho ho!  take a look at this:
>
> www gopher://info.cern.ch:2784//GET%20/hypertext/WWW/FileFormat.html
>
> accidental compatibility...

Ha!  There's a man who knows what's going on.... of course the slash just before
the GET is interpreted as a gopher type character which happens to be
invalid, so www just reads the document as plain text. With version 1.1c or later
(no, it's not released yet but I will release it if you want it), an "h" field means "html
format". So I can say (spot the difference):

www gopher://info.cern.ch:2784/hGET%20/hypertext/WWW/FileFormat.html

File format

   The system uses marked-up text to represent a hypertext document when one is
   being stored in a file or transmitted over the network. Some of the formats
   available are illustrated in a test hypertext[1]. The hypertext mark-up
   language is an SGML format. This means basically that it uses angle brackets
   to delimit language constructs embedded within the text. The particular
   language -- the set of tags and the rules about their use, and their
   significance -- is not part of the SGML standard. There being no standard on
   this, we have adopted a set which seems sensible. Let's call them HTML --
   hypertext markup language. HTML is not an alternative to SGML, it is a
   particular format within the SGML rules (an SGML "DTD"). We have included in
   HTML  tags from the SGML tagset used at and once supported at CERN by  quite
   a lot of documentation and SGML examples.[2] The HTML parser will ignore
   tags which it does not understand, and will ignore attributes which it does
   not understand of CERN-SGML tags.
     [End]


Basically, Gopher addresses and w3 addresses are fairly interconvertible. And you
are right, http and gopher protocols are very similar.  [The "w" field can only be
used in a gopher menu. It means "The selector string is in fact a w3 address, don't
expect a port number or host address to follow it".]
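
To make the role of the type character concrete, here is a small sh sketch that pulls a gopher address of the form used above apart into host, port, type character and selector. The function name is invented for the example, and the parsing covers only addresses shaped like the two shown in this message.

```sh
# split_gopher ADDRESS
# Prints "host port type selector" for gopher://host:port/<type><selector>.
# Hypothetical helper; it assumes an explicit port, a non-empty path and
# a selector without spaces (as in the %20-escaped examples above).
split_gopher() {
    rest=${1#gopher://}          # host:port/<type><selector>
    hostport=${rest%%/*}         # host:port
    path=${rest#*/}              # <type><selector>
    host=${hostport%%:*}
    port=${hostport#*:}
    selector=${path#?}           # path minus its first character
    type=${path%"$selector"}     # the first character alone
    printf '%s %s %s %s\n' "$host" "$port" "$type" "$selector"
}

split_gopher 'gopher://info.cern.ch:2784/hGET%20/hypertext/WWW/FileFormat.html'
split_gopher 'gopher://info.cern.ch:2784//GET%20/hypertext/WWW/FileFormat.html'
```

For the first address the type field comes out as "h"; for the second it is the stray "/" that www does not recognise as a type.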

From timbl  Mon Feb 17 17:32:58 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA18163; Mon, 17 Feb 92 17:32:58 GMT+0100
Date: Mon, 17 Feb 92 17:32:58 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202171632.AA18163@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Eelco van Asperen <evas@cs.few.eur.nl>
Subject: Re: WWW LineMode browser
Cc: www-talk

> From: Eelco van Asperen <evas@cs.few.eur.nl>
>
> I've been working on a curses-version of the line mode browser
> and I think the code works reasonably well. The modifications
> I made to v1.1 have now been incorporated into v1.2. Most
> changes are in the form of #ifdef CURSES ... #endif.
> I've also added lots of checks where malloc ea. are used to
> make sure that they return a non-NULL pointer.

Great to have a curses full-screen browser!  Something we have always thought ought
to be done, but hadn't had the time for.  Send me the patches, and I'll insert them
into our current code and into the next release.

> Unfortunately, the PC version keeps crashing lately; I'm not sure
> if the problem is the PC-NFS toolkit or the browser code.
> The program hangs when one of the PC-NFS toolkit routines tries
> to free a memory block; this could be caused by a corrupted memory
> chain. This in turn can be caused by freeing some pointer that
> was not malloc'ed.  The code runs ok under Unix even when I add
> the malloc-debugging library.

Sounds like a PC-specific bug there... but you never know. We have found something
weird with realloc() under aix but that looks like an aix bug.

> Some other points;
> - a lot of files (especially *.html and *.txt files) don't end
>   with a linefeed; the SCCS source code control system does not
>  like this. Perhaps this can be fixed in future releases ?

I think this may be Edit on the NeXT which is happy to leave an uncompleted line on  
the end of a file.  I'll have to look at it.

> - could you make context diffs from the last to the current
>  release ? This would make it easier to see what has changed
>  and merge those changes with the ones I've made to my copy
>  of the last release.

Assuming your patches are useful -- which they certainly have been up till now,  
we'll incorporate them into the release.

> So, if you are interested in my patches, please let me know and I'll
> send them.

	yes please!


__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web initiative             (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155








From timbl  Tue Feb 18 11:30:44 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA19728; Tue, 18 Feb 92 11:30:44 GMT+0100
Date: Tue, 18 Feb 92 11:30:44 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202181030.AA19728@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: "Timo Harmo - SocSci U of Helsinki" <HARMO@valt.helsinki.fi>
Subject: W3 and other systems
Cc: www-talk@nxoc01.cern.ch

> From: "Timo Harmo - SocSci U of Helsinki"  <HARMO@valt.helsinki.fi>

>> In the end, there will be hypertext, certainly.
> Yes, I agree (I hate hierarchical information systems). But meanwhile
> gopher (wais I don't know about) offers working systems for many
> platforms, doesn't it?
>
> Let's continue on the list.

Ok, a discussion of the proliferation of protocols is a thing suitable for the
list.  Timo had been saying - why not broadcast about W3 on the WAIS and Gopher
lists/groups? Well, every now and again I mention it, but I do not want to misuse
the groups.

Yes, gopher offers working systems for many platforms.  The gopher protocol
is VERY like the basic HTTP protocol.  We feel that the W3 model is more general.
It's very easy to write gopher and w3 servers, but more difficult to write w3
browsers. The trouble is, it takes longer to write a hypertext browser because the  
hypertext widgets won't exist. The gopher people are doing really good work by  
getting information out there.  This gives everyone experience. We run the w3 web  
with the gopher web as a subset. This gives a lot of data. It means w3-based search  
engines and indexers can include gopher data.

The disappointing thing of course is that a lot of information is better presented
as hypertext. If you like, a gopher menu page is ultra-simple hypertext. Real
hypertext, with the little formatting you get in HTML, is more powerful, and leads  
to better communication between the information provider and the reader. And  
communication is what we are talking about.

Actually, the HTTP protocol is not the most important thing for people to use. There
will always be many S&R protocols. The W3 addressing syntax is much more important.

>> - Timo
	- Tim


From timbl  Tue Feb 18 14:25:04 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA19967; Tue, 18 Feb 92 14:25:04 GMT+0100
Date: Tue, 18 Feb 92 14:25:04 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202181325.AA19967@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: kean@talon.ucs.orst.edu
Subject: Re: archie wais server - WWW access directly to files
Cc: wais-talk@think.com, www-talk@nxoc01.cern.ch


> If there's no improvement to the disk space used by these I'll announce the
> server to the world Wednesday morning in alt.wais and the archie-people mailing
> list.

Magic -- I really like it.  I have modified the WAIS-WWW gateway so that if the
database is an archie one (i.e. the database name contains "archie" :-| ) a W3 search of
the database will give links directly to the files.  This relies on the headline
starting with host:/filename.
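
The rewrite of an archie headline into a document address can be pictured with a few lines of sh. This is not the gateway code, only a sketch of the idea; the function name is invented, the sample headline is made up, and the scheme prefix is left out because this message does not say which one the gateway emits.

```sh
# headline_to_address 'host:/path/to/file ...'
# Prints //host/path/to/file, or fails if the headline does not start
# with host:/filename. Hypothetical helper, for illustration only.
headline_to_address() {
    first=${1%% *}               # first blank-separated word of the headline
    case "$first" in
        *:/*) ;;
        *) return 1 ;;
    esac
    host=${first%%:/*}
    file=${first#*:/}
    case "$host" in
        ""|*/*) return 1 ;;      # no host, or the ":/" is not at the front
    esac
    printf '//%s/%s\n' "$host" "$file"
}

headline_to_address 'ftp.example.org:/pub/doc/archie.html'
```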

For the first test I ran www, followed a pointer to your archie server, looked for
"html" to find any hypertext files out there. At the bottom of the list is a file  
of Ed Vielmetti's "archie.html" in Toronto.  Jumped over there.. it has a pointer  
to the comp.archives newsgroup ... jumped into the newsgroup... jumped into an
article...  This is how things SHOULD work!

Now the things which need neatening up. The gateway I was using is here in CERN,
and the server is at orst.edu. This implies a transatlantic link for every search.

- Would you or anyone like to run the gateway locally? Volunteers?

- Perhaps we should run the WAIS archie here, mirroring files.

- One of us could run a direct W3 archie server rather than using a gateway
     process.

To use www as an archie client,

	setenv  WWW_wais_GATE  http://info.cern.ch:8001/

(or point to a more local gateway!) and 


	alias archie  www wais://archive.orst.edu:9000/archie-orst.edu

I've put a pointer to the archie index in our CERN home page, but not publicised it  
yet.

Tim BL

From brewster@quake.think.com  Tue Feb 18 15:05:19 1992
Return-Path: <brewster@quake.think.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20087; Tue, 18 Feb 92 15:05:19 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA20506; Tue, 18 Feb 92 14:59:37 +0100
Received: by quake.think.com (4.1/SMI-4.0)
	id AA28643; Tue, 18 Feb 92 05:58:16 PST
Date: Tue, 18 Feb 92 05:58:16 PST
Message-Id: <9202181358.AA28643@quake.think.com>
From: Brewster Kahle <brewster@think.com>
Sender: brewster@quake.think.com
To: timbl@nxoc01.cern.ch
Cc: kean@talon.ucs.orst.edu, wais-talk@think.com, www-talk@nxoc01.cern.ch
In-Reply-To: Tim Berners-Lee's message of Tue, 18 Feb 92 14:25:04 GMT+0100 <9202181325.AA19967@ nxoc01.cern.ch >
Subject: archie wais server - WWW access directly to files

   Date: Tue, 18 Feb 92 14:25:04 GMT+0100
   From: timbl@nxoc01.cern.ch (Tim Berners-Lee)

   ...
   Magic -- I really like it.  I have modified the WAIS-WWW gateway so that if the
   database is an archie one (i.e. the database name contains "archie" :-| ) a W3 search of
   the database will give links directly to the files.  This relies on the headline  
   starting with host:/filename.

I wish we used /filename@hostname so that waisretrieve could handle it (and
be compatible with the WAIS doc-id).

   Now the things which need neatening up. The gateway I was using is here in CERN,  
   and the server is at orst.edu. This implies a transatlantic link for every search.

   - Would you or anyone like to run the gateway locally? Volunteers?
WAIS packets are small, but the distance reduces reliability.

   - Perhaps we should run the WAIS archie here, mirroring files.

   - One of us could run a direct W3 archie server rather than using a gateway
	process.

   To use www as an archie client,

	   setenv  WWW_wais_GATE  http://info.cern.ch:8001/

   (or point to a more local gateway!) and 


	   alias archie  www wais://archive.orst.edu:9000/archie-orst.edu

   I've put a pointer to the archie index in our CERN home page, but not publicised it  
   yet.

   Tim BL

This is really cool.

-brewster


From timbl  Tue Feb 18 16:44:35 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20333; Tue, 18 Feb 92 16:44:35 GMT+0100
Date: Tue, 18 Feb 92 16:44:35 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202181544.AA20333@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Brewster Kahle <brewster@think.com>
Subject: Documet ids (was Archie, WWW access directly to files, and document ids
Cc: jcurran@nnsc.nsf.net, kean@talon.ucs.orst.edu, wais-talk@think.com,
        www-talk@nxoc01.cern.ch

From: Brewster Kahle <brewster@think.com>

   Date: Tue, 18 Feb 92 14:25:04 GMT+0100
   From: timbl@nxoc01.cern.ch (Tim Berners-Lee)

Tim:
  ... This relies on the headline [of an archie index] 

	starting with host:/filename.

Brewster:

  I wish we used /filename@hostname so that waisretrieve could handle it (and
  be compatible with the WAIS doc-id).

Tim:

Well, Brewster, I wish that we used //hostname/filename so that it would be
directly compatible with the W3 doc-id ;-). In fact, of course, the user never sees
the doc-ids themselves as he browses.

But seriously, Brewster, you suggested to John Curran <jcurran@nnsc.nsf.net> that,  
on the subject of document ids, "[Brewster's] proposal that is on the table is  
worth implementing for a good run".  I would suggest that John look at w3 Universal
Document Identifiers as a similar but more open and more established scheme.  W3  
has been running now for 18 months or so using the UDI syntax and the addressing  
syntax has expanded easily to include wais and now gopher. The www retrieval engine  
will handle any of these, FTP access and news access etc.

When x500 document naming becomes practical, it will be important that the UDI  
scheme can expand to accept x500 names.

There is nothing proprietary or w3-specific about w3 UDIs.  They are generic,
canonical and universal. Could I strongly suggest that you extend waisretrieve to
use UDIs?

[Your objection to the w3 UDIs was that you preferred "@" to "//" because you wanted
to use parsers written for mail.  Is that a strong enough reason for inventing a
new, wais-specific scheme instead of using an existing, open one?  Actually, the w3
scheme uses @ for  //user@hostname/ when a username is needed, which is even more
mail-like. The choice of punctuation is of course fairly arbitrary.]

The universal document identifier syntax is dead simple. It is described in BNF in  
/pub/www/doc/udi.txt.  Comments from all parties who haven't seen it before are  
solicited.

We must put systems together to make the information universe happen, and not
quarrel about trivia. We must remain open to the future.

I must put down the reasons for UDIs in a paper, but they're probably obvious to  
everyone on these lists....

	Tim

From kean@argh.ucs.orst.edu  Tue Feb 18 19:30:51 1992
Return-Path: <kean@argh.ucs.orst.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20651; Tue, 18 Feb 92 19:30:51 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA20672; Tue, 18 Feb 92 19:25:11 +0100
Received: from argh.UCS.ORST.EDU by mcsun.EU.net with SMTP;
	id AA14104 (5.65a/CWI-2.143); Tue, 18 Feb 1992 19:24:40 +0100
Received: from localhost.ucs.orst.edu by argh.UCS.ORST.EDU (4.1/SMI-DDN)
	id AA28008; Tue, 18 Feb 92 10:21:46 PST
Message-Id: <9202181821.AA28008@argh.UCS.ORST.EDU>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: kean@talon.ucs.orst.edu, wais-talk@think.com, www-talk@nxoc01.cern.ch
Subject: Re: archie wais server - WWW access directly to files 
In-Reply-To: Your message of Tue, 18 Feb 92 14:25:04 +0100.
			 <9202181325.AA19967@ nxoc01.cern.ch > 
Date: Tue, 18 Feb 92 10:21:42 PST
From: kean@argh.ucs.orst.edu

We've a T1 to the T3 backbone, so mirroring the archie stuff I have isn't 
a problem (from my end).

I'll provide the scripts and the glue etc to do the mirroring and indexing to
whoever wants to do it.  

**** I'd really like to see a few more waised archie servers out there ****
Why?  Well, mostly mercenary self-interest (archive.orst.edu is the campus 
news server too) + redundancy.  Takes 600 Mb to do it properly (if you have
400Mb main memory you could get away with 450 Mb disk space 8) ).

Kean

Kean Stump (503)-737-4740                 Why choose the *lesser* of two evils?
OSSHE Network Operations                  Vote for Cthulu, '92
DOMAIN: kean@ucs.orst.edu                 UUCP: hplabs!hp-pcd!orstcs!kean

From kean@argh.ucs.orst.edu  Tue Feb 18 19:32:02 1992
Return-Path: <kean@argh.ucs.orst.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20657; Tue, 18 Feb 92 19:32:02 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA20691; Tue, 18 Feb 92 19:26:23 +0100
Received: from argh.UCS.ORST.EDU by mcsun.EU.net with SMTP;
	id AA14194 (5.65a/CWI-2.143); Tue, 18 Feb 1992 19:25:52 +0100
Received: from localhost.ucs.orst.edu by argh.UCS.ORST.EDU (4.1/SMI-DDN)
	id AA28016; Tue, 18 Feb 92 10:23:03 PST
Message-Id: <9202181823.AA28016@argh.UCS.ORST.EDU>
To: Brewster Kahle <brewster@think.com>
Cc: timbl@nxoc01.cern.ch, kean@talon.ucs.orst.edu, wais-talk@think.com,
        www-talk@nxoc01.cern.ch
Subject: Re: archie wais server - WWW access directly to files 
In-Reply-To: Your message of Tue, 18 Feb 92 05:58:16 -0800.
			 <9202181358.AA28643@quake.think.com> 
Date: Tue, 18 Feb 92 10:23:02 PST
From: kean@argh.ucs.orst.edu

I'm going to reformat the database to the /filename@hostname convention on 
the next mirror run I do.  I haven't forgotten 8)

Kean 

Kean Stump (503)-737-4740                 Why choose the *lesser* of two evils?
OSSHE Network Operations                  Vote for Cthulu, '92
DOMAIN: kean@ucs.orst.edu                 UUCP: hplabs!hp-pcd!orstcs!kean

From emv@heifetz.msen.com  Wed Feb 19 05:48:30 1992
Return-Path: <emv@heifetz.msen.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21309; Wed, 19 Feb 92 05:48:30 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA19551; Wed, 19 Feb 92 05:42:51 +0100
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lGj60-000HxNC@heifetz.msen.com>; Tue, 18 Feb 92 23:39 EST
Message-Id: <m0lGj60-000HxNC@heifetz.msen.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: Brewster Kahle <brewster@think.com>, jcurran@nnsc.nsf.net,
        kean@talon.ucs.orst.edu, wais-talk@think.com, www-talk@nxoc01.cern.ch
Subject: Re: Documet ids (was Archie, WWW access directly to files, and document ids 
In-Reply-To: Your message of Tue, 18 Feb 92 16:44:35 +0100.
             <9202181544.AA20333@ nxoc01.cern.ch > 
Date: Tue, 18 Feb 92 23:39:44 -0500
From: Edward Vielmetti <emv@msen.com>

Throughout the years people have used different ways to 
describe files available for anonymous ftp.  that has never
been standardized ever.  no reason to believe it ever will
be.

If we are going to come up with a special kind of document
that refers to a new document type that has the semantics
of "pointer to file available for anonymous ftp", then it
should be assigned a WAIS type tag, described, and specified.
I'd suggest the tag AFTP.  Someone write a spec, we'll all
write code, & be done with it.  (There's plenty of data after
all.)  It would be better to do that rather than to use a TEXT
type tag and bicker about the format.

I don't think it would be hard for the WWW gateway to WAIS to
do special things to documents if they had a different type,
and then use that to convert AFTP type documents to WWW format.
Ditto gopher, archie, etc. clients.

If it's TEXT, on the other hand, it can be *anything*.  Please
don't overload the semantics of the name of the server or the 
accidental formatting of the contents of the document.  I would
like to create AFTP records to stick into many servers.

--
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
      MSEN Inc., 628 Brooks, Ann Arbor MI  48103 +1 313 741 1120
     "Things are glued together with spit and bailing wire now."

From emv@heifetz.msen.com  Wed Feb 19 05:49:40 1992
Return-Path: <emv@heifetz.msen.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21315; Wed, 19 Feb 92 05:49:40 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA19574; Wed, 19 Feb 92 05:44:01 +0100
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lGj7D-000HxNC@heifetz.msen.com>; Tue, 18 Feb 92 23:41 EST
Message-Id: <m0lGj7D-000HxNC@heifetz.msen.com>
To: www-talk@nxoc01.cern.ch
Subject: bug in HTAccess.c confused me - WWW_wais_GATEWAY
Date: Tue, 18 Feb 92 23:41:05 -0500
From: Edward Vielmetti <emv@msen.com>

the error message refers to WWW_wais_gateway, but the code
wants WWW_wais_GATEWAY.  either way is fine by me so long
as you're consistent...

--Ed

From timbl  Wed Feb 19 10:41:35 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21668; Wed, 19 Feb 92 10:41:35 GMT+0100
Date: Wed, 19 Feb 92 10:41:35 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202190941.AA21668@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Edward Vielmetti <emv@msen.com>
Subject: Re: Documet ids (was Archie, WWW access directly to files)
Cc: Brewster Kahle <brewster@think.com>, jcurran@nnsc.nsf.net,
        kean@talon.ucs.orst.edu, wais-talk@think.com, www-talk@nxoc01.cern.ch



Date: Tue, 18 Feb 92 23:39:44 -0500
From: Edward Vielmetti <emv@msen.com>

> Throughout the years people have used different ways to
> describe files available for anonymous ftp.  that has never
> been standardized ever.  no reason to believe it ever will
> be.
	Is there a smiley for a big sigh?...

> If we are going to come up with a special kind of document
> that refers to a new document type that has the semantics
> of "pointer to file available for anonymous ftp", then it
> should be assigned a WAIS type tag, described, and specified.
> I'd suggest the tag AFTP.  Someone write a spec, we'll all
> write code, & be done with it.  (There's plenty of data after
> all.)  It would be better to do that rather than to use a TEXT
> type tag and bicker about the format.

This Archie-wais-www has to get around the fact that the doc-id in  
the search response is not the doc-id of the file, it's the id of a  
line in the site listing which refers to the file.  However, one  
wants to jump straight to the file, rather than to the site listing.  
For this reason, the gateway throws away the wais doc-id and  
generates an id for the file itself from the headline. If the doc-id  
itself was that of the file (in any format), that would be cleaner of  
course, as the headline could be in any human readable format. [Would  
that be easy, Kean?]

> I don't think it would be hard for the WWW gateway to WAIS to
> do special things to documents if they had a different type,
> and then use that to convert AFTP type documents to WWW format.
> Ditto gopher, archie, etc. clients.

It would be possible, sure. Do we want to have to access an AFTP type
document just to get a pointer to an FTP site? This takes time; I'd
prefer to skip that step.

> If it's TEXT, on the other hand, it can be *anything*.  Please
> don't overload the semantics of the name of the server or the
> accidental formatting of the contents of the document.  I would
> like to create AFTP records to stick into many servers.

I agree that overloading the database name is horrible! It's a hack to
show what is possible. You can only do it cleanly if you have
universal document ids of some form or other.

Sure, clients and gateways can convert UDI formats -- avoids the  
bickering but not as cool as having a common format. (Need that  
smiley again!)

[BTW, If you're going to have an AFTP file format for pointing to  
aftp sites, will you also need a GOPH file format for pointing to  
gopher sites, and a NEWS file format for pointing to newsgroups...?   
Suppose you do have some universal id scheme. Then you could have one  
format for a file of pointers. Using the SES filter system, indexing  
that file could (if it looked like a README for example) retrieve the  
referenced document and index the actual document rather than just  
the name.]

	Tim


__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web initiative             (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155






From timbl  Fri Feb 21 10:52:53 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28042; Fri, 21 Feb 92 10:52:53 GMT+0100
Date: Fri, 21 Feb 92 10:52:53 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202210952.AA28042@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: brennan@hal.com (Dave Brennan)
Subject: Making an http gateway
Cc: www-talk@nxoc01.cern.ch

Dave,

> In our network environment there is only one machine that has direct access
> to the Internet for security reasons.  Is there a way to run the client on
> some other system in our net and have it use the "gateway" machine?  I am
> very interested in learning more about the WWW and similar projects.  It's
> kind of annoying that the setup here makes this difficult.
>
> Thanks,
>
> Dave Brennan

That's a very good question, as lots of companies have this
restriction.  Fortunately, the browser is set up so that you can
pass requests (according to the protocol in the document id) to a
gateway.  On the client node you can setenv WWW_http_GATEWAY and
WWW_wais_GATEWAY etc. to point to gateway addresses.  At each such address
you need to set up a server to access the remote whateveritis and send
the results back as hypertext.

	HTTP gateway:

I've just tried this and it seems to work fine. You use the www  
client as a server. On your gateway machine,

1. Put a new service (htgate, 8002 say) into /etc/services or  
whatever you use instead.

2. Set up inetd.conf to run /usr/local/bin/htgate when a tcp  
connection to htgate comes in. You have to kill -HUP the inetd  
process to get it to take the change into account.

3. Use this htgate script which reads the command line and then calls  
www with its second parameter and sends the source to the caller:

#! /bin/sh
read get docid
/usr/local/bin/www -source -n -na -p "$docid"

That will handle anything where the source is HTML.  It won't handle  
internet news, gopher, etc.  (For that we need an option on the  
browser -html which generates HTML
from news, etc, which we haven't done yet -- but you're not the first  
to ask)
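
For steps 1 and 2 above, the entries usually look something like the lines printed by the sketch below. The field layout is the common BSD inetd one and may differ on your system; the user "nobody" is an assumption, and the helper names are invented. The sketch only prints the lines so that they can be checked before anything is edited.

```sh
# Print a line for /etc/services: name and port/protocol.
services_line() {
    printf '%s\t%s/tcp\n' "$1" "$2"
}

# Print a line for inetd.conf in the usual BSD layout:
# service, socket type, protocol, wait flag, user, program, argv[0].
inetd_line() {
    printf '%s\tstream\ttcp\tnowait\t%s\t%s\t%s\n' \
        "$1" "$2" "$3" "${3##*/}"
}

services_line htgate 8002
inetd_line htgate nobody /usr/local/bin/htgate
```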

I suggest you run the wais gateway separately, but you could change  
the shell script to check the protocol on the doc-id and run www or  
WAISGate as appropriate. That means you set separately on the client

	setenv WWW_http_GATEWAY http://gate.hal.com:8002/
	setenv WWW_wais_GATEWAY http://gate.hal.com:8001/
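
The combined script mentioned above, which looks at the protocol of the doc-id and picks www or WAISGate, might start out like this sketch. It only reports which program it would hand the request to, because how WAISGate is started is not covered in this message; the function name is invented for the example.

```sh
# Decide which program should serve a document id, judging by the
# protocol prefix. Hypothetical helper; it prints a name and runs nothing.
handler_for() {
    case "$1" in
        wais:*) echo WAISGate ;;
        http:*) echo www ;;
        *)      echo unsupported ;;
    esac
}

handler_for 'wais://archive.orst.edu:9000/archie-orst.edu'
handler_for 'http://info.cern.ch/hypertext/WWW/Daemon/User/Guide.html'
```

With a single script like this, both gateway variables on the client could point at the same port, at the cost of a slightly more complicated script.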

For the wais gateway, you get the sources from info.cern.ch  
/pub/www/stcWWWDaemon_v.vv.tar.Z and compile it. The documentation is
http://info.cern.ch/hypertext/WWW/Daemon/User/Guide.html and linked  
documents.

I hope this solves the problem and opens up the web for the
organisation.  If you need the option for news or gopher, we'll have
to put the -html option in. Let me (and the list) know how you get
on...

	Tim

From timbl  Mon Feb 24 17:21:19 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02717; Mon, 24 Feb 92 17:21:19 GMT+0100
Date: Mon, 24 Feb 92 17:21:19 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202241621.AA02717@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: W.vanLeeuwen@nikhef.nl (Willem van Leeuwen)
Subject: Re: www @ nikhef
Cc: www-talk@nxoc01.cern.ch

Willem,

Great to see the NIKHEF server on the web!  For others on the list,
see Willem's message below for examples of how to put together
www servers, including indexes, out of simple shell scripts.

I have put links to NIKHEF from our home page, and from the subject and  
organisation index.

There is one little bug in Xfind: the reference to //nic/ rather than //nic.nikhef.nl./
will work for those in the Netherlands but not for the rest of us.
(By the way, xfind is the name of a program by Bernd Pollermann which runs on VM  
... he might feel that the name is his.)

This is a neat server, particularly as it uses existing unix tools and data to  
provide a useful service.... keep up the good work!

	Tim BL

______________________________________________________________________
Date: Thu, 20 Feb 92 14:58:11 +0100
From: W.vanLeeuwen@nikhef.nl (Willem van Leeuwen)
Organisation: Nikhef-H (National Institute for Nuclear and High-Energy Physics)
Address: Kruislaan 409, P.O. Box 41882, 1009 DB Amsterdam, the Netherlands
Phone: +31 20 5920411, +31 2995 2499 (home)
Telex: 10262 hef nl
Telefax: +31 20 5925155
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Subject: Re: www @ nikhef
Cc: a03@nikhef.nl

Hi,

There is a very preliminary version of xfind working on a very stupid set
of helpfiles.
These helpfiles are input to a VAX like help, so browsing with www
does not give very useful information, I only want to show that
the principle works.

If you want to try you may link NIKHEF into WWW with

	http://nic.nikhef.nl./user/a03/www/default/NikhefGuide.html

I now have 2 files which can be searched with keywords: the telephone
directory and this set of helpfiles.

The http daemon calls the script WWWsh:

WWWsh
=====

name=`echo $@ | awk -F? '{print $1}'`
keys=`echo $@ | awk -F? '{print $2}'`
name=`basename $name .html`
/user/a03/bin/$name.sh $keys

which may call Phone.sh or Xfind.sh

Phone.sh
========

name=$1
echo "<title> $name at NIKHEF</title>"
echo "<h1> $name</h1>"
grep -i $name /user/a03/www/default/phone.html

phone.html is a file which is generated every night from the finger
information on our central server.
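
Because Phone.sh echoes the search word straight into the title and heading, a word containing "<" or "&" would upset the markup. One possible guard, shown purely as a sketch with an invented function name and assuming the browser understands the usual SGML entities, is to escape those characters first:

```sh
# Replace the characters that are special in SGML/HTML text with entities.
# "&" is handled first so that the ampersands introduced by the other
# replacements are not escaped a second time.
html_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

name=$(html_escape "${1-}")
echo "<title> $name at NIKHEF</title>"
echo "<h1> $name</h1>"
```

The grep that follows in Phone.sh would still want the unescaped word, quoted, as its pattern.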

Xfind.sh
========

name=$1
echo "<title> $name at XFIND </title>"
echo "<h1> $name</h1>"
#
# Do not forget to put
#	pass	/user/a03/www/xfind/*
# in httpd.conf
#
cd /user/a03/www/xfind
files=`echo $name | /usr/lib/refer/hunt -Fn -Ty Index`
echo $files | sed -e "s/\///g" | awk '{for (i=1;i<=NF;i++) printf("<a href=http://nic/user/a03/www/xfind/%s>%s</a><p>",$i,$i)}'

The helpfiles are in a different directory, which has to be mentioned
in httpd.conf.

The index is made with the command

	/usr/lib/refer/mkey -w -f files | /usr/lib/refer/inv

files contains the names of the files to be indexed.
This is a rather crude approach, since the index contains a lot of keywords,
but again the first aim was to get something working.

I'll try to write down my experiences with www in a more coherent way,
until then I hope this information is of some help.

Regards, Willem


From timbl  Tue Feb 25 15:44:49 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04800; Tue, 25 Feb 92 15:44:49 GMT+0100
Date: Tue, 25 Feb 92 15:44:49 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202251444.AA04800@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: W.vanLeeuwen@nikhef.nl (Willem van Leeuwen)
Subject: Browser append to file, and SPIRES server
Cc: www-talk@nxoc01.cern.ch, pfkeb@slacvm.slac.stanford.edu



> Date: Tue, 25 Feb 92 13:40:05 +0100
> From: W.vanLeeuwen@nikhef.nl (Willem van Leeuwen)


> Some remarks about the new browser:
> 

> The use of >> to add text of a document to an existing file should be
> documented in the help:
>
>  > file          Save the text of this document in a file.
>  >>file          Add the text of this document to a file.
>

	Good point -- In fact I hadn't noticed that it works by
	virtue of the same code which makes ">" work!
	It's in the help for the next version.

> When browsing the SPIRES database one has to type find twice:
> the first is interpreted by the browser, the second by SPIRES.
> (The old browser used K, which could be omitted).

	Yes - This should be fixed at the server end.

> SPIRES does not give always the promised number of references.
> Try f find author holthuizen, there should be 111 references, only
> 104 are shown.

	This too.  The last one's a bit odd, as the limit seems to
	vary. Perhaps it's in lines. Another thing is that the lines
	have trailing spaces to 80 characters, which makes them wrap around
	a 79-character terminal. They could be stripped in the server
	EXEC too.

>Willem
	-Tim


From timbl  Tue Feb 25 15:52:17 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04826; Tue, 25 Feb 92 15:52:17 GMT+0100
Date: Tue, 25 Feb 92 15:52:17 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202251452.AA04826@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: "Mark P. McCahill" <mpm@boombox.micro.umn.edu>
Subject: Re: Size limits for text files? 
Cc: emv@cic.net, JONZY@cc.utah.edu, gopher-news@boombox.micro.umn.edu,
        www-talk@nxoc01.cern.ch

> Another thing clients should do (but don't yet) is to cache information.
> That is, the client ought to keep the previous list of gopher items in memory
> so you don't have to fetch it again. Of course, there should also be a timer...

As it happens, the www client caches information. The number of
documents (text files or menus) cached is currently 2, defined by
LOADED_LIMIT in GridText.c. We'll increase it next release, as 5
seems more useful.  There's no timer, but it's a good idea.
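
A bounded document cache of this kind might look like the Python sketch
below. Only the name LOADED_LIMIT and the value 5 come from the message
above. The class, its methods and the least-recently-used eviction
policy are assumptions; the message does not say which document
GridText.c discards first.

```python
from collections import OrderedDict

LOADED_LIMIT = 5  # value proposed above for the next release


class DocumentCache:
    """Keep at most `limit` documents, discarding the least recently used."""

    def __init__(self, limit=LOADED_LIMIT):
        self.limit = limit
        self._docs = OrderedDict()  # address -> document text

    def get(self, address, fetch):
        """Return the cached text, calling fetch(address) only on a miss."""
        if address in self._docs:
            self._docs.move_to_end(address)  # mark as recently used
            return self._docs[address]
        text = fetch(address)
        self._docs[address] = text
        if len(self._docs) > self.limit:
            self._docs.popitem(last=False)  # drop the oldest entry
        return text
```

A timer could be added by storing a fetch time beside each text and
treating entries older than some age as misses.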

	Tim BL

From emv@cic.net  Tue Feb 25 18:25:12 1992
Return-Path: <emv@cic.net>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA05367; Tue, 25 Feb 92 18:25:12 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA01587; Tue, 25 Feb 92 18:19:45 +0100
Received: by nic.cic.net (4.1/SMI-4.1)
	id AA15081; Tue, 25 Feb 92 12:19:14 EST
Message-Id: <9202251719.AA15081@nic.cic.net>
To: timbl@nxoc01.cern.ch
Cc: "Mark P. McCahill" <mpm@boombox.micro.umn.edu>, JONZY@cc.utah.edu,
        gopher-news@boombox.micro.umn.edu, www-talk@nxoc01.cern.ch
Subject: Re: Size limits for text files? 
In-Reply-To: Your message of "Tue, 25 Feb 92 15:52:17 +0100."
             <9202251452.AA04826@ nxoc01.cern.ch > 
Date: Tue, 25 Feb 92 12:19:12 -0500
From: emv@cic.net

> WWW caches texts

hmmm.  If you were running a WWW "http" gateway, I suppose you
could do a big pile of caching - rather than have every individual
user go out to the world to fetch documents individually, they
would go to your relay server which might well have the things
they were interested in already.

Such things have also been proposed for FTP servers, I guess I
would add -- you'd connect to a local caching FTP server, from
which you could 'cd' to other anonymous FTP sites; if the local
cache didn't have what you wanted it'd go off to the real place
to get it.

Fortunately gopher and WWW both seem more amenable to 
hacking^H^H^H^H^H^H^H research in this regard than the
usual FTP demon.

w/r/t size - like I say I don't want to have hard coded limits
for things, but people doing design need to keep in mind that
if a menu pick results in a megabyte worth of text being thrown
at my client I'm not going to be happy about it...

--Ed

From timbl  Thu Feb 27 17:22:44 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09065; Thu, 27 Feb 92 17:22:44 GMT+0100
Date: Thu, 27 Feb 92 17:22:44 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202271622.AA09065@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: chi-arch@uccvms.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca
Subject: Draft: Universal Document Identifiers
Cc: Rare WG3 <rare-wg3@surfnet.nl>, nisi@merit.edu


An information universe on the [Inter]net requires a canonical form  
for the name or address of a document.  Building on much previous  
network discussion, a draft discussion paper

	Universal Document Identifiers on the Network

is now available by anonymous FTP from node info.cern.ch as
	/pub/www/doc/udi1.ps   (or if desperate .txt)

or, to use its UDI, file://info.cern.ch/pub/www/doc/udi1.ps.
This describes an addressing scheme encompassing many objects on the  
network including archived files, news articles and groups, gopher
things, wais indexes, queries, and documents.

Comments are solicited.

A shorter paper on the requirements a universal hypertext system
such as WWW imposes on WAIS and x.500 protocols is at
file://info.cern.ch/pub/www/doc/wais_x500_www.ps (or .txt). It
is intended as input to the IETF wais/x500 BOF in March
(which I might be able to go to).

	Tim

__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web initiative             (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155






From bcn@isi.edu  Thu Feb 27 19:59:31 1992
Return-Path: <bcn@isi.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09473; Thu, 27 Feb 92 19:59:31 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA10546; Thu, 27 Feb 92 19:54:05 +0100
Received: from tgo.isi.edu by venera.isi.edu (5.65c/5.65+local-3)
	id <AA02373>; Thu, 27 Feb 1992 10:55:15 -0800
Date: Thu, 27 Feb 92 10:52:44 PST
Posted-Date: Thu, 27 Feb 92 10:52:44 PST
Message-Id: <9202271852.AA05956@tgo.isi.edu>
Received: by tgo.isi.edu (4.1/4.0.3-4)
	id <AA05956>; Thu, 27 Feb 92 10:52:44 PST
From: bcn@isi.edu (Clifford Neuman)
Sender: bcn@isi.edu
To: timbl@nxoc01.cern.ch
Cc: chi-arch@uccvms.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca, rare-wg3@surfnet.nl, nisi@merit.edu
In-Reply-To: Tim Berners-Lee's message of Thu, 27 Feb 92 17:22:44 GMT+0100 <9202271622.AA09065@ nxoc01.cern.ch >
Subject: Draft: Universal Document Identifiers

I have glanced through your document on universal directory
identifiers, and you seem to have left out Prospero.

Prospero is a little different in that it does late binding of the
access method.  In particular, a Prospero link consists of two parts,
a host name, and a name of the object on that host.  The latter part
is usually a path name, but in reality, it can be any string,
including simply a unique ID.  Thus, a Prospero link might look like

TGO.ISI.EDU /a/b/c  or   GUM.ISI.EDU 27

A Prospero link has a few other fields as well, but perhaps less
important.  There is a type field for the hostname.  It indicates
whether the hostname is an Internet name or address, or perhaps some
other kind of name or address.  Only one type is presently supported
(INTERNET-D) though, and that type includes Internet host names or
addresses, with or without an optional Internet UDP port.

  examples: TGO.ISI.EDU, TGO.ISI.EDU(191), 128.9.224.123, or 128.9.224.123(191)

Only a single type is used for all four types of Internet addresses
since they are not syntactically ambiguous.

The name relative to the host is also typed.  Presently, the only type
supported is ASCII, but the type field is there just in case.

Three other fields are a version number, a unique ID, and a type.  The
meaning of the version number should be fairly obvious.  A version
number of 0 matches the most recent version.  At present, most objects
don't have version numbers, but I felt it was important to include it
in the link data.  The purpose of the unique ID is less obvious.  It
is there to provide a mechanism for detecting when an object has been
deleted and replaced with an object of the same name.  In some cases,
it might be important to note that the object being retrieved is not
the same as the one to which the original link was made.  I will talk
about the type field later since it is not what you might think.

So, that is a Prospero link.  Note, that it does not specify the
access method.  Binding to an access method is accomplished by sending
a message to the Prospero server at the address in the link, and
requesting the access method for the named object.  The response
includes a sequence of tokens, the first identifies the access method,
and the remainder identify the information specific to the access
method (beyond that which already is part of the link).  If you
understand the access method, then you also know how to interpret the
remaining tokens.

For example, a response indicating access by anonymous FTP might be

  ANONYMOUS-FTP /pub/pfs/guest/README BINARY

Note that the host name is not specified since the hostname from the
link is assumed.  If the host name were different than that in the
link, then it would be specified in the response.  The path is
specified, however, because the path to be passed to the FTP program
is different than that in the link (in this case, the link included
the prefix /homes/june/ftp).
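
As a rough illustration, a client might interpret such a response as in
the Python sketch below. The function names and the returned dictionary
are assumptions made for this example and are not part of the Prospero
protocol. The case where the response names a different host is left
out, because its token layout is not described here.

```python
def parse_access_response(response):
    """Split a response into (access method, remaining tokens)."""
    tokens = response.split()
    if not tokens:
        raise ValueError("empty response")
    return tokens[0], tokens[1:]


def interpret(link_host, response):
    """Interpret the tokens only if the access method is understood."""
    method, args = parse_access_response(response)
    if method == "ANONYMOUS-FTP":
        path, transfer_mode = args  # e.g. "/pub/pfs/guest/README", "BINARY"
        # The host is not in the response, so the link's host is assumed.
        return {"method": method, "host": link_host,
                "path": path, "mode": transfer_mode}
    return None  # unknown method: the remaining tokens cannot be interpreted
```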

Similar responses are supported for other methods, and a response
might include more than one access method, in which case the
application chooses the method that best suits its needs.

Now, back to the type field.  One of the shortcomings of the approach
as described so far is that it requires a Prospero server to run on
the system storing the object to be referenced.  This shortcoming is
addressed by the external link.  The type field in a Prospero link
provides information on what can be done with the link.  The three
common types are FILE, DIRECTORY, and EXTERNAL.  The links described
above were of type FILE.  If a link's type is DIRECTORY, its contents
can be listed by contacting the Prospero server (i.e. the links in the
directory can be returned).  If a link's type is EXTERNAL, it means
that the object should be accessed without contacting a Prospero
server to obtain the access method (usually because a Prospero server
is not running on the target site).  Instead, the access information
that would otherwise have been returned is encoded as part of the
type.  Thus for example the type of an external link to the file
mentioned above would be.

  EXTERNAL(AFTP,BINARY)

Note that for external links using the AFTP or FTP method, the name
field of the link contains the path name to be passed to FTP.  For
other access methods, the meaning of the field is defined by the
particular access method to be used.

Anyway, I hope this adequately explains the form of Prospero
identifiers, and I hope that you can fit it in to your proposed
format. 

	~ Cliff





From timbl  Mon Mar  2 12:36:33 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA14036; Mon, 2 Mar 92 12:36:33 GMT+0100
Date: Mon, 2 Mar 92 12:36:33 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203021136.AA14036@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: bcn@isi.edu (Clifford Neuman)
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca

Cliff,

Thanks for your input, with explanations of addressing in Prospero.

Prospero should certainly go into the document. Indeed, it seems to  
fit in very well.   The small differences raise some interesting  
questions -- reactions off the top of my head follow, in the sequence  
of your message.

	Tim
  _______________________________________________

> Date: Thu, 27 Feb 92 10:52:44 PST
> From: bcn@isi.edu (Clifford Neuman)
> 

> I have glanced through your document on universal directory
> identifiers, and you seem to have left out Prospero.

Omission was from ignorance of the details you provide here and will  
certainly be corrected. Prospero is very relevant.

> In particular, a Prospero link consists of two
> parts, a host name, and a name of the object on that host.  The
> latter part is usually a path name, but in reality, it can be any 

> string, including simply a unique ID.  Thus, a Prospero link might 

> look like
>
> TGO.ISI.EDU /a/b/c  or   GUM.ISI.EDU 27

The UDI syntax //TGO.ISI.EDU/a/b/c or //GUM.ISI.EDU/27 matches that  
very well.  I suggest the prefix "prospero:" for prospero addresses.

> A Prospero link has a few other fields as well, but perhaps less
> important.  There is a type field for the hostname.  It indicates
> whether the hostname is an Internet name or address, or perhaps some
> other kind of name or address.  Only one type is presently supported
> (INTERNET-D) though, and that type includes Internet host names or
> addresses, with or without an optional Internet UDP port.
>
>  examples: TGO.ISI.EDU, TGO.ISI.EDU(191), 128.9.224.123, or 128.9.224.123(191)

The UDI scheme foresees these possibilities. These would map onto
//TGO.ISI.EDU/, //TGO.ISI.EDU:191/, //128.9.224.123/ and
//128.9.224.123:191/ respectively. The whole UDI of the file above
would be (if quoted out of the "prospero:" context),

	prospero://TGO.ISI.EDU:191/a/b/c
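
The mapping can be sketched in Python as below. The function name
`prospero_to_udi` is an assumption made for illustration; it only
rewrites the host(port) notation into the //host:port/path form
suggested above.

```python
import re

# A host name or address, optionally followed by "(port)".
_HOST = re.compile(r"^([^()\s]+)(?:\((\d+)\))?$")


def prospero_to_udi(host, name):
    """Build a prospero: UDI from a Prospero host field and object name."""
    match = _HOST.match(host)
    if match is None:
        raise ValueError("unrecognised host field: %r" % host)
    hostname, port = match.groups()
    authority = hostname if port is None else f"{hostname}:{port}"
    # Object names need not be path names (e.g. "27"), so add the slash.
    path = name if name.startswith("/") else "/" + name
    return f"prospero://{authority}{path}"
```

With this, `prospero_to_udi("TGO.ISI.EDU(191)", "/a/b/c")` gives the UDI
quoted above, and `prospero_to_udi("GUM.ISI.EDU", "27")` gives
`prospero://GUM.ISI.EDU/27`.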


We also wondered how to extend the system when other
underlying protocols are used with the same higher-level protocol.
Suppose, for example, one later adds dial-up prospero. Should one write

	prospero://dialup:+12025672654:200/a/b/c

or 	prospero-dialup:/+12025672654:200/a/b/c ?

My feeling is that the number of underlying network layers which have
complete world-wide coverage will remain low. Furthermore, one can
even imagine gateways there, so that those without X25 access, say,
can go through some transport level gateway from TCP/IP if the need
arises. This suggests putting other low-level addresses into the
"host/port" field, encoded in some fashion. One would hope that there
will be fewer forms of transport service access point address than
there will be application layer protocols.

> The name relative to the host is also typed.  Presently, the only type
> supported is ASCII, but the type field is there just in case.

The rule we have used is to put type information, if part of the
link, into the path.  Protocols differ on whether they regard it as
part of the link or return it when you try to retrieve the data.
In the latter case (which I prefer) it should not be in the UDI at
all.

> Three other fields are a version number, a unique ID, and a type.  


The version number should, I suggest, be part of the path. Its
significance will tend to vary between servers. The trouble is, as
you say, no one has really put up a system dealing with multiple
versions. We imagined having hidden links from a document to its
previous, next and latest versions, and to a table of versions.

> The purpose of the unique ID is ... to provide a mechanism for
> detecting when an object has been
> deleted and replaced with an object of the same name.  In some cases,
> it might be important to note that the object being retrieved is not
> the same as the one to which the original link was made.

This is non-obvious.  My feeling is that a unique id is a useful  
thing, which I would regard as "header" information, ie information  
you can ask the server for.  Putting it into the link I'm not so sure  
about.  Suppose, for example, the retrieval goes through several  
stages of pointers, being referenced by several servers. Do you want  
to check that the final document, or the first link, was really the  
same as the one you made the original link to?

> Binding to an access method is accomplished by sending
> a message to the Prospero server at the address in the link, and
> requesting the access method for the named object.  The response
> includes a sequence of tokens, the first identifies the access method,
> and the remainder identify the information specific to the access
> method (beyond that which already is part of the link).  If you
> understand the access method, then you also know how to interpret the
> remaining tokens.

That "late binding" is just the sort of "name-server" function which  
I was talking about, and which for example x500 might also fit into.
So long as both the input and the output to the process are UDIs,  
it's very flexible.

> For example, a response indicating access by anonymous FTP might be
> 

>  ANONYMOUS-FTP /pub/pfs/guest/README BINARY

We'd write that now as file:/(samehost)/pub/pfs/guest/README.
Currently, if the access protocol has to be specified, then the host
does too. It could default to the host of the context of the UDI even
when protocol fields are different.

The "binary" flag is an interesting one and a perennial question.  My  
assumption was that if you know how to handle a file when you've got  
it, then you must know how to transfer it.  In practice with FTP both  
mean that you have to have a table of file suffixes.

> Similar responses are supported for other methods, and a response
> might include more than one access method, in which case the
> application chooses the method that best suits its needs.

Sounds fine.

> Now, back to the type field.  One of the shortcomings of the approach
> as described so far is that it requires a Prospero server to run on
> the system storing the object to be referenced.  This shortcoming is
> addressed by the external link.  The type field in a Prospero link
> provides information on what can be done with the link.  The three
> common types are FILE, DIRECTORY, and EXTERNAL.  The links described
> above were of type FILE.  If a link's type is DIRECTORY, its contents
> can be listed by contacting the Prospero server (i.e. the links in the
> directory can be returned).  If a link's type is EXTERNAL, it means
> that the object should be accessed without contacting a Prospero
> server to obtain the access method (usually because a Prospero server
> is not running on the target site).  Instead, the access information
> that would otherwise have been returned is encoded as part of the
> type.  Thus for example the type of an external link to the file
> mentioned above would be.
>
>   EXTERNAL(AFTP,BINARY)

Your "EXTERNAL" type is a pointer to a document in another naming
scheme, which is neat and expandable -- I like it.  The UDI syntax was
basically invented to allow one to do that, so that all these systems
can work together. Basically, type EXTERNAL(xxx) maps onto putting an
xxx: prefix on the UDI. In your example, it maps to giving a file:
reference.
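
A Python sketch of that mapping follows. The table, the function name
and the host used in the example are assumptions made for illustration.
Only the AFTP-to-file: correspondence is taken from the discussion, and
the BINARY flag is dropped because the transfer mode is not meant to be
part of the UDI.

```python
import re

# Hypothetical table from Prospero access method to UDI prefix.
PREFIX_FOR_METHOD = {"AFTP": "file"}


def external_link_to_udi(link_type, host, name):
    """Turn an EXTERNAL(...) link into a UDI with the matching prefix."""
    match = re.fullmatch(r"EXTERNAL\(([^)]*)\)", link_type)
    if match is None:
        raise ValueError("not an EXTERNAL link type: %r" % link_type)
    fields = [field.strip() for field in match.group(1).split(",")]
    method = fields[0]  # later fields (e.g. BINARY) are ignored here
    prefix = PREFIX_FOR_METHOD[method]  # KeyError for an unmapped method
    return f"{prefix}://{host}{name}"
```

For example, `external_link_to_udi("EXTERNAL(AFTP,BINARY)",
"TGO.ISI.EDU", "/pub/pfs/guest/README")` yields
`file://TGO.ISI.EDU/pub/pfs/guest/README`.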

You have, for prospero, the flag in the link as to whether the object
is a directory or a file.  So does the Gopher.  This is useful for
displaying different icons, etc. for the user.  A snag is that if we
include anonymous FTP file systems, the NLST command doesn't give
you that information, so it doesn't map.  You have to try to retrieve
it and if that fails, cd to it.  If the flag is considered useful,
then we could use the convention (of ls -F) that a/b/c/ is a
directory and a/b/c is a file. The trouble is that you can't get
that information from an FTP server without assuming unix to parse a
long listing.
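
The "try to retrieve it and if that fails, cd to it" approach can be
sketched with Python's standard ftplib as below. The function name
`fetch_or_list` is an assumption, and `ftp` is taken to be an already
connected and logged-in `ftplib.FTP` object.

```python
import ftplib


def fetch_or_list(ftp, path):
    """Return ("file", data) or ("directory", names) for an FTP path."""
    chunks = []
    try:
        # First assume a plain file and try to retrieve it.
        ftp.retrbinary("RETR " + path, chunks.append)
        return "file", b"".join(chunks)
    except ftplib.error_perm:
        # Retrieval was refused: treat the path as a directory instead.
        ftp.cwd(path)
        return "directory", ftp.nlst()
```

If the path is neither a file nor a directory, the `cwd` call raises
`ftplib.error_perm` in turn, which the caller has to handle.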

Do I _have_ to know in advance whether a Prospero item is a directory  
or a file?

> Note that for external links using the AFTP or FTP method, the name
> field of the link contains the path name to be passed to FTP.  For
> other access methods, the meaning of the field is defined by the
> particular access method to be used.

Yup - the UDI assumptions exactly.

> Anyway, I hope this adequately explains the form of Prospero
> identifiers, and I hope that you can fit it in to your proposed
> format.
>
>	~ Cliff

Thanks for a very clear explanation.  It sounds as though Prospero  
will fit very well into the format.  I'll put it into the next draft  
of the document.

	- Tim





From timbl  Wed Mar  4 10:42:32 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA17545; Wed, 4 Mar 92 10:42:32 GMT+0100
Date: Wed, 4 Mar 92 10:42:32 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203040942.AA17545@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: jcurran@nnsc.nsf.net
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet@nnsc.nsf.net, wais-talk@think.com,
        www-talk@nxoc01.cern.ch

> Date: Thu, 27 Feb 92 19:45:42 -0500
> From: jcurran@nnsc.nsf.net

>Even if the exact scheme is not used, the requirement
> discussion contained in the paper is quite valuable.
> I have a few comments:
> 

>] Terms
>]
>] The objects on the network which are to be named include
>] objects which can be retrieved, and objects which can be searched.

> Using this definition, one would infer that document identifiers
> would allow reference to a distinct file, a particular mail
> message, news article, etc. I would not anticipate that a document
> identifier would be used to identify a newsgroup, interactive
> service, archive directory , or a wais source.  Are  we trying to
> define a universal id or a universal document id?  Might it be
> better to defer the definition of non-document resources and then
> come back and make the document specific id's be a subset of a
> future general resource identifier?

You are right that the UDIs were intended to be able to refer to any  
of those things. (In the W3 world, they all look pretty similar  
anyway -- they are all represented as [hyper]text objects.)  It is  
largely in order to be able to make references to any of those things  
that we need a UDI rather than a WAIS-DI and a W3-DI and a news-DI  
etc etc.  A UDI allows references between systems, and expandability  
for the future.  My answer would be that we are trying to define a  
universal document id, but where "document" has the very wide  
interpretation as any data which can be retrieved, viewed or  
searched: anything to which you might want to make a reference.
For example, a person is not a document (although to have a document  
on the net representing each person might be useful... their  
signature/disclaimer with links to their published works, etc etc.)  
If we can't cope with the objects which are on the net now, how can  
we hope to cope with the weird things to come .. video clips from the  
news last night etc...


] Relevance
]
] The life of a name is limited by any information contained within it which
] may become prematurely invalid. It is therefore necessary to limit the
] contents of a name to the information required for the operations above.
] Other extraneous information about the document (its size, data format,
] authorization details, etc) may in general change with time and should
] not be part of the name.

> The proposed document identifiers have many characteristics which
> may change with time: storage location, access protocol, format,
> etc. If we focus instead on the "information content" of a
> document, then it might be possible to form identifiers that are
> more robust.  Many people consider:
>
> file://info.cern.ch./pub/www/doc/udi1.ps      and
> file://info.cern.ch./pub/www/doc/udi1.txt
>
> to be the same document; just in different formats.

Precisely. We look forward to the day when a name like

	x500:/CH/CERN/CN/TBL/TechNote-15

will be put through a name server which will return a set of
addresses. In the meantime, we don't have that ubiquitous name
server (directory) facility. So we have to make do with physical
addresses. And different versions of the same document look like
different documents. It's a shame. The plan is that UDIs can migrate
from physical addresses to registered names.



> It would be nice to be able to recognize this
> and allow  the user (and user interface) to determine which
> instance should be used for retrieval.

Yes. Absolutely.  (The neatest way is for the client to send a set of
preferences over with the request, and for the server to decide which
format to send. This is a suggestion for an evolved wais and/or
http protocol.) Another way is for the client to ask a name server
for addresses, and retrieve the headers of each one to find out which
representation he'd prefer -- but I'd prefer all the representations
of the document to have the same name right down to the retrieval
protocol level.
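
The first approach, where the server picks a format from the client's
stated preferences, can be sketched in Python as below. The function
name and the use of file suffixes as format names are assumptions made
for illustration; neither wais nor http is claimed to work this way.

```python
def choose_format(preferences, available):
    """Return the client's most preferred format that the server holds.

    `preferences` is ordered, most preferred first; `available` is the
    set of formats in which the server has the document.
    """
    for fmt in preferences:
        if fmt in available:
            return fmt
    return None  # nothing acceptable: the server must report failure
```

A client preferring PostScript but able to read plain text would send
`["ps", "txt"]`. A server holding only `udi1.txt` would then answer with
the text version, under the same document name.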

> This recognition may only
> be performed if the document id's (now being used as document content
> ids) contain only location and format independent data.  It is easy
> to imagine that uniqueness could be assured by combining
> an organization, author, and title:
>
> cern.ch:www-staff:udi1
>
> ietf:osids:archdirectory-00


There are two functions: One, to find out whether two documents are  
the same. Two, to derive a (set of) addresses for retrieval of the  
document. To be able to do the first, any unique id (like OSF/DCE  
UUIDs or RFCxxxx message ids) will work. To be able to do the second,  
a directory service is needed.

> Note that the actual location of the information might be far
> removed from the point of creation, and the format might be
> changed:
>
>cern.ch:www-staff:udi1;file://ftp.uu.net/doc/univeral-docids.PS.Z
>cern.ch:www-staff:udi1;news:<1992Feb21.121919.1@quake.think.com>
>cern.ch:www-staff:udi1;wais://nnsc.nsf.net/info-retrieval-notes?udi1

I see the usefulness of quoting both the unique identifier and the  
physical address. I hope that in the future, though, one will only  
need the first part "cern.ch:www-staff:udi1". That, fed into the  
directory service, will produce a list of addresses.

You can, of course, still quote both: "You need document  
x500:/cern.ch/www-staff/udi1 which I found on  
file://ftp.uu.net/doc/univeral-docids.PS.Z".

I would also suggest that if a document has a unique registered name  
then it should certainly contain that name, so that if you find it  
some other way, you can refer to it (make links to it) by its official  
name.

> That's all
> /John

Good points -- thanks for the input...I think more needs to go in  
about registered unique names in the document.

	Tim BL



From timbl  Thu Mar  5 15:25:08 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20144; Thu, 5 Mar 92 15:25:08 GMT+0100
Date: Thu, 5 Mar 92 15:25:08 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203051425.AA20144@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: ses@cmns.think.com, peterd@expresso.cc.mcgill.ca (Peter Deutsch)
Subject: Re: Draft: Universal Document Identifiers
Cc: iafa-request@kona.cc.mcgill.ca, cni-arch@uccvma.bitnet,
        www-talk@nxoc01.cern.ch, wais-talk@think.com, iafa@cc.mcgill.ca,
        rare-wg3@surfnet.nl, nisi@merit.edu

[Admin: If anyone is missing documents from this discussion which I  
have, they are all in a mailbox  
file://info.cern.ch/pub/www/doc/udi/discussion.mbox. Some of the  
messages were sent to only some of the lists.  Also, I mis-spelled  
the name of cni-arch.uccvma in my original posting, so some replies  
have not gone there. I will not repost them.  The original udi paper  
is slightly updated now. Same UDI -- no versioning ;-)]

Now, about these USDNs:

> Date: Thu, 5 Mar 92 07:32:50 EST
> From: ses@cmns.think.com

There have been several messages now with a common theme: That what I  
called in the udi1 paper a "lasting registered name" is better than  
an "address".

Peter Deutsch argues the point at length in  
<9203042206.AA12411@expresso.cc.mcgill.ca>, using the term USDN by  
analogy with ISBN.

John Curran on <Thu, 27 Feb 92 19:45:42 -0500> argues the same, and  
also suggests quoting both registered name and address (which I  
wasn't so sure about in case they get out of sync).

I completely agree with Peter and Simon's point of view, and I have  
modified the paper to put more emphasis on this. What I obviously  
didn't make clear enough is my feeling that:-

1. There may be more than one USDN scheme, just as there are many
physical address schemes.

2. There may be more than two stages: it is an oversimplification to
talk of only a USDN and an address. For example, an ISO standard may
dereference (or as Ed says, "swizzle") to a document produced by the
IETF which may dereference down to a prospero name which may be a
pointer to an FTP file.

3. We can't use USDNs now because they aren't there. We need a
transition strategy.
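
The multi-stage dereferencing in point 2 can be sketched in Python as
below. The dictionary stands in for whatever name servers would perform
each step. The function name, the stage limit and the identifiers in the
usage note are hypothetical.

```python
def dereference(name, directory, max_stages=10):
    """Follow a name through successive stages; return every stage visited.

    `directory` maps an identifier to the identifier it dereferences to.
    An identifier with no entry is taken to be a final address.
    """
    stages = [name]
    current = name
    for _ in range(max_stages):
        if current not in directory:
            return stages  # reached something that is not a pointer
        current = directory[current]
        if current in stages:
            raise ValueError("dereferencing loop at %r" % current)
        stages.append(current)
    raise ValueError("too many stages")
```

With a hypothetical directory mapping `iso:std-1` to `ietf:doc-2`, that
to `prospero://TGO.ISI.EDU/a/b/c`, and that to
`file://TGO.ISI.EDU/pub/a/b/c`, the call
`dereference("iso:std-1", directory)` returns all four identifiers in
order.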

Therefore, UDIs were supposed to be able to hold _either_ a USDN _or_
a physical address. They weren't intended to get involved with the
discussion of which USDN/ISBN/ISSN/ISDN (?!) scheme is better. So, I
say, by all means define a USDN scheme, then register it as a
possible UDI. If it is good and everybody uses it, everything will end
up with a USDN, and the context will always be USDN documents, so the
usdn: prefix (or whatever) will not in practice be used. I'm all for
the market deciding between protocols.

Simon:

> I'm strongly in favour of the two stage lookup process; X.500 is obvious
> technology, although it is rather heavyweight for personal computers. An
> alternative might be some sort of DNS/archie-like service. These could return
> Tim's UDIs, which could then deliver the goods themselves.

I would say "a server takes x500 UDIs and returns physical UDIs which
deliver the goods themselves", meaning the same thing.  (I would
allow it the option of delivering a set of addresses, not just one.)
Yes, x500 is heavyweight so one can have a lighter protocol which
accesses a real x500 engine via a gateway with a large cache.

> Of course, individual information sources should still use local document
> numbers where possible, but should provide a way of mapping from local-id
> to universal-id when needed.

Yes.

> One little question: What should be done about document versions?
> Obviously, different versions of a document should have different
> USDNs, but should there be a simple way to compare USDNs modulo
> versions? 


Good point.  What about versions which split?  A great spin-off of  
having versions available is that you can refer to a line number in  
them. A line number in a document which is not frozen is useless.  
[This solves a recurring problem in hypertext systems, when one wants  
to link to part of a document to which one has no write access, and  
which may change].

> Here are some suggestions.. Eat hot ASN, Cultural Cringer.
> [...]

We must be careful not to reinvent the wheel: if the USDN problem is  
the same as the phone book problem (which it seems to be) then we  
should pick up on x500.

An important thing about x.500 is that it was designed to scale (I  
hope!).  By contrast as Ed says:

| Date: Wed, 04 Mar 92 23:52:05 -0500
| From: Edward Vielmetti <emv@msen.com>
| [...]
| ISBN is hierarchical so you can stamp out your own
| unique ID's; ISSN (international standard serial number) has
| a central cataloging authority.

and I doubt whether either of those will scale to allow document  
publishing on the net by every kindergarten child etc etc twice a  
minute. That's why I assume x500 is best in theory at least. But tell  
me I'm wrong.

Ed also mentions message-ids which are after all unique. The trouble  
is, there's no way of looking up where to find them.

	Tim

__________________________________________________________
Tim Berners-Lee                       timbl@info.cern.ch
World Wide Web initiative             (NeXTMail is ok)	
CERN                                  Tel: +41(22)767 3755
1211 Geneva 23, Switzerland           Fax: +41(22)767 7155






From ses@cmns.think.com  Thu Mar  5 16:15:18 1992
Return-Path: <ses@cmns.think.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20281; Thu, 5 Mar 92 16:15:18 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA02722; Thu, 5 Mar 92 16:10:07 +0100
Received: by Cmns.Think.COM (5.57/Ultrix2.4-C)
	id AA13491; Thu, 5 Mar 92 10:09:58 EST
Date: Thu, 5 Mar 92 10:09:58 EST
From: ses@cmns.think.com (Simon Edward Spero)
Message-Id: <9203051509.AA13491@Cmns.Think.COM>
To: timbl@nxoc01.cern.ch
Cc: peterd@expresso.cc.mcgill.ca, iafa-request@kona.cc.mcgill.ca,
        cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca, rare-wg3@surfnet.nl, nisi@merit.edu
In-Reply-To: Tim Berners-Lee's message of Thu, 5 Mar 92 15:25:08 GMT+0100 <9203051425.AA20144@ nxoc01.cern.ch >
Subject: Draft: Universal Document Identifiers


   usdn: prefix (or whatever) will not in practice be used. I'm all for  
   the market deciding between protocols.

"That's the nice thing about standards - there are so many of them to choose
 from" :-) Universal is as Universal does...

   Simon:

   I would say "a server takes x500 UDIs and returns physical UDIs which  
   deliver the goods themselves.", meaning the same thing.  (I would  
   allow it the option of delivering a set of addresses, not just one.)  
   Yes, x500 is heavyweight so one can have a lighter protocol which  
   accesses a real x500 engine via a gateway with a large cache.

I think we're getting on to the really big problem I've seen in every
single Doc-ID discussion: everybody seems to use the same words to
mean different things. To me, there's no such thing as a physical UDI.
There can be a reference to a physical copy of a document named by a
UDI, but that doesn't seem to be what you mean, and the difference is
confusing everybody else. Anybody want to offer up an 'official' notation?


   Good point.  What about versions which split?  A great spin-off of  
   having versions available is that you can refer to a line number in  
   them. A line number in a document which is not frozen is useless.  
   [This solves a recurring problem in hypertext systems, when one wants  
   to link to part of a document to which one has no write access, and  
   which may change].

   > Here are some suggestions.. Eat hot ASN, Cultural Cringer.
   > [...]

   We must be careful not to reinvent the wheel: if the USDN problem is  
   the same as the phone book problem (which it seems to be) then we  
   should pick up on x500.

Just a couple of tyres...there should be no problem using those PDUs
with X.500 (Steve?). 


   and i doubt whether either of those will scale to allow document  
   publishing on the net by every kindergarten child etc etc twice a  
   minute. That's why I assume x500 is best in theory at least. But tell  
   me I'm wrong.

Distinguished names are ok, but I'd still rather have an OID associated
with each naming authority (maybe in the future, everybody will be 
issued with an OID at birth! What's your clearance, citizen?)

Simon

From timbl  Wed Mar 11 12:07:28 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28084; Wed, 11 Mar 92 12:07:28 GMT+0100
Date: Wed, 11 Mar 92 12:07:28 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203111107.AA28084@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Larry Masinter <masinter@parc.xerox.com>
Subject: Re: Draft: Universal Document Identifiers
Cc: peterd@expresso.cc.mcgill.ca, cni-arch@uccvma.bitnet,
        www-talk@nxoc01.cern.ch, wais-talk@think.com, iafa@cc.mcgill.ca


>> Peter Deutsch's message  <9203051920.AA14978@expresso.cc.mcgill.ca>
>> Actually, Mike Schwartz has suggested using CRC checksums,

> From: Larry Masinter <masinter@parc.xerox.com>
> You can do better than that by either:
> a) use a good digital signature (MD5 or Snefru or ...). [...]
> b) rely on something else that's unique, e.g., hostid + timestamp, ISO
> DFR's DORs, Object Identifiers, etc. 


> We've been using 256-bit UDSNs and are happy with the scheme. I'm
> hoping we'll have a writeup together before next week.

Peter, USDN is your term, so you decide what is and isn't one.

However, a UDI I define to be something you can use to get the object. You can't  
use digital signatures, or mail-style message-ids for that. You need some hints in  
the identifier as to where to start looking. (We're talking scalable distributed  
system here, no central hash tables allowed.) Knowing, once you have a document,  
that it is the right document is a different problem, but with a good name space  
(like x500) you can do both operations.

Tim

From peterd@expresso.cc.mcgill.ca  Wed Mar 11 19:04:39 1992
Return-Path: <peterd@expresso.cc.mcgill.ca>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28995; Wed, 11 Mar 92 19:04:39 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA05278; Wed, 11 Mar 92 18:59:29 +0100
Received: from expresso.CC.McGill.CA by kona.cc.mcgill.ca with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b)
        id AA00781  (mail destined for timbl@nxoc01.cern.ch) on Wed, 11 Mar 92 12:45:46 -0500
Received: by expresso.cc.mcgill.ca (NeXT-1.0 (From Sendmail 5.52)/NeXT-1.0)
	id AA02532; Wed, 11 Mar 92 12:45:32 EST
Message-Id: <9203111745.AA02532@expresso.cc.mcgill.ca>
In-Reply-To: Tim Berners-Lee's message as of Mar 11, 12:07
From: peterd@expresso.cc.mcgill.ca (Peter Deutsch)
Date: Wed, 11 Mar 92 17:45:31 GMT-0:02
In-Reply-To: Tim Berners-Lee's message as of Mar 11, 12:07
X-Mailer: Mail User's Shell (6.5.6 6/30/89)
To: Tim Berners-Lee <timbl@nxoc01.cern.ch>,
        Larry Masinter <masinter@parc.xerox.com>
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca

> From timbl@nxoc01.cern.ch Wed Mar 11 06:03:54 1992
> >> Peter Deutsch's message  <9203051920.AA14978@expresso.cc.mcgill.ca>
> >> Actually, Mike Schwartz has suggested using CRC checksums,
> 
> > From: Larry Masinter <masinter@parc.xerox.com>
> > You can do better than that by either:
> > a) use a good digital signature (MD5 or Snefru or ...). [...]
> > b) rely on something else that's unique, e.g., hostid + timestamp, ISO
> > DFR's DORs, Object Identifiers, etc. 
> 
> > We've been using 256-bit UDSNs and are happy with the scheme. I'm
> > hoping we'll have a writeup together before next week.
> 
> Peter, USDN is your term, so you decide what is and isn't one.

I want a UDSN to be something that lets me identify the
contents of a file and compare the contents of multiple
files to test for uniqueness. In the long run I'd also
like them to permit me to identify contents across
multiple encodings, but that's harder and I'm prepared to
wait for that.

I wouldn't be so bold as to try and decide what makes a
suitable UDSN but I hope that we can discuss the issue at
IETF next week (since we will have so many of the players
there) and arrive at some sort of consensus.

I can say what _I_ want them for, and hope that this is
something that would be useful to enough other people that
we can agree to deploy something soon. Certainly there are
a number of candidates, and Larry has named some of the
most likely. I think something that can be applied
retroactively (MD5?) would be preferable to something like
hostID and timestamp, which would be hard to retrofit to
the existing archive collections.
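
A minimal sketch of the "retroactive" point, assuming MD5 is the
fingerprint chosen (only one of the candidates named above): the digest
is computed from the bytes alone, so it can be applied to files already
sitting in an archive. The function names and sample contents are
illustrative only.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the hex MD5 digest of the raw bytes (illustrative fingerprint)."""
    return hashlib.md5(data).hexdigest()


def same_contents(a: bytes, b: bytes) -> bool:
    """Compare two copies by fingerprint only, as one would across archives."""
    return fingerprint(a) == fingerprint(b)


# Hypothetical copies of one file found on two hosts, plus an edited copy.
copy_on_host_a = b"Draft: Universal Document Identifiers\n"
copy_on_host_b = b"Draft: Universal Document Identifiers\n"
edited_copy = b"Draft: Universal Document Identifiers (rev 2)\n"

print(same_contents(copy_on_host_a, copy_on_host_b))
print(same_contents(copy_on_host_a, edited_copy))
```

Note that this compares exact bytes only; identifying the same contents
across multiple encodings would need something more than a plain digest.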

> However, a UDI I define to be something you can use to get the object. .  .
> .  .  . Knowing, once you have a document, that  
> it is the right document is a different problem, but with a
> good name space (like x500) you can do both operations.

I'm principally interested in UDSNs at this point to allow
comparisons between multiple items (perhaps found in disparate
environments). I don't see how the X.500 name space can
help me here (unless I'm misunderstanding what you mean?).
Certainly it seems that UDIs should help locate items.
That seems to be their raison d'etre.


				- peterd

-- 

From masinter@parc.xerox.com  Wed Mar 11 19:22:44 1992
Return-Path: <masinter@parc.xerox.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA29032; Wed, 11 Mar 92 19:22:44 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA06015; Wed, 11 Mar 92 19:17:45 +0100
Received: from poplar.parc.xerox.com ([13.2.16.165]) by alpha.xerox.com with SMTP id <13949>; Wed, 11 Mar 1992 10:15:39 PST
Received: by poplar.parc.xerox.com id <101795>; Wed, 11 Mar 1992 10:15:28 -0800
To: timbl@nxoc01.cern.ch
Cc: peterd@expresso.cc.mcgill.ca, cni-arch@uccvma.bitnet,
        www-talk@nxoc01.cern.ch, wais-talk@think.com, iafa@cc.mcgill.ca
In-Reply-To: Tim Berners-Lee's message of Wed, 11 Mar 1992 04:07:28 -0800 <9203111107.AA28084@nxoc01.cern.ch>
Subject: Re: Draft: Universal Document Identifiers
From: Larry Masinter <masinter@parc.xerox.com>
Sender: Larry Masinter <masinter@parc.xerox.com>
Fake-Sender: masinter@parc.xerox.com
Message-Id: <92Mar11.101528pst.101795@poplar.parc.xerox.com>
Date: 	Wed, 11 Mar 1992 10:15:21 PST

Um, I think when I tell you about a document, I can tell you:
a)Some attributes about it that you can remember and use for
  finding it again.
b)its signature/fingerprint/checksum whatever
  This helps you know whether you already have exactly what I'm
  referring to or can get it more locally.
c)some information about where I think you can get it and how
d)a set of instructions you can use for getting it.

So I have a book here. a) It is called "Programming Perl", by Larry
Wall and Randal L. Schwartz. b) it is ISBN 0-937175-64-1. c) it is
part of the O'Reilly & Associates series of Unix books, try a
technical library d) If it were available for FTP, it would be in
//ora.com/nuts/books/perl/1991-edition.

The last two are a little strained in the analogy; don't jump on the
analogy, please, I just want to point out that it is reasonable and
customary to supply *MORE THAN ONE* of unique identifier, serial
number, access path, common attributes, etc.
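
One way to picture "more than one" is a record with an optional slot for
each of (a) to (d). This is only a sketch: the class and field names are
invented for illustration, and the values are the ones from the book
analogy above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentReference:
    """Hypothetical record carrying several kinds of identification at once."""
    attributes: Dict[str, str] = field(default_factory=dict)  # (a) memorable attributes
    fingerprint: Optional[str] = None    # (b) signature / checksum / serial number
    location_hint: Optional[str] = None  # (c) where one might look for it
    access_path: Optional[str] = None    # (d) instructions for getting it

    def supplied_parts(self) -> List[str]:
        """List which of (a)..(d) this reference actually carries."""
        parts = []
        if self.attributes:
            parts.append("a")
        if self.fingerprint:
            parts.append("b")
        if self.location_hint:
            parts.append("c")
        if self.access_path:
            parts.append("d")
        return parts


book = DocumentReference(
    attributes={"title": "Programming Perl",
                "authors": "Larry Wall and Randal L. Schwartz"},
    fingerprint="ISBN 0-937175-64-1",
    location_hint="O'Reilly Unix series; try a technical library",
    access_path="//ora.com/nuts/books/perl/1991-edition",
)
print(book.supplied_parts())
```

A reference that carries only some of the four parts is still usable;
the receiver simply works with whatever was supplied.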








From peterd@expresso.cc.mcgill.ca  Wed Mar 11 19:49:19 1992
Return-Path: <peterd@expresso.cc.mcgill.ca>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA29109; Wed, 11 Mar 92 19:49:19 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA07104; Wed, 11 Mar 92 19:44:15 +0100
Received: from expresso.CC.McGill.CA by kona.cc.mcgill.ca with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b)
        id AA01153  (mail destined for timbl@nxoc01.cern.ch) on Wed, 11 Mar 92 13:28:55 -0500
Received: by expresso.cc.mcgill.ca (NeXT-1.0 (From Sendmail 5.52)/NeXT-1.0)
	id AA02675; Wed, 11 Mar 92 13:28:40 EST
Message-Id: <9203111828.AA02675@expresso.cc.mcgill.ca>
In-Reply-To: Larry Masinter's message as of Mar 11, 10:15
From: peterd@expresso.cc.mcgill.ca (Peter Deutsch)
Date: Wed, 11 Mar 92 18:28:36 GMT-0:02
In-Reply-To: Larry Masinter's message as of Mar 11, 10:15
X-Mailer: Mail User's Shell (6.5.6 6/30/89)
To: Larry Masinter <masinter@parc.xerox.com>, timbl@nxoc01.cern.ch
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,
        iafa@cc.mcgill.ca

> From masinter@parc.xerox.com Wed Mar 11 13:18:12 1992
.  . .
> Um, I think when I tell you about a document, I can tell you:
> a)Some attributes about it that you can remember and use for
>   finding it again.
> b)its signature/fingerprint/checksum whatever
>   This helps you know whether you already have exactly what I'm
>   referring to or can get it more locally.
> c)some information about where I think you can get it and how
> d)a set of instructions you can use for getting it.
.  .  .
[* analogy deleted *]
> The last two are a little strained in the analogy; don't jump on the
> analogy, please, I just want to point out that it is reasonable and
> customary to supply *MORE THAN ONE* of unique identifier, serial
> number, access path, common attributes, etc.

I have no trouble with this. Obviously, I'm not sure I
need all of these for each query, since I'm trying to
accomplish different things at different times so what I
want back will change but that's just detail.

I suspect that only a) and b) are really permanently
associated with the information and the other two may or
may not be provided by external services. Of course, if
someone's favorite information delivery service wanted to
provide all four, (either directly or through gateways)
that's fine by me.


				- peterd

-- 

From emv@cic.net  Sat Mar 21 23:59:30 1992
Return-Path: <emv@cic.net>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16035; Sat, 21 Mar 92 23:59:30 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA10793; Sat, 21 Mar 92 23:54:52 +0100
Received: by nic.cic.net (4.1/SMI-4.1)
	id AA01687; Sat, 21 Mar 92 17:54:20 EST
Message-Id: <9203212254.AA01687@nic.cic.net>
To: www-talk@nxoc01.cern.ch
Subject: problem yacc'ing violaWWW
Date: Sat, 21 Mar 92 17:54:19 -0500
From: emv@cic.net

these errors making gram.y:

cc -g -I/usr/X11/include  -I/usr/X11/include/Xaw  -I/usr/X11/include/X11 \
   -I/usr/X11/include/X11/Xmu  -I/usr/X11/include/X11/Xaw  -I/usr/X11/include/Xmu \
   -Iwww/LineMode/Implementation  -Iwww/Implementation  -target sun4 -c  gram.c
gram.y: 860: syntax error (in preprocessor if)
/usr/lib/yaccpar: 119: syntax error (in preprocessor if)
/usr/lib/yaccpar: 179: syntax error (in preprocessor if)
/usr/lib/yaccpar: 187: syntax error (in preprocessor if)
/usr/lib/yaccpar: 224: syntax error (in preprocessor if)
/usr/lib/yaccpar: 229: syntax error (in preprocessor if)
/usr/lib/yaccpar: 318: syntax error (in preprocessor if)
/usr/lib/yaccpar: 334: syntax error (in preprocessor if)
/usr/lib/yaccpar: 378: syntax error (in preprocessor if)
*** Error code 2
make: Fatal error: Command failed for target `gram.o'

this is on a sun sparcstation. any clues?  i don't grok yacc...

--Ed

From emv@heifetz.msen.com  Sun Mar 22 05:26:34 1992
Return-Path: <emv@heifetz.msen.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16158; Sun, 22 Mar 92 05:26:34 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA22231; Sun, 22 Mar 92 05:22:04 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA29898; Sun, 22 Mar 92 06:21:52 +0200
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lSK12-000HpjC@heifetz.msen.com>; Sat, 21 Mar 92 23:18 EST
Message-Id: <m0lSK12-000HpjC@heifetz.msen.com>
To: www-talk@nxoc01.cern.ch
Subject: behavior of ">" and ">>" in line mode browser
Date: Sat, 21 Mar 92 23:18:39 -0500
From: Edward Vielmetti <emv@msen.com>

I wanted to download a piece of html that I had written (and that Tim
had prettied up) so that I could bring it up to date.  When I did

% www -source http://......NewsGroupRelated.html

it started to show me the file as I would have properly expected to see
it.  But when I did 
	<RETURN> for more, etc: > /tmp/ng.html
the file that appeared was fully formatted (not what I expected or wanted).

i'll try to track down the problem...

--Ed


From emv@heifetz.msen.com  Sun Mar 22 06:10:07 1992
Return-Path: <emv@heifetz.msen.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16179; Sun, 22 Mar 92 06:10:07 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA27945; Sun, 22 Mar 92 06:05:36 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA00994; Sun, 22 Mar 92 07:05:24 +0200
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lSKhB-000HcYC@heifetz.msen.com>; Sun, 22 Mar 92 00:02 EST
Message-Id: <m0lSKhB-000HcYC@heifetz.msen.com>
To: timbl@nxoc01.cern.ch
Cc: www-talk@nxoc01.cern.ch
Subject: "Cannot connect to information gateway %s\n", gateway
Date: Sun, 22 Mar 92 00:02:11 -0500
From: Edward Vielmetti <emv@msen.com>

(that's from HTAccess.c)

I want to be able to make references like
	   www rfc:934
	or www rfc:934.txt
and have that turned into
	www file://ftp.nisc.sri.com/rfc/rfc934.txt

what should I set "WWW_rfc_GATEWAY" to to do something like that?

it looks like the code in HTAccess.c is only set up to use HTTP as
a gateway scheme, which is fair and good, though I would expect that
if I say
	setenv WWW_rfc_GATEWAY file://ftp.nisc.sri.com/rfc/
that it would look and say "aha, a file, better FTP it".

this is with the 1.2c version of the line mode browser.

thanks,

--Ed

From HARMO@valt.helsinki.fi  Sun Mar 22 09:55:31 1992
Return-Path: <HARMO@valt.helsinki.fi>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16284; Sun, 22 Mar 92 09:55:31 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA06249; Sun, 22 Mar 92 10:50:53 +0200
Received: from charon-gw.pc.Helsinki.FI by kruuna.helsinki.fi with SMTP id AA10617
  (5.65c/IDA-1.4.4 for <www-talk@nxoc01.cern.ch>); Sun, 22 Mar 1992 10:50:19 +0200
Received: From HYLKN1/WORKQUEUE by charon-gw.pc.Helsinki.FI
          via Charon 3.4 with IPX id 100.920322105033.256;
          22 Mar 92 10:51:56 +0200
Message-Id: <MAILQUEUE-101.920322104940.179@valt.Helsinki.FI>
To: www-talk@nxoc01.cern.ch
From: "Timo Harmo - SocSci U of Helsinki"  <HARMO@valt.helsinki.fi>
Date:     22 Mar 92 10:49:40 EET
Subject:  Graphical browsers in hypertext
X-Pmrqc:  1
X-Mailer: Pegasus Mail v2.2 (R3).

Is there, or is there planned, some kind of standard for
presenting graphical browsers in WWW?
I think it would be great to have maps of the hyperterritories one is
about to explore. And it could be quite simple, too. The map could be
just a list of links with coordinates (and maybe some formatting
info?), line-mode clients could ignore the coordinates and present
the browsers as lists.
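
A rough sketch of what such a map could look like as data, assuming each
entry is simply (label, address, x, y). The layout, the addresses and the
function name are made up for illustration; no such format is defined.

```python
# Hypothetical map node: each link carries a label, an address and coordinates.
MAP_NODE = [
    ("Physics index", "http://example.org/physics.html", 10, 20),
    ("Newsgroups", "http://example.org/news.html", 120, 20),
    ("Software", "http://example.org/software.html", 65, 90),
]


def render_line_mode(links):
    """Ignore the coordinates and present the map as a numbered list."""
    return ["[%d] %s" % (number, label)
            for number, (label, address, x, y) in enumerate(links, start=1)]


for line in render_line_mode(MAP_NODE):
    print(line)
```

A graphical client would use x and y to place each label in a window;
the line-mode rendering drops them and keeps the order of the list.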
 -Timo

From emv@heifetz.msen.com  Mon Mar 23 00:56:45 1992
Return-Path: <emv@heifetz.msen.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16785; Mon, 23 Mar 92 00:56:45 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA22856; Mon, 23 Mar 92 00:52:15 +0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA27154; Mon, 23 Mar 92 01:51:59 +0200
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lScHN-000HntC@heifetz.msen.com>; Sun, 22 Mar 92 18:48 EST
Message-Id: <m0lScHN-000HntC@heifetz.msen.com>
To: www-talk@nxoc01.cern.ch
Cc: timbl@nxoc01.cern.ch
Subject: WAISGate support for WAIS 'HTML' doc type
Date: Sun, 22 Mar 92 18:48:43 -0500
From: Edward Vielmetti <emv@msen.com>

In WAISGate.c (part of the WAIS/WWW gateway in the Daemon distribution)
there's a little hunk of code that says basically

	pick up the doctype of the document
	if it's WSRC (a WAIS source),
		it's an index description file,
		parse the WSRC
	else
		it's a plain text document,
		send it as a straight text file

It strikes me as a reasonable idea to extend this to put in another
case that says
	if it's HTML
		it's an HTML file,
		send it with no extra parsing

If we can agree that this is a reasonable thing to do, then I'll make
the necessary changes to "waisindex" to index up the html files I've
written to date and serve them up.  
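
The extended dispatch might look roughly like this. It is a sketch in
Python rather than the C of WAISGate.c, and parse_wsrc_placeholder is a
stand-in invented here; the real gateway's parsing is not shown.

```python
def parse_wsrc_placeholder(body):
    """Stand-in for the gateway's real WSRC parsing (not reproduced here)."""
    return {"raw": body}


def handle_document(doctype, body):
    """Decide how to serve a retrieved WAIS document based on its doctype."""
    if doctype == "WSRC":
        # An index description file: parse it before presenting it.
        return ("index-description", parse_wsrc_placeholder(body))
    if doctype == "HTML":
        # Proposed new case: pass the HTML through with no extra parsing.
        return ("html", body)
    # Anything else is treated as a plain text document.
    return ("plain-text", body)


print(handle_document("HTML", "<TITLE>Newsgroups</TITLE>"))
print(handle_document("TEXT", "just some text"))
```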

--
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
      MSEN Inc., 628 Brooks, Ann Arbor MI  48103 +1 313 741 1120
"Not to panic.  Networking will eventually get to Michigan.  Remember
 the decades it took IBM to discover virtual memory."  Randy Bush


From timbl  Mon Mar 23 16:30:33 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA17902; Mon, 23 Mar 92 16:30:33 GMT+0100
Date: Mon, 23 Mar 92 16:30:33 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203231530.AA17902@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: "Timo Harmo - SocSci U of Helsinki" <HARMO@valt.helsinki.fi>
Subject: Re: Graphical browsers in hypertext
Cc: www-talk@nxoc01.cern.ch

> From: "Timo Harmo - SocSci U of Helsinki"  <HARMO@valt.helsinki.fi>
> Date:     22 Mar 92 10:49:40 EET
>
> Is there, or is there planned, some kind of standard for
> presenting graphical browsers in WWW?
> I think it would be great to have maps of the hyperterritories one is
> about to explore. And it could be quite simple, too. The map could be
> just a list of links with coordinates (and maybe some formatting
> info?), line-mode clients could ignore the coordinates and present
> the browsers as lists.
>  -Timo

There are three possibilities here.

One is a general graphics browser -- that is, instead of being limited to
hypertext, go for hypergraphics. This would mean picking a graphics
standard and adding an anchor representation to add to it. As you say, a
line mode browser could just list the links from a graphics node.

Another is building a graphical map of part of the web. This is a good way
to navigate, but it is quite a challenge to decide which nodes and links
to put in and leave out, and where to put the nodes. Bear in mind that some
nodes have just a few links, some have hundreds. Trying to get the most into a
window and at the same time make it look natural is an interesting problem.
If it was computationally intensive it could be done off-line.

The third is combining the two above by making a "map" window for an existing
browser. This could serve the "History" function of showing where a user
has been, but with links off to other nodes too.  As most people seem to prefer
to think of the data they browse as a tree, one could start by representing the
paths the user took as a tree, and then put in cross-links and links to
other referenced nodes.
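
As a sketch of that starting point, assuming the browser records each
step as a (source node, destination node) pair: the first arrival at a node
fixes its parent in the tree, and any later hop to a node already seen is
kept as a cross-link. The names and sample hops are illustrative.

```python
def build_history_map(hops):
    """Turn a list of (source, destination) hops into a tree plus cross-links.

    Returns (parent, cross_links): parent maps each node to the node it was
    first reached from (None for a starting point); cross_links lists hops
    that arrive at a node already placed in the tree by another route.
    """
    parent = {}
    cross_links = []
    for src, dst in hops:
        parent.setdefault(src, None)      # unseen source becomes a root
        if dst not in parent:
            parent[dst] = src             # first arrival: tree edge
        elif parent[dst] != src:
            cross_links.append((src, dst))
    return parent, cross_links


hops = [("home", "physics"), ("physics", "hep"),
        ("home", "news"), ("news", "hep"), ("hep", "home")]
tree, extra = build_history_map(hops)
print(tree)
print(extra)
```

Deciding where to draw the nodes, and which referenced but unvisited
nodes to add, is the harder layout problem and is not attempted here.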

Yes, it's a great idea -- anyone want to implement it?   :-)

	Tim


From timbl  Mon Mar 23 17:32:29 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA18049; Mon, 23 Mar 92 17:32:29 GMT+0100
Date: Mon, 23 Mar 92 17:32:29 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203231632.AA18049@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Edward Vielmetti <emv@msen.com>
Subject: Re: WAISGate support for WAIS 'HTML' doc type
Cc: www-talk@nxoc01.cern.ch

Ed,

Great.  I have put in the mod into the WAIS gateway running here. -- I have no
HTML WAIS index to try it on though, so let me know when you do.

BTW, are you using Simon Spero's filter mods to filter out control information
(and weight title information and headings)?

Tim

From jfg@bernd.cern.ch  Mon Mar 23 17:53:17 1992
Return-Path: <jfg@bernd.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA18099; Mon, 23 Mar 92 17:53:17 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA08402; Mon, 23 Mar 92 17:48:43 +0100
Received: by bernd.cern.ch (AIX 3.1/UCB 5.61/4.03)
          id AA15238; Mon, 23 Mar 92 17:47:01 -2300
Date: Mon, 23 Mar 92 17:47:01 -2300
From: jfg@bernd.cern.ch (Jean-Francois Groff)
Message-Id: <9203241647.AA15238@bernd.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: Re: behavior of ">" and ">>" in line mode browser
References: <m0lSK12-000HpjC@heifetz.msen.com>

  Thank you Ed for more useful bug reports. Here are your answers...

----------------------------------------------------------------------------
problem yacc'ing violaWWW

I don't know. I never compiled it myself... The error is not in gram.y...
Looks like your cc doesn't understand the "#if" preprocessor directive (ANSI).
Try gcc instead.

----------------------------------------------------------------------------
behavior of ">" and ">>" in line mode browser

Here's a diff on HTBrowse.c to handle this. The line numbers and the
"the_choice" variable will not correspond to your code because they're
from the unreleased 1.2e (soon to become 1.3 on ftp...)

*** 1080,1087 ****
  
                command  = (char *) malloc(
                        strlen(address)+strlen(the_choice)+30);
!               sprintf(command, 
!                       "www -n -na -p \"%s\" %s", address, the_choice);
                result = system(command);
                if (result) printf("  %s  returns %d\n", command, result);
                free(command);
--- 1080,1087 ----
  
                command  = (char *) malloc(
                        strlen(address)+strlen(the_choice)+30);
!               sprintf(command, "www -n %s \"%s\" %s", 
!                       HTDiag ? "-source" : "-na -p", address, the_choice);
                result = system(command);
                if (result) printf("  %s  returns %d\n", command, result);
                free(command);
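
For readers who do not have HTBrowse.c to hand, the effect of the change
can be sketched outside C as follows. The function name and sample address
are hypothetical; source_mode stands in for the HTDiag flag used in the
diff above.

```python
def build_save_command(address, the_choice, source_mode):
    """Mirror the patched sprintf: keep -source when the browser is in source mode."""
    flags = "-source" if source_mode else "-na -p"
    return 'www -n %s "%s" %s' % (flags, address, the_choice)


print(build_save_command("http://example.org/doc.html", "> /tmp/ng.html", True))
print(build_save_command("http://example.org/doc.html", "> /tmp/ng.html", False))
```

Before the patch the second form was always used, which is why a file
saved with ">" came out formatted even under -source.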

----------------------------------------------------------------------------
rfc: "gateway"

I see your point. However, the only intention of the WWW_foo_GATEWAY syntax
is to enable access to protocols not understood by the www client by means
of a gateway that translates those to/from HTTP/HTML. Ideally, what we need
to be able to fetch RFCs properly is a naming service, x500 or whatever.
Your rfc: access is rather an address alias. For that, you could direct it
to your own HTTP server with WWW_rfc_GATEWAY, and then insert

	map	rfc:	file://ftp.nisc.sri.com/rfc/

in your httpd.conf rule file. But beware that if you start serving HTML
files with rfc: addresses instead of file://host/rfc, every www client in
the world will have to set WWW_rfc_GATEWAY.

----------------------------------------------------------------------------
WAISGate support for WAIS 'HTML' doc type

Tim answered this one. As he said, "Great!".

----------------------------------------------------------------------------
  Jean-Francois Groff (jfg@info.cern.ch)
  World-Wide Web initiative
  CERN, ECP division, CH-1211 Geneva 23, Switzerland
  Phone +41 22 767 3755 -- Fax +41 22 767 7155
--
"If we were directed from Washington when to sow and when to reap,
 we would soon want bread."
	- Thomas Jefferson


From timbl  Tue Mar 24 11:23:31 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA19187; Tue, 24 Mar 92 11:23:31 GMT+0100
Date: Tue, 24 Mar 92 11:23:31 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203241023.AA19187@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: jfg@bernd.cern.ch (Jean-Francois Groff)
Subject: Mapping names in the server
Cc: www-talk@nxoc01.cern.ch

Ed,

As JF says, you can map logical document names to real file names in the server by  
means of the rule file. We didn't put the rule file into the client,
because we didn't want the start-up time to be long. Also,
we wanted a W3 document name to be followable by anyone, not just
the few who have picked up the latest gems of rule file titbits.

(There were a couple of stars missing from JF's example, which should have been

	map	rfc:*	file://ftp.nisc.sri.com/rfc/*

)

Actually the current distributed server expects the output of the rule file
to be a local filename. You would have to modify it to expect a w3 address.
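
To make the effect of the stars concrete, here is a small sketch of a
single-star rule, assuming the star simply stands for whatever follows
the fixed prefix and is copied into the result. This is an illustration
of the idea only, not the server's actual matching code.

```python
def apply_map_rule(pattern, template, name):
    """Apply a one-'*' map rule; return the mapped name, or None if no match."""
    if "*" not in pattern:
        return template if name == pattern else None
    prefix, suffix = pattern.split("*", 1)
    if len(name) < len(prefix) + len(suffix):
        return None
    if not (name.startswith(prefix) and name.endswith(suffix)):
        return None
    wild = name[len(prefix):len(name) - len(suffix)]
    return template.replace("*", wild, 1)


rule = ("rfc:*", "file://ftp.nisc.sri.com/rfc/*")
print(apply_map_rule(rule[0], rule[1], "rfc:rfc934.txt"))
print(apply_map_rule(rule[0], rule[1], "rfc:934"))
```

Under this reading, rfc:934 maps to .../rfc/934 rather than
.../rfc/rfc934.txt, so the shorter form Ed asked for would need a
different template or a real name service.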

In the long term, a name service should exist to translate

	name-service:/org/isoc/rfc/959

into say file://ftp.nisc.sri.com/rfc/rfc959.ps and .txt.  As no one seems to
want to extend the DNS, it looks as though x500 is the best bet. There was
some discussion on this at the IETF, though it'll be a while before x500 is  
deployed all over.

Tim

From timbl  Thu Mar 26 15:25:12 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA23337; Thu, 26 Mar 92 15:25:12 GMT+0100
Date: Thu, 26 Mar 92 15:25:12 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203261425.AA23337@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Jonny Goldman <jonathan@think.com>
Subject: Openning the WAIS document-id syntax
Cc: www-talk@nxoc01.cern.ch, wais-talk@think.com

> Date: Tue, 24 Mar 92 09:46:21 PST
> From: Jonny Goldman <jonathan@think.com>

Jonny,

This is relevant to the WAIS-FTP work Jim is doing.

Unfortunately none of the WAIS crowd could get to discussions at the IETF -- though  
John Curran represented the WAIS side. Those discussions were very interesting. 


The data model of WAIS (documents in databases) could be deconstrained to allow  
documents themselves to be or contain lists of documents, and for lists of  
documents to point to things other than documents in the same database.

This is the way the second part can work.  Normally, a search returns a list of  
doc-ids, each one (basically) like

	/usr/local/lib/wais/mydatabase/fred/myfile.txt

which is in fact a filename. There's a load of other stuff in there which we can  
ignore for now.  What a WAIS search needs to be able to do, when you are pointing
to files, is to return a pointer to a file in FTP say. We do that in two steps.
First, we recognise that that id is local to the context of a wais server on host  
myhost and port myport. When the server returns that string, the client
uses knowledge of the context in which it was quoted to expand that to

	wais://myhost.dom.net:myport/usr/local/lib/wais/mydatabase/fred/myfile.txt

This is a reference you can quote to anyone as it makes sense anywhere. No context.
I called it a UDI but we'll have to change the name. Document Access Token maybe.
It's like Brewster's proposal but extendable to other protocols.  [Yes, WAIS is a  
good protocol but there are others. Including name servers and directories which  
will be needed for long-lived but movable documents.]
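
The first step can be sketched as below. The function name is invented,
and port 210 is only an illustrative value standing in for "myport".

```python
def expand_doc_id(local_id, host, port):
    """Qualify a server-local doc-id with the context it was returned in."""
    path = local_id if local_id.startswith("/") else "/" + local_id
    return "wais://%s:%d%s" % (host, port, path)


local_id = "/usr/local/lib/wais/mydatabase/fred/myfile.txt"
print(expand_doc_id(local_id, "myhost.dom.net", 210))
```

The client is the only party that knows which server it was talking to,
which is why the expansion happens on the client side in this sketch.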

Now suppose one day a server returns a doc-id INCLUDING the protocol, host, etc.  
For example, your WAIS FTP engine (like the ARCHIE WAIS) returns what are basically
pointers to files. Just now, because of the constraints of the model, it has to  
return a part of a file within the database. Suppose we change that, so that
in your case it just returns a doc-id which specifies anonymous ftp access, like:

	file://otherhost.com/pub/doc/mydoc.txt

The client has a general retrieval engine which can accept doc-ids in many domains  
-- not just WAIS. That allows it to go out over a different protocol to retrieve  
the object.
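
A sketch of such a retrieval engine, assuming doc-ids follow the
scheme://host[:port]/path shape of the examples above. The handler
functions are placeholders that only report what they would do; nothing
here performs real network access.

```python
from urllib.parse import urlsplit


def via_wais(host, port, path):
    return ("wais", host, port, path)


def via_anonymous_ftp(host, port, path):
    return ("anonymous-ftp", host, path)


def via_http(host, port, path):
    return ("http", host, port, path)


# Hypothetical table: one handler per access scheme the client understands.
HANDLERS = {"wais": via_wais, "file": via_anonymous_ftp, "http": via_http}


def retrieve(doc_id):
    """Pick an access method from the scheme at the front of the doc-id."""
    parts = urlsplit(doc_id)
    handler = HANDLERS.get(parts.scheme)
    if handler is None:
        raise ValueError("no access method for scheme %r" % parts.scheme)
    return handler(parts.hostname, parts.port, parts.path)


print(retrieve("file://otherhost.com/pub/doc/mydoc.txt"))
print(retrieve("wais://myhost.dom.net:210/usr/local/lib/wais/mydatabase/fred/myfile.txt"))
```

Adding a protocol then means adding one entry to the table, without
touching the code that handles search results.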

This is the way WWW and Gopher work.  They are open systems -- you can link into
any other system within reason.  That's why the fuss about universal document  
identifiers.  Maybe the WAIS people would like to incorporate them -- that is, just
make sure that a normal WAIS server returns things which are -- like the one
above -- special cases of the more general syntax.

I haven't had much comment from the WAIS side about the UDIs, but I'd like to have  
some. (file://info.cern.ch/pub/www/doc/udi1.ps was background for the IETF  
discussions.) We plan a small working group hacking out the details before an RFC  
is submitted.


> I like the idea of generalized interfaces, customized servers.

You bet!


- Tim BL


From jonathan@quake.think.com  Thu Mar 26 18:53:00 1992
Return-Path: <jonathan@quake.think.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA23811; Thu, 26 Mar 92 18:53:00 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA06767; Thu, 26 Mar 92 18:48:28 +0100
Received: from philo.quake.think.com. by quake.think.com (4.1/SMI-4.0)
	id AA00347; Thu, 26 Mar 92 09:48:02 PST
Received: by philo.quake.think.com. (4.1/SMI-4.1)
	id AA00262; Thu, 26 Mar 92 09:47:34 PST
Date: Thu, 26 Mar 92 09:47:34 PST
Message-Id: <9203261747.AA00262@philo.quake.think.com.>
From: Jonny Goldman <jonathan@think.com>
Sender: jonathan@quake.think.com
To: timbl@nxoc01.cern.ch
Cc: www-talk@nxoc01.cern.ch, wais-talk@think.com
In-Reply-To: Tim Berners-Lee's message of Thu, 26 Mar 92 15:25:12 GMT+0100 <9203261425.AA23337@ nxoc01.cern.ch >
Subject: Openning the WAIS document-id syntax

First, I'd like to point out that WAIS-FTP doesn't mean a client or server
understands the FTP protocol.  It's simply a customized server that functions
like FTP (but is read-only).  It's mainly an experiment in modifying
servers and providing services under WAIS.

   Date: Thu, 26 Mar 92 15:25:12 GMT+0100
   From: timbl@nxoc01.cern.ch (Tim Berners-Lee)

   [...]

   The data model of WAIS (documents in databases) could be deconstrained
   to allow documents themselves to be or contain lists of documents, and
   for lists of documents to point to things other than documents in the
   same database.

I take it you're suggesting a new TYPE for a document: Derived types?  In a
sense the catalog is one of these.

   This is the way the second part can work.  Normally, a search returns a
   list of doc-ids, each one (basically) like

	   /usr/local/lib/wais/mydatabase/fred/myfile.txt

   which is in fact a filename.

Let me also point out that this is just the method used in the sample
server.  The CM server does not return DocID's that are derived from
filenames.

In fact, DocID's are "any"s, and that means they can have anything in them,
so long as the server understands how to return a specified amount of data
to a client when presented a DocID and a range.

   There's a load of other stuff in there which we can ignore for now.
   What a WAIS search needs to be able to do, when you are pointing to
   files, is to return a pointer to a file in FTP say. We do that in two
   steps.

I don't agree.  I think the server should do the retrieval.  The client
should not have to know anything about the REAL location of the document.
More on that below.

   First, we recognise that that id is local to the context of a wais server
   on host myhost and port myport. When the server returns that string, the
   client uses knowledge of the context in which it was quoted to expand
   that to

	   wais://myhost.dom.net:myport/usr/local/lib/wais/mydatabase/fred/myfile.txt

   This is a reference you can quote to anyone as it makes sense anywhere.
   No context.  I called it a UDI but we'll have to change the name.
   Document Access Token maybe.  It's like Brewster's proposal but
   extendable to other protocols.  [Yes, WAIS is a good protocol but there
   are others. Including name servers and directories which will be needed
   for long-lived but movable documents.]
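
As a rough illustration of the expansion step described above, here is a
minimal C sketch.  The function name expand_doc_id is hypothetical (it is not
part of WAIS or the W3 library), and it assumes the local id is a plain
string that already begins with a slash, as in the sample-server example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: qualify a server-local doc-id with the host and
 * port of the server that returned it, giving a context-free reference.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int expand_doc_id(const char *host, const char *port,
                  const char *local_id, char *out, size_t out_size)
{
    /* Assumption: local_id starts with '/', so no separator is added. */
    int n = snprintf(out, out_size, "wais://%s:%s%s", host, port, local_id);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Called with host "myhost.dom.net", port "210" and the filename-style id, this
yields a string of the same shape as the wais:// reference quoted above.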

This is a good idea, but I feel rather strongly that we should be very
careful in overloading the protocol.  Specifying a syntax for DocID's is
one way of overloading the protocol.  Standardizing types is another.

   Now suppose one day a server returns a doc-id INCLUDING the protocol,
   host, etc.  For example, your WAIS FTP engine (like the ARCHIE WAIS)
   returns what are basically pointers to files. Just now, because of the
   constraints of the model, it has to return a part of a file within the
   database. Suppose we change that, so that in your case it just returns a
   doc-id which specifies anonymous ftp access, like:

WAIS-FTP doesn't return pointers to remote files.  It returns local DocIDs
for use in retrieving a file local to the server.  Archie WAIS (and
ftpable-readmes) returns these pointers.  That's a different story.

Now for a small discussion of WAIS DocID's. So far WAIS DocID's have only a
few fields:

typedef struct DocID{
   any* originalServer;
   any* originalDatabase;
   any* originalLocalID;
   any* distributorServer;
   any* distributorDatabase;
   any* distributorLocalID;
   long copyrightDisposition;
} DocID;

The part you refer to is just the LocalID part.  If you look at some of the
DocID's returned by the serial server, you'll see the other fields are
filled in (though the Server fields don't contain much useful information -
it's that part we were trying to standardize with the doc-id proposal).

	   file://otherhost.com/pub/doc/mydoc.txt

   The client has a general retrieval engine which can accept doc-ids in
   many domains -- not just WAIS. That allows it to go out over a different
   protocol to retrieve the object.

There are two ways to handle this, of course.  Either the client or the
server could do the retrieval.  I believe the server should handle the
protocol part (if the document is stored on some FTP server somewhere, the
WAIS server can just fetch the file, and return it to the client).  This
reduces client complexity.  I have no objection to specifying the
protocol/server in the DocID (perhaps with another field), but we must
standardize the meanings.

   This is the way WWW and Gopher work.  They are open systems -- you can
   link into any other system within reason.  That's why the fuss about
   universal document identifiers.  Maybe the WAIS people would like to
   incorporate them -- that is, just make sure that the normal WAIS server
   return things which are -- like the one above -- special cases of the
   more general syntax.

   I haven't had much comment from the WAIS side about the UDIs, but I'd
   like to have some. (file://info.cern.ch/pub/www/doc/udi1.ps was
   background for the IETF discussions.) We plan a small working group
   hacking out the details before an RFC is submitted.

Come up with an RFC, and we'll try to abide by it.  I'd like to caution you
against overloaded strings.  We've got enough of them already.

For a start, I'd suggest we use the originalServer as the identifier for
the HOST, and the originalDatabase can inform us of the protocol.

- Jonny G

From timbl  Fri Mar 27 09:58:11 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA24800; Fri, 27 Mar 92 09:58:11 GMT+0100
Date: Fri, 27 Mar 92 09:58:11 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203270858.AA24800@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Edward Vielmetti <emv@msen.com>
Subject: Re: more WWW/WAIS gateway hacks
Cc: www-talk@nxoc01.cern.ch

From: Edward Vielmetti <emv@msen.com>
Date: Fri, 27 Mar 92 00:07:40 -0500

(I'm going to have to get the WWW gateway up and running myself.)

	Sounds like a good idea, Ed!  Especially now I've released it
	"properly" :-)

Here's a thought to make WWW/WAIS gateways for netnews even more
useful than they already are.

Some WAIS db's are really db's full of news articles.  comp.archives
is one, there's collections of current comp.sys.* and sci.* groups,
etc etc.

The WWW approach to news articles is really cool cause it builds links
based on embedded message ID's, links on the newsgroups: line, stuff
like that.

If you could somehow note (maybe on a per-gateway basis?  it might
not be possible to discern on the fly, so you'd perhaps configure
it ahead of time) that the texts coming back from a WAIS server were
really news articles instead of arbitrary globs of texts, and then
you added the set of news-parsing stuff, you'd have a big win on your
hands.

(eg. try
	<a href=wais://wais.oit.unc.edu:210/comp-sys?jpeg> jpeg in comp.sys </a>
and explain why you can't chain back to things from there.)

will get the code and see what I can do to add this.

--Ed

	Two ideas here. One is to add a "MailNews" RFC850/822 message type
	which switches on client message parsing.  The other is to have a
	converter in the gateway which offers the message as TEXT or as HTML.
	For those clients which have HTML, they pick up the hypertext
	version. The first seems easier. Can we use the same type for mail
	and news messages?

	Another idea. Suppose you run a server on the NNTP machine, so it
	has direct access to the news, and when news comes in you cross-link
	the references, so that you can follow links to the replies to a
	message as well as to the messages it references.

	Maybe this could be done on the fly when the message is read.

	-Tim


pls pass on to the whole www crew....

ps.  you interested in starting a "comp.infosystems.web" group?
     should get a lot of attention.  my WAIS newgroup vote has
     150+ yes votes in the first two days !

	Sure, if you think it'll fly. I guess we have to gateway the mail
	for a while. Maybe comp.infosystems.www --- there is the "web and Cweb"
	toolset from Knuth (?) which we don't want to get confused with.


From timbl  Fri Mar 27 11:45:43 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25109; Fri, 27 Mar 92 11:45:43 GMT+0100
Date: Fri, 27 Mar 92 11:45:43 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203271045.AA25109@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Linda Murphy <murphy@dairp.upenn.edu>
Subject: WWW NeXT client, telnet feature; Web traversal
Cc: www-talk@nxoc01.cern.ch

> Date: Thu, 26 Mar 92 18:06:51 GMT-0500
> From: Linda Murphy <murphy@dairp.upenn.edu>

> My environment: a NeXT.
> ...I am running pre-release b of version 0.13 of the WWW application.
> I get ... Invalid access prefix for 'telnet://info.cern.ch'
> Can be one of news: but not 'telnet:'.


The NeXT client was frozen last summer, so it has no innovations which occurred  
since then. I have it relatively high on my (rather large) agenda to bring it  
up-to-date from a prototype to include the re-engineered common library, and  
rerelease it.  (I was under pressure here not to put too much work into it because  
NeXTs were not official CERN workstations for a while: everyone wanted X. Now X is  
coming from other people, the pressure should ease.)

Another problem you will find is that the NeXT client can't cope with the very long  
identifiers returned by the latest WAIS servers such as the directory of servers.  
It just crashes because I put in an arbitrary hard limit (bad! :-().

Apart from telnet: it also can't handle gopher:.

I'm really sorry for that lack of functionality.  I use the app all the time myself  
so it bugs me too. I'll do it when I have time, but better server functionality I  
think should come first.  Someone in Hawaii whose mail address I don't seem to have  
(are you out there?) thought they might find six person-months to put into the NeXT  
app, which would be great.

> Further down in the bug list, you mention serialisation and web
> traversal.  What do you mean by this term?
> 

> --lam

I mean a feature to turn a web into a serial document, like to print it on paper,  
by traversing the web. This is really needed -- the world is looking for ways of  
turning text into hypertext, but the moment you do it, you want to turn it back  
again for people who want paper! A traversal, and concatenation followed by a sed  
file to turn it into TeX macros should cover it.

Also one imagines tools which traverse the web recursively in a breadth-first way  
looking for things -- interesting data and indexes for example. I imagine terminating  
the search as a function of number of links traversed modified by the  
"interestingness" of documents found on the way (judged by the words they contain  
matched against a query).  This is a step toward a "knowbot"-like tool for resource  
discovery. Now we have a real web to play with, we can start making such machines  
in earnest.
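
A minimal sketch of the kind of budgeted breadth-first traversal described
above, assuming a small in-memory web.  The struct web type, the traverse
function, the fixed array sizes and the integer interest score are all
hypothetical, invented for illustration; a real tool would fetch documents
over the network and score them against a query.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16
#define MAX_LINKS 4

/* Hypothetical in-memory web: node i links to links[i][0..nlinks[i]-1]. */
struct web {
    size_t n;                            /* number of documents, <= MAX_NODES */
    size_t nlinks[MAX_NODES];            /* each entry <= MAX_LINKS */
    size_t links[MAX_NODES][MAX_LINKS];
    int interest[MAX_NODES];             /* assumed score per document */
};

/* Breadth-first traversal from `start`.  Following a link costs one unit
 * of budget; each document reached adds its interest score back, so
 * interesting documents extend the search.  Records the visit order in
 * `order` and returns the number of documents visited. */
size_t traverse(const struct web *w, size_t start, int budget,
                size_t order[MAX_NODES])
{
    int seen[MAX_NODES] = {0};
    size_t queue[MAX_NODES];
    size_t head = 0, tail = 0, visited = 0;

    if (w->n > MAX_NODES || start >= w->n)
        return 0;

    seen[start] = 1;
    order[visited++] = start;
    budget += w->interest[start];
    queue[tail++] = start;

    while (head < tail) {
        size_t doc = queue[head++];
        for (size_t i = 0; i < w->nlinks[doc]; i++) {
            size_t next = w->links[doc][i];
            if (next >= w->n || seen[next])
                continue;
            if (budget <= 0)
                return visited;          /* budget exhausted: stop here */
            budget--;                    /* pay for following the link */
            seen[next] = 1;
            order[visited++] = next;
            budget += w->interest[next]; /* reward interesting documents */
            queue[tail++] = next;
        }
    }
    return visited;
}
```

Each document is queued at most once, so the fixed-size queue cannot
overflow as long as the web has no more than MAX_NODES documents.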

	Tim



From timbl  Fri Mar 27 12:37:47 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25222; Fri, 27 Mar 92 12:37:47 GMT+0100
Date: Fri, 27 Mar 92 12:37:47 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203271137.AA25222@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: pflynn@curia.ucc.ie (Peter Flynn)
Subject: Re: CURIA - WWW server for Irish manuscripts coming soon
Cc: www-talk@nxoc01.cern.ch

> Date: Thu, 26 Mar 92 21:38:55 GMT
> From: pflynn@curia.ucc.ie (Peter Flynn)

> OK, I still have to get the daemon running, but I should have that next
> week. You can then add us to the list of stuff available in W3. Details are:

<DT><a name=??? href=http:curia.ucc.ie//usr/local/lib/WWW/CURIA_menu.html>The
CURIA Project
<DD>Browseable Irish manuscripts from the Royal Irish Academy and University
College Cork.

	(BTW You can omit name=??? -- you don't need a name unless you
	want to refer TO the anchor.)

	I think you mean http://curia.ucc.ie/usr/local/lib/WWW/CURIA_menu.html
	the double slash introduces the host.

We are still waiting reverse IP registration, so currently I guess we are
only accessible as 143.239.1.8

	Ok, http://143.239.1.8/usr/local/lib/WWW/CURIA_menu.html
	will have to do for now. Let me know when it's up. A few tips.

	Please run the server on port 80 as new software will default
	to this official IANA port number. In the mean time quote it
	as http://143.239.1.8:80/usr/local/lib/WWW/CURIA_menu.html
	to be safe.

	In the configuration file, you can put a map line
	  map / /usr/local/lib/WWW/CURIA_menu.html
	so that someone accessing http://143.239.1.8:80/ will
	get something useful about the site. You have to make
	all the references in that document absolute, as it will
	appear in two places in the web.

At the moment there's just one or two pages of test stuff, but this is
defined in a status report which I will keep up to date as part of the
documentation. We have nearly finished scanning vol 1 of the _Annals of
the Four Masters_ and the _Chronicon Scotorum_ so we'll be editing them into
shape in the next month or so. As I won't be at JENC, I want to try hard
to get a useable chapter or two up and browseable by then.

	Sounds great!  Admittedly the Irish might be understandable by
	a limited audience, but it will be available to gaelic
	academia worldwide.  (Are you putting up parallel translations?)

I fudged the five lowercase accent-aigu vowels into HTML.c as you suggested.

	Great -- can you send me the list and I'll incorporate it
	into all the sources.  We should put in all the European set
	of characters in fact.

It works fine for the sun-cmd X shell window, but when I replaced them with
the five char codes for the IBM PC, and then access it over telnet from
my PC logged into my VAX, something somewhere en route is mapping the
8-bit chars into 7-bit, so an acute-a (decimal 160) ends up as a space
and an acute-e as a double-quote (whoops, sorry, acute-a comes out as
an @-sign). This is not WWW code, but either the sun terminfo/termcap
being intrusive or something in the comms side. It's going to be a major
headache to get it sorted...all help welcome.

	If someone en route is killing the 8th bit, then you are stuck.
	The best thing then seems to be to run WWW on the PC directly.
	In fact some of the PC graphic characters are in non-graphic
	positions of the table (0X, 1X, 8X, 9X hex), so telnet is
	likely to have trouble anyway.
	There's a port done for SUN/NFS: what type of TCP/IP for
	the PC do you use?
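
	The arithmetic behind the symptom can be checked directly.
	Clearing the top bit of decimal 160 leaves 32, the ASCII space,
	which matches the acute-a report; the same masking sends the
	8X/9X positions down to the 0X/1X control codes.  The function
	name strip_eighth_bit is made up for this sketch, and it only
	models a link that clears bit 7, which is an assumption about
	what the terminal path is actually doing.

```c
#include <assert.h>

/* Model of a 7-bit link: the top bit of every byte is cleared in transit.
 * This is an assumed model, not a description of any particular telnet. */
unsigned char strip_eighth_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7F);
}
```

	For example, strip_eighth_bit(160) gives 32 (a space), and a
	byte at hex 82 comes out as the control code hex 02.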

Have a nice weekend!
///Peter

	You too.
	- Tim


From emv@heifetz.msen.com  Fri Mar 27 15:34:58 1992
Return-Path: <emv@heifetz.msen.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25727; Fri, 27 Mar 92 15:34:58 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA02049; Fri, 27 Mar 92 15:30:29 +0100
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lUHtT-000HqgC@heifetz.msen.com>; Fri, 27 Mar 92 09:26 EST
Message-Id: <m0lUHtT-000HqgC@heifetz.msen.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: pflynn@curia.ucc.ie (Peter Flynn), www-talk@nxoc01.cern.ch
Subject: Re: CURIA - WWW server for Irish manuscripts coming soon 
In-Reply-To: Your message of Fri, 27 Mar 92 12:37:47 +0100.
             <9203271137.AA25222@ nxoc01.cern.ch > 
Date: Fri, 27 Mar 92 09:26:55 -0500
From: Edward Vielmetti <emv@msen.com>

w/r/t encoding of non-Latin characters, I'd recommend at least looking
at the work of the "Text Encoding Initiative" (the TEI) - they're
a scholarly bunch who have standards for how to do SGML encoding
of scholarly texts esp. including representation of glyphs from languages
other than English.

as to a reference, hm.  so far as I know none of the discussions
are WAISed yet, the best bet is the listserv at UICVM.UIC.EDU.

<title> tei - text encoding initiative </title>

<p>
michael sperberg-mcqueen at uic
some !@#$ listserv somewhere, let's see
<a href=wais://wais.cic.net/lists?tei> look in lists </a>
<p>

--Ed

From emv@heifetz.msen.com  Mon Mar 30 16:37:34 1992
Return-Path: <emv@heifetz.msen.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00108; Mon, 30 Mar 92 16:37:34 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA27949; Mon, 30 Mar 92 17:32:52 +0200
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA07840; Mon, 30 Mar 92 17:32:40 +0200
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lVOIS-000Hp1C@heifetz.msen.com>; Mon, 30 Mar 92 10:29 EST
Message-Id: <m0lVOIS-000Hp1C@heifetz.msen.com>
To: www-talk@nxoc01.cern.ch
Subject: bad doc-id via WAIS/WWW gateway
Date: Mon, 30 Mar 92 10:29:20 -0500
From: Edward Vielmetti <emv@msen.com>

For some WWW searches on WAIS databases, I get an error response.  
My hunch is that there are statically sized buffers hiding somewhere
in the system and that absurdly long WAIS document-id's are overflowing
them.
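
If that hunch is right, one possible remedy is to size the storage from the
id itself rather than from a compile-time constant.  The sketch below is only
an illustration under assumptions: copy_doc_id is a hypothetical name, not a
function in the gateway, and it treats the doc-id as a NUL-terminated string
(as it appears once encoded into a W3 address).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a doc-id of any length into freshly allocated
 * memory instead of a fixed-size array.  The caller frees the result.
 * Returns NULL if the allocation fails. */
char *copy_doc_id(const char *doc_id)
{
    size_t len = strlen(doc_id);
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, doc_id, len + 1);   /* includes the terminating NUL */
    return copy;
}
```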

--Ed



Index comp-sys contains the following 1 item relevant to ''.
<dl>
Code: SF, Bad DocID in request
</DL>
<h2>No text was returned!</h2>



From emv@heifetz.msen.com  Tue Mar 31 07:02:32 1992
Return-Path: <emv@heifetz.msen.com>
Received: from cernvax.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00954; Tue, 31 Mar 92 07:02:32 GMT+0100
Received: by cernvax.cern.ch (5.57/Ultrix2.0-B)
	id AA24304; Tue, 31 Mar 92 07:57:52 +0200
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA20113; Tue, 31 Mar 92 06:43:10 +0200
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lVadT-000HtvC@heifetz.msen.com>; Mon, 30 Mar 92 23:39 EST
Message-Id: <m0lVadT-000HtvC@heifetz.msen.com>
To: www-talk@nxoc01.cern.ch
Cc: cddev@sterling.com
Subject: Changing NNTP servers on the fly.
Date: Mon, 30 Mar 92 23:39:50 -0500
From: Edward Vielmetti <emv@msen.com>

I have a CD-ROM (Netnews/CD from Sterling Software) that has a big
pile of news in it -- and I'd like to support reasonable access to
it via WWW.  However; the current support for NNTP reading in WWW
assumes that there's a single NNTP server through which all
"news:" access will go, so it's not straightforward to support
a second system which might have an alternative access to a news spool.

I'd suggest extending the syntax to read as follows
	news:comp.sys.foo		comp.sys.foo on default system
	news://nntp.archive.msen.com/comp.sys.foo       nntp.archive.msen.com
	news://nntp.archive.msen.com:1990/comp.sys.foo  (on port 1990)
with similar extensions for referencing individual articles.
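
A minimal sketch of how a client might take apart the three forms above.
The struct news_ref type and the parse_news_url function are hypothetical
names chosen for this example; the only outside fact relied on is that 119
is the standard NNTP port.  Article references are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct news_ref {
    char host[64];    /* empty string means "use the default server" */
    int  port;        /* 119, the standard NNTP port, unless given */
    char group[128];  /* newsgroup name */
};

/* Parse news:group, news://host/group or news://host:port/group.
 * Returns 0 on success, -1 if `url` matches none of these forms. */
int parse_news_url(const char *url, struct news_ref *ref)
{
    const char *p;

    if (strncmp(url, "news:", 5) != 0)
        return -1;
    p = url + 5;
    ref->host[0] = '\0';
    ref->port = 119;

    if (strncmp(p, "//", 2) == 0) {
        p += 2;
        /* Try host:port/group first, then fall back to host/group. */
        if (sscanf(p, "%63[^:/]:%d/%127s",
                   ref->host, &ref->port, ref->group) == 3)
            return 0;
        ref->port = 119;
        if (sscanf(p, "%63[^:/]/%127s", ref->host, ref->group) == 2)
            return 0;
        return -1;
    }

    /* No host part: a plain group name on the default server. */
    return (sscanf(p, "%127s", ref->group) == 1) ? 0 : -1;
}
```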

--Ed

From timbl  Tue Mar 31 09:17:36 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA01065; Tue, 31 Mar 92 09:17:36 GMT+0200
Date: Tue, 31 Mar 92 09:17:36 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9203310717.AA01065@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Edward Vielmetti <emv@msen.com>
Subject: Re: Changing NNTP servers on the fly.
Cc: www-talk@nxoc01.cern.ch, cddev@sterling.com, lear@oni.sgi.com (Eliot Lear)


> I have a CD-ROM (Netnews/CD from Sterling Software) that has a big
> pile of news in it -- and I'd like to support reasonable access to
> it via WWW.  However; the current support for NNTP reading in WWW
> assumes that there's a single NNTP server through which all
> "news:" access will go, so it's not straightforward to support
> a second system which might have an alternative access to a news spool.
> 

> I'd suggest extending the syntax to read as follows
> 	news:comp.sys.foo		comp.sys.foo on default system
> 	news://nntp.archive.msen.com/comp.sys.foo       nntp.archive.msen.com
> 	news://nntp.archive.msen.com:1990/comp.sys.foo  (on port 1990)
> with similar extensions for referencing individual articles.
>
> - Ed

The syntax would certainly fit in with the UDI format -- however, are the semantics
well defined?  In general, the whole point of news is that it is held locally,
avoiding millions of WAN accesses.

If you put in a node name, then you are changing the way the protocol works  
altogether.  This might be convenient, but it's not really "news". You're saying
that we can use NNTP as a file retrieval protocol. Obviously, once you have given  
out a reference like that above you have to make nntp.archive.msen.com available  
for everyone in the world.

Maybe this is a way to solve the news archive retrieval problem. It isn't  done
that way at the moment of course: it's a big headache right now. Messages are tarred  
and compressed and put on some machine under the date of the message -- you have to  
know which newsgroup the message was sent to, and then look the archive hostname up  
in a list which doesn't (correct me?) exist. (Another use for the X-500 directory?)

It doesn't fit very well into the news model, all the same. For example, when you  
find a reference to another newsgroup/article on your CD rom, there's no way of  
knowing whether you should look it up on the CD rom or on a "live" news server.
We really need some NNTP extensions to insist that the message-id can carry some  
hints as to where it might be archived -- unfortunately I missed the NNTP session  
at the IETF but I know that Eliot Lear (ietf nntp wg chair) for example is thinking  
about such problems, and indeed from talking to him I got the impression that the  
NNTP group's discussions were overflowing into the retrieval and resource discovery  
areas.

It's not _trivial_ to put it in the code (one will have to keep a cache of  
network connections to hosts) so I won't do it now.  I'd point out you could set up  
an HTTP server to serve the news data, although it would have (currently) to  
convert the news format into HTML which it (currently) doesn't (yet) do (yet).

Tim

--Ed


From lear@yeager.corp.sgi.com  Tue Mar 31 18:48:05 1992
Return-Path: <lear@yeager.corp.sgi.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02572; Tue, 31 Mar 92 18:48:05 GMT+0100
Received: by dxmint.cern.ch (cernvax) (5.57/3.14)
	id AA08030; Tue, 31 Mar 92 19:43:27 +0200
Received: from relay.sgi.com by sgi.sgi.com via SMTP (911016.SGI/910110.SGI)
	for www-talk@nxoc01.cern.ch id AA05925; Tue, 31 Mar 92 09:42:46 -0800
Received: from yeager.corp.sgi.com by relay.sgi.com via SMTP (911016.SGI/911001.SGI)
	for @sgi.sgi.com:timbl@nxoc01.cern.ch id AA10133; Tue, 31 Mar 92 09:42:44 -0800
Received: by yeager.corp.sgi.com (911016.SGI/911001.SGI)
	for @sgi.com:cddev@sterling.com id AA14310; Tue, 31 Mar 92 09:42:43 -0800
Date: Tue, 31 Mar 92 9:42:42 PST
From: Eliot Lear <lear@yeager.corp.sgi.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: Edward Vielmetti <emv@msen.com>, www-talk@nxoc01.cern.ch,
        cddev@sterling.com
Subject: Re: Changing NNTP servers on the fly.
In-Reply-To: Your message of Tue, 31 Mar 92 09:17:36 GMT+0200
Message-Id: <CMM.0.90.2.702063762.lear@yeager.corp.sgi.com>


Hello all,

It is true that the nntp working group has been pushing against all
sorts of retrieval issues.  How any of the following would be
implemented is completely an open question, right now.  I should say
that much of what follows was the result of informal brainstorming,
and a lot of discussion at various USENIXes.  I think everyone agrees
that the NNTP people do not yet have enough information to make a
decision, and there is a growing concern about scope of whatever
project we would choose to take on, as one could quickly envision a
very broad all-encompassing project that would serve everyone's needs
but never be implemented.  As we begin to discuss best ways to present
news to the user, we immediately come up against five questions.
Briefly described, they are the following:

[1]	How shall the user select and receive new information?
	Are we talking SQL or Z.39 or what?

[2]	Should the mechanism be a pull-update/lockstep mechanism, as
	it is now, or does the server need to have enough smarts about
	things like priorities such that the mechanism should be
	async/interrupt driven?

[3]	Should we be writing the protocol with some sort of RPC
	mechanism in mind, such that the application doesn't even know
	if the service is local?

[4]	How do we handle archives?  Should a saved article be treated
	just as any other article, or do we need stronger archive
	search mechanisms in NNTP?  OR, should archive support be
	placed in the netnews model, itself (e.g., sendme style
	retrieval)?
	OR, should netnews reading become a distributed model, as
	access to the Internet approaches ubiquity?  Here is where
	we begin to delve into resource and information location
	issues.

[5]	Should whatever mechanism we design be limited to netnews, or
	should we also leave enough rope for someone to use it for
	mail?

So what we have right now is a growing list of questions, and not very
many answers - YET.

I must clarify one point Tim made.  News is currently stored and read
locally mostly for historical reasons.  The plain fact of the matter
is that netnews has been and continues to be more popular than the
Internet, simply because it costs less.  Thus in the past people have not
considered reading over the Internet as ``the mechanism'' because it
could not be used as such by a large portion of the participants.
There is also an issue of how to find new and interesting articles
under a distributed model.  That's an area I haven't given much
thought at all to.

The statement that the current NNTP is nothing more than a file
transfer protocol is largely correct.  It's a specialized version that
takes advantage of the netnews architecture.  In fact, it would have
been quite possible to implement NNTP *in* FTP as an extension.

Eliot Lear
[lear@sgi.com]




From wei@sting.berkeley.edu  Mon Apr  6 06:07:39 1992
Return-Path: <wei@sting.berkeley.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA03043; Mon, 6 Apr 92 06:07:39 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA24917; Mon, 6 Apr 92 06:02:33 +0200
Received: by sting.Berkeley.EDU (5.65/XCF-1.34)
	id AA00155; Sun, 5 Apr 92 21:02:27 -0700
Date: Sun, 5 Apr 92 21:02:27 -0700
From: wei@sting.berkeley.edu (Pei Y. Wei)
Message-Id: <9204060402.AA00155@sting.Berkeley.EDU>
To: www-talk@nxoc01.cern.ch
Subject: HTML printing

Hi,

Does there exist something like an HTML to postscript/troff/* converter?
I'm looking for something better than ``www -n foo.html | lpr''.

Thanks.

-Pei

From emv@heifetz.msen.com  Mon Apr  6 06:20:26 1992
Return-Path: <emv@heifetz.msen.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA03055; Mon, 6 Apr 92 06:20:26 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA26318; Mon, 6 Apr 92 06:15:21 +0200
Received: by heifetz.msen.com (/\==/\ Smail3.1.22.1 #22.11)
	id <m0lXl49-000HcJC@heifetz.msen.com>; Mon, 6 Apr 92 00:12 EDT
Message-Id: <m0lXl49-000HcJC@heifetz.msen.com>
To: wei@sting.berkeley.edu (Pei Y. Wei)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: HTML printing 
In-Reply-To: Your message of Sun, 05 Apr 92 21:02:27 -0700.
             <9204060402.AA00155@sting.Berkeley.EDU> 
Date: Mon, 06 Apr 92 00:12:19 -0400
From: Edward Vielmetti <emv@msen.com>

Pei,

You should look on the various SGML archives for tools.  If you
search
	www 'wais://wais.msen.com:210/web?sgml'
you should be led off in the right direction.

--Ed

From RASHTY@hujivms.bitnet  Mon Apr 13 17:28:35 1992
Return-Path: <RASHTY@hujivms.bitnet>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20004; Mon, 13 Apr 92 17:28:35 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA08959; Mon, 13 Apr 92 17:28:22 +0200
Message-Id: <9204131528.AA08959@dxmint.cern.ch>
Received: from CEARN.cern.ch by CEARN.cern.ch (IBM VM SMTP V2R1)
   with BSMTP id 8169; Mon, 13 Apr 92 17:28:06 SET
Received: from HUJIVMS (RASHTY) by CEARN.cern.ch (Mailer R2.07B) with BSMTP id
 1105; Mon, 13 Apr 92 17:28:05 SET
Received: by HUJIVMS (HUyMail-V6j); Mon, 13 Apr 92 18:28:03 +0300
Date:     Mon,  13 Apr 92 18:28 +0300
From: Dudu Rashty +972-2-584848 <RASHTY@hujivms.bitnet>
To: www-talk@nxoc01.cern.ch
Subject:  few bugs in www

Hi,

When I tried to create a telnet session:

1) it used the rlogin instead of the telnet
2) the way the rlogin is used is wrong.. it sends the command
   RLOGINtelnet /username= ...  WITHOUT THE HOSTNAME


C++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++C
HTACCESS.C   as it is now
C++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++C

	BOOL rlogin = strcmp(access, "rlogin");
	.
	.
	.

	if (!rlogin) {			/* telnet */
	    if (user) printf("When you are connected, log in as %s\n", user);
	    sprintf(command, "TELNET %s%s %s",
		port ? "/PORT=" : "",
		port ? port : "",
		hostname);
	} else {
	    sprintf(command, "RLOGIN%s%s%s%s %s", access,
		user ? "/USERNAME=" : "",
		user ? user : "",
		port ? "/PORT=" : "",
		port ? port : "",
		hostname);
	}
	if (TRACE) fprintf(stderr, "HTaccess: Command is: %s\n", command);
	system(command);
	return HT_NO_DATA;		/* Ok - it was done but no data */

C++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++C
correct code is :
C++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++C
HTACCESS.C

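	/* strcmp() returns 0 on a match, so rlogin is non-zero (true) when
	   access is NOT "rlogin"; hence the test below is not negated. */
	BOOL rlogin = strcmp(access, "rlogin");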
	.
	.
	.

	if (rlogin) {			/* telnet */
	    if (user) printf("When you are connected, log in as %s\n", user);
	    sprintf(command, "TELNET %s%s %s",
		port ? "/PORT=" : "",
		port ? port : "",
		hostname);
	} else {
	    sprintf(command, "RLOGIN%s%s%s%s %s",
		user ? "/USERNAME=" : "",
		user ? user : "",
		port ? "/PORT=" : "",
		port ? port : "",
		hostname);
	}
	if (TRACE) fprintf(stderr, "HTaccess: Command is: %s\n", command);
	system(command);
	return HT_NO_DATA;		/* Ok - it was done but no data */

C++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++C
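
For comparison, a stand-alone sketch of the same command-building step.  It
is not the code in HTACCESS.C: build_remote_command is a hypothetical name,
the flag is set to true when access IS rlogin (the reverse of the variable
above), and snprintf (C99) is used so that the write into the command buffer
is bounded.  The VMS-style /USERNAME= and /PORT= qualifiers are taken from
the excerpt as given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-alone version of the command-building step.
 * `user` and `port` may be NULL.  Returns 0 on success, -1 if the
 * command does not fit in `command`. */
int build_remote_command(const char *access, const char *user,
                         const char *port, const char *hostname,
                         char *command, size_t size)
{
    /* strcmp() returns 0 on a match, so compare against 0 explicitly. */
    int is_rlogin = (strcmp(access, "rlogin") == 0);
    int n;

    if (is_rlogin) {
        n = snprintf(command, size, "RLOGIN%s%s%s%s %s",
                     user ? "/USERNAME=" : "", user ? user : "",
                     port ? "/PORT=" : "", port ? port : "",
                     hostname);
    } else {                            /* anything else: telnet */
        n = snprintf(command, size, "TELNET %s%s %s",
                     port ? "/PORT=" : "", port ? port : "",
                     hostname);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With access "telnet", port "23" and host "info.cern.ch" this builds
"TELNET /PORT=23 info.cern.ch"; the hostname is always the last argument.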


                                              thanks
   	                                       __o     o__
	                                     _ \<,_   _.>/ _
	                                    (_)/ (_) (_) \(_)

                                             d    u   d    u
                                             Hebrew University
                                             Computation Center
                                             Jerusalem, Israel


From timbl  Tue Apr 14 12:00:05 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21572; Tue, 14 Apr 92 12:00:05 GMT+0200
Date: Tue, 14 Apr 92 12:00:05 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9204141000.AA21572@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: wei@sting.berkeley.edu (Pei Y. Wei)
Subject:  HTML printing: Conversion HTML->LaTeX->dvi->Postscript
Cc: www-talk@nxoc01.cern.ch

> Does there exists something like a HTML to postscript/troff/* converter?
> I'm looking for something better than ``www -n foo.html > lpr''.

Here's a simple html to latex converter using "sed". It's not complete, but it  
produces reasonable results on the W3 documentation, so I can now (at last) make a  
W3 book.  (A minor problem is that sed ignores any characters at the end of a file  
which are not followed by a final newline, and the NeXT editor sometimes generates  
HTML without the final newline.)

You have to prepend the document style you want to the output of sed. My makefile  
looks like

	echo " \\\\batchmode \\\\documentstyle{book}" > the_www_project.tex
	sed -f html2latex.sed $(THE_HTML) >> the_www_project.tex
	latex  the_www_project.tex


For a large book, I concatenate several html files, passing some of them through  
another sed file which removes the <TITLE> elements and demotes the <H1> to <H2>  
etc.  The file below italicises anchors, but in general it might be best to remove  
them altogether. The smartest thing would be to generate the TeX to make a little  
superscript reference to the page number to which a link refers.  Any TeX experts  
out there?

I'll put the "W3 Book" in postscript up for anonymous FTP shortly.


 Tim BL
__________________________________________ html2latex follows
1i\
\\begin{document}
$a\
\\end{document}
/<XMP>/,/<.XMP>/b lit
/<.XMP>/b lit
/<xmp>/,/<.xmp>/b lit
/<.xmp>/b lit
s?&amp.?\\&?g
s?&gt.?>?g
s?&lt.?<?g
s?\\?\\backslash ?g
s?{?\\{?g
s?}?\\}?g
s?%?\\%?g
s?\$?\\$?g
s?&?\\&?g
s?#?\\#?g
s?_?\\_?g
s?~?\\~?g
s?\^?\\^?g
s?<TITLE>?\\author{Generated from the Hypertext}\\title{?g
s?</TITLE>?}\\maketitle ?g
s?<ADDRESS>??g
s?</ADDRESS>??g
s?<P>?\\par?g
s?<p>?\\par?g
s?<Hn>?\\part{?g
s?</Hn>?}?g
s?<H1>?\\chapter{?g
s?</H[0-9]>?}?g
s?<H2>?\\section{?g
s?<H3>?\\subsection{?g
s?<H4>?\\subsubsection{?g
s?<H5>?\\paragraph{?g
s?<H6>?\\subparagraph{?g
s?<UL>?\\begin{itemize}?g
s?</UL>?\\end{itemize}?g
s?<LI>?\\item ?g
s?<ul>?\\begin{itemize}?g
s?</ul>?\\end{itemize}?g
s?<li>?\\item ?g
s?<DL>?\\begin{description}?g
s?</DL>?\\end{description}?g
s?<DT>?\\item[?g
s?<DD>?]?g
s?<dl>?\\begin{description}?g
s?</dl>?\\end{description}?g
s?<dt>?\\item[?g
s?<dd>?]?g
s?<NEXTID[^>]*>??g
s?<A[^>]*>?\\it  ?g
s?</A>?\\/\\rm  ?g
: lit
s?<XMP>?\\begin{verbatim}?g
s?</XMP>?\\end{verbatim}?
s?<xmp>?\\begin{verbatim}?g
s?</xmp>?\\end{verbatim}?

From timbl  Thu Apr 16 08:59:03 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA26309; Thu, 16 Apr 92 08:59:03 GMT+0200
Date: Thu, 16 Apr 92 08:59:03 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9204160659.AA26309@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: wathu@lanka.ccit.arizona.edu (Wije Wathugala)
Subject: Extracting the source of HTML documents, menus etc.
Cc: www-talk@nxoc01.cern.ch

> Date: Wed, 15 Apr 92 16:16:31 MST
> From: wathu@lanka.ccit.arizona.edu (Wije Wathugala)

> Hi Tim,
> Thank you very much for your mail and all the work you are doing
> with WWW.

> I have one more questions. 

>
> (1)I would like to down load some menus from WWW server in HTML so  
> that I can
> edit them and use locally.  I know how to get text file with the
> command but do not know how to get HTML file.  I tried to start WWW
> by www -source but still I could not get the HTML file.  Is there a  
> command
> to get them. 


The trick is to browse to the document (menu etc) you want normally  
with www. Then type "help" to get the www address of the document.  
Copy it down.

Then, separately, give the command

	www -source -n -p documentaddress > file

to extract the source into a file.  You have to do this because if  
you start www in -source mode, you can't follow any links. We plan to  
add a command to extract the source of an article when you're  
browsing normally.
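
If you do this often, the second step can be wrapped in a small shell function. This is only a sketch: the name save_source is invented here, and the flags are exactly the ones given above, nothing more.

```shell
# save_source ADDRESS FILE
# Run the line mode browser non-interactively and keep the HTML
# source of ADDRESS in FILE. (Hypothetical wrapper.)
save_source() {
    www -source -n -p "$1" > "$2"
}
```

For example, "save_source documentaddress menu.html", where documentaddress is the address you copied down from "help".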

> (2) Can I post this type of questions to www-interest list ?

You should send them to www-talk@info.cern.ch, not to -interest.  
Interest is used for announcements to a larger number of people.

> Thank you
>
> Wije

	You're welcome!
	Tim

PS: I'm away next week (back 27th April) but others on the list may  
be prepared to answer more questions.



From timbl  Wed May 20 12:21:37 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA07372; Wed, 20 May 92 12:21:37 GMT+0200
Date: Wed, 20 May 92 12:21:37 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9205201021.AA07372@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Peter Huang <peter@hpkslx.mayfield.hp.com>
Subject: Re: www on hp3/800 at HP
Cc: www-talk@nxoc01.cern.ch


>	I saw your posting for new release of WWW and I pulled the
>	source over.  I'm quite thrill about the work you have
>	done with WWW.  I was able to compile and run the "www" on
>	hp3/800 (hpux 8.0) machine (attached is the Makefile).

	Thanks ... help me understand what the difference is between
	hp700 and hp800 -- are they binary compatible? What does the
	term "snake" cover?

>	The 300 version will coredump if the /etc/services entry
>	for www is missing.

	You mean www will or, httpd?  Or inetd? www shouldn't use the
	/etc/services file. What's a 300?

>   	The Daemon portion also compiled and run,
>	(need to add non-protoize function declaration in HTDaemon.c and
>	HTRetrieve.c.)  I'll try to contrib when I got time.

	Shucks --I thought we'd done all that in the 0.4 release!
	What are the functions which need de-ANSI-ing?  We compiled it
	on a Sun OK, and thought that was a basic enough cc.

>	I have read the copyright page and know the restriction.


	We are considering forming a consortium to allow lots of contribution
	and well-defined sharing of code, and a role for commercial
	participation (buying, selling, sponsoring!)

> 	Let me know if I can help in any other ways.

	There is a list of "how to support the web" on the project page.
	I guess a server at HP might be commercially tricky unless for
	marketing info?  Any tools, neat server scripts all gratefully accepted!
	And pass the word around...
	
>	kudos for a software nicely done.

	Thanks -- on behalf of all the people involved!

	Tim BL

>	============================================================
>	Peter Huang 		HP Response Center Lab
>	Phone: (415)691-3417  	Email: peter@hpkslx.mayfield.HP.COM
>	100 Mayfield Avenue, MS 37MA, Moutain View, CA 94043
>	============================================================


From timbl  Wed May 20 13:38:20 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA07560; Wed, 20 May 92 13:38:20 GMT+0200
Date: Wed, 20 May 92 13:38:20 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9205201138.AA07560@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: Rama Porrat <rama@noa.huji.ac.il>
Subject: Line mode browser Q & A
Cc: www-talk@nxoc01.cern.ch

Rama Porrat <rama@noa.huji.ac.il> writes,

> I am trying to write a help online manual using www. I think www can be very
> useful and the stuff I deal with really needs such a system.
> BUT there are things which must be corrected in order that www will be
> usable, and I am writing in the hope that you can correct the www
> or tell me if those things can be corrected soon.

> The most annoying thing is the appearance of a "listing" file
> on screen when displayed by www.  The fact that pages
> are displayed "backward", that is, you repeat portions of the
> file already displayed when you come to the last page of a file,
> is unacceptable. It makes a mess out of the display.
> The way a listing file is displayed should adhere to what's in the
> file, without any "going back". This is essential.

We haven't noticed this!  By "listing" file, do you mean a plain text
file, or part of a marked-up file which uses the <listing> tag?
What you describe seems very strange.  Please give me a reference to the file
if it is on a server, or else mail it to me. Thanks.

> The way the word [end] is displayed at the end of a screen is not nice.
> It is too close to the last line and in the "middle" of the line.
> The www writer should have the option not to display this [end]
> at all, or otherwise put it in a lower and to the side of the
> screen.

This is a question of taste. It is easy to recompile the browser (just GridText.c)  
with the option to define the macro END_MARK as "".  You can do that locally. The  
[end] mark was put in to stop people trying to scroll past the end of the file,  
having to press RETURN every time just to see whether there was any more left. It  
would be simpler not to have it!

Note that the X11 browsers do not have an [end] mark. On VM/XA  and MVS systems
it is <End>.  You could make a void (or Hebrew) version if you needed to.
All these things are in the file GridText.c in the browser implementation.
You could make a local variant.  If others on this list agree, then we could  
incorporate certain options in the master source.

> The h1 header displays numbers which are confusing for the reader.
> For example, in my listing file, the addition of <h1> displays the numbers (on
> sequential pages)  no numbers on first page
> (50/64) on next page (64/64) on last page.
> This is quite confusing.

These numbers should not be connected with the <h1> header.  They are the line  
number of the bottom line on the screen, and the line number of the last line in  
the file. They were asked for by users, to give some indication of how far through  
the document one is.  The numbers are not displayed on the first page as that
is displayed before the whole file has been read, for speed.

>  You even get numbers like (69/64).

If you could scroll down so that blank space after the document were
displayed, the bottom line would be greater than the last line.  But I can't get it  
to happen. Could you give me the exact sequence you are using, please, along with  
document addresses?  And tell me the version number you have (type help to get it).

> Please erase those numbers, or display something clear, like:
> Page 1 out of 3          etc.

The file is not divided strictly into pages, so displaying page numbers
would be confusing. For example, when one returns to a document one has left by a  
link, that link is displayed, if possible, a third of the way down the page. This  
will not necessarily align with a "page".

> The <title> is inconsistent - at times it appears on screen, 

> at other times it doesn't.

	Yes. This is a feature of the pipeline optimisation of
	the code. When the document is started, its title is not
	known and neither is its length. The first characters are
	displayed on the screen as soon as they come in. This
	gives a faster response time.

> There should be a possibility to enter the www with a pre-known
> pointer.  For example, saying     www tex
> should be able to give you the top screen pertaining to tex, without
> going through a number of previous screens.

	There is.
	You have to give the network address of the document, which is
	not so simple. It would have to be something like

		www http://info/tex

	as "www tex" will cause www to read a file "tex" in the local directory.

	Note you can define aliases for users
	
		alias texhelp www http://info/help/tex/html
		texhelp

	for example

>It is also very important to give the possibility of including
>comments in html sources.

	Yes. The <comment> junk </comment> tag is respected by
	recent browsers, but only recent ones: Not the NeXT browser.

I hope this at least explains the reasons for a few things. I have put the  
intermittency of the title on our list of bugs to be fixed. We may put in options
to control some of the other things. If anyone else on this list has views on these  
things, they should say. It is only through comments from real users that we break away  
from our own view of what is "nice" in a browser: Thanks for your feedback.

	Tim BL



From timbl  Thu May 21 10:21:30 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10391; Thu, 21 May 92 10:21:30 GMT+0200
Date: Thu, 21 May 92 10:21:30 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9205210821.AA10391@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: frenkiel@cdfap2.in2p3.fr (Pierre Frenkiel)
Subject: Re: www line mode browser
Cc: www-talk@nxoc01.cern.ch


> Date: Thu, 21 May 92 10:04:16 +0200
> From: frenkiel@cdfap2.in2p3.fr (Pierre Frenkiel)

>  I'm using an X terminal, with a mterm window of 60 lines.
> it's rather frustrating to see that the browser only uses 24 lines.
> I thought that the use of the terminfo corresponding to the TERM variable 

> was a standard unix feature.
> Pierre Frenkiel - tel:(331)/44.27.15.27 - e-mail:frenkiel@cdfap2.in2p3.fr


1.  Unfortunately it is only standard unix, and www runs also on vms, vm, pc,...
    However, if you contribute the code to return the width and height of
    the terminal for unix, I'll put it in to a unix-specific section
    and many will be grateful.

2.  Try
		alias www /usr/local/bin/www -p60

3. I guess a shell script could do that too -- that's what we do on VM/CMS
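
The shell-script idea in point 3 might look roughly like the sketch below. The function names page_length and www_fit are made up for illustration. The sketch assumes a unix system where either the LINES variable is set or tput can report the terminal height, and it falls back to 24 lines otherwise. The -pN option is the one used in point 2.

```shell
# page_length: print the terminal height in lines.
# Tries $LINES first, then "tput lines", then falls back to 24.
page_length() {
    n=${LINES:-}
    case $n in
        ''|*[!0-9]*) n=$(tput lines 2>/dev/null) || n= ;;
    esac
    case $n in
        ''|*[!0-9]*) n=24 ;;
    esac
    echo "$n"
}

# www_fit ARGS...: start www with a page length matching the terminal.
www_fit() {
    www -p"$(page_length)" "$@"
}
```

Note that most shells do not export LINES to scripts, so in practice the tput branch is the one likely to be used.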

	Tim BL

From bloemer@helios.tnt.uni-hannover.dbp.de  Thu May 21 12:21:57 1992
Return-Path: <bloemer@helios.tnt.uni-hannover.dbp.de>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10668; Thu, 21 May 92 12:21:57 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA10114; Thu, 21 May 92 12:20:28 +0200
Date: 21 May 92 10:18
From: "(Arnold Bloemer)" <bloemer@helios.tnt.uni-hannover.dbp.de>
To: www-interest@nxoc01.cern.ch, www-talk@nxoc01.cern.ch
Message-Id: <RFC-822*C=de;ADMD=dbp;PRMD=uni-hannover;OU=tnt;OU=helios;S=AA06131;G=9205211018>
Subject: Program Links in WWW

RFC-822-HEADERS:
Return-Path: <@CDC2.RRZN.UNI-HANNOVER.DE:bloemer@helios.tnt.uni-hannover.de>
Cc: bloemer

==================
I would like to know, whether anybody has extended WWW such, that it is possible
to start arbitrary programs by hitting a button in a WWW browser.

Hyperbole has this feature and it could also make WWW much more mighty.

Included is a short discussion with Pei Y. Wei on this issue. 

By the way, many thanks to Tim Berners-Lee in general, to Pei Y. Wei for his 
wonderful violaWWW browser and to all people involved in the WWW project.

Arnold

________________________________________________________________________________

Dipl.-Ing. Arnold Bloemer	   Universitaet Hannover
				   Institut fuer Theoretische Nachrichtentechnik
				   und Informationsverarbeitung
bloemer@tnt.uni-hannover.dbp.de    Appelstrasse 9A
fax:    +49-511-762-5333           D-3000 Hannover 1
phone:  +49-511-762-5320           Germany
________________________________________________________________________________


From bloemer Tue May 12 18:13:06 1992
To: timbl@info.cern.ch, wei@xcf.berkeley.edu
Subject: World Wide Web and Viola
Cc: bloemer

...
1. Is it possible to define pseudo hyperlinks which start a subprocess?
That would be really fantastic because it allows for nice tutorials with
trial buttons. On xcf.berkeley.edu I saw:


About XMap ... 

<P>
Click here for a demo.
<S>/*script*/
	if (accessible("/usr/users/ftp") != "") {
		print("doing /usr/users/ftp/ftp_public/misc/ht/projects/xmap/ultrix.ws.2.1 /usr/users/ftp/ftp_public/misc/ht/projects/xmap/sf_oak &amp \n");
/*		system("/usr/users/ftp/ftp_public/misc/ht/projects/xmap/ultrix.ws.2.1 /usr/users/ftp/ftp_public/misc/ht/projects/xmap/sf_oak &amp ");*/
	} else if (accessible("/map" != "")) {
		/* in case this is running in xcfdemo */
		print("doing /xmap/ultrix.ws.2.1 /xmap/sf_oak &amp \n");
/*		system("/xmap/ultrix.ws.2.1 /xmap/sf_oak &amp ");*/
	} else {
		/* can't guess where xmap executables might be */
		bell(); /* later, use dialogbox */
	}
</S>

Unfortunately it doesn't work, in Viola no button was shown.

...


From wei@sting.Berkeley.edu Wed May 13 16:38:52 1992
Subject: Re:  World Wide Web and Viola
To: bloemer@helios.tnt.uni-hannover.de, timbl@nxoc01.cern.ch,
        wei@sting.Berkeley.edu

> 1. Is it possible to define pseudo hyperlinks which start a subprocess?
> That would be really phantastic because it allows for nice tutorials with
> trial buttons. On xcf.berkely.edu I saw:

> 	About XMap ... 
...
> 	<P>
> 	Click here for a demo.
> 	<S>/*script*/
> Unfortunately it doesn't work, in Viola no button was shown.

Whoops, I guess I really have to clean up the HTML files on our site :-)
Most of this stuff was for internal demonstrations, and the scripting 
stuff was an unfinished experiment. It doesn't work for you because you must 
have the demonstration Xmap program locally. But, yes, the intent of the 
<S> tag is to embed viola scripts in HTML, so that you can ``program links''
to do lots of complicated and neat things... In this case, the script 
tries to start up another process.

...

From timbl  Thu May 21 14:21:41 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10960; Thu, 21 May 92 14:21:41 GMT+0200
Date: Thu, 21 May 92 14:21:41 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9205211221.AA10960@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: "(Arnold Bloemer)" <bloemer@helios.tnt.uni-hannover.dbp.de>
Subject: Program Links in WWW
Cc: www-talk@nxoc01.cern.ch



> I would like to know, whether anybody has extended WWW such, that it is possible
> to start arbitrary programs by hitting a button in a WWW browser.
>
> Hyperbole has this feature and it could also make WWW much more mighty.
>
> Included is a short discussion with Pei Y. Wei on this issue.
>
> By the way, many thanks to Tim Berners-Lee in general, to Pei Y. Wei for his
> wonderful violaWWW browser and to all people involved in the WWW project.
>
> Arnold

Very good question. The problem is that of programming language. You need  
something really powerful, but at the same time ubiquitous. Remember a facet of the  
web is universal readership. There is no universal interpreted programming  
language, but there are some close tries (lisp, sh).  You also need something  
which can run in a very safe mode, to prevent virus attacks.

Ideally, the language should include object-oriented inheritance, a basically  
functional nature, and a clean syntax. It should be interpretable and compilable.
At least one implementation should be in the public domain. A pre-compiled standard
binary form would be cool too.  It isn't here yet.

In reality, what we would be able to offer you real soon now with document format  
negotiation is the ability to return a document in some language for execution,  
with the option of being able to provide it in several languages, the language  
being a "data format" which can be negotiated between client and server at  
run-time.  So, for example, one could provide it in viola script and/or in /bin/sh  
which would cover most of the unix world.

	Tim BL



From raisch@cthulhu.control.com  Thu May 21 18:53:00 1992
Return-Path: <raisch@cthulhu.control.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA11993; Thu, 21 May 92 18:53:00 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA06060; Thu, 21 May 92 18:51:25 +0200
Received: by control.com (4.1/Spike-2.0)
	id AA06599; Thu, 21 May 92 12:50:20 EDT
From: raisch@cthulhu.control.com (Robert Raisch)
Message-Id: <9205211650.AA06599@control.com>
Subject: Links and Type
To: www-talk@nxoc01.cern.ch
Date: Thu, 21 May 92 12:50:19 EDT
X-Mailer: ELM [version 2.3 PL11]

First of all, hearty congrats to the WWW people.  It's a great tool, and
since it is based on SGML, it has the broadest scope of any solution I have
yet seen.

To others, see the current issue of Byte magazine regarding "Info-Glut" and
SGML.  Interesting.

----------------------

I have a few recommendations regarding new link types in WWW.  This is based
on thinking about hyper-applications for almost 15 years, (ever since I 
first had the pleasure of hearing Ted Nelson speak in 1977.)

First though, there are a few ideas which I feel should be mentioned to 'set
the stage' for my list.

	Transparent Documents  --

		a transparent document is one which a user creates locally,
		and that is a new representation of an existing document.
		Transparent documents are used to create new local links on
		a document which I do not have permission to modify.

		Transparent documents can then be made available to others,
		(published) just as a "regular" document is, thus facilitating
		the creation of new works from old.

	User Documents --

		a user document is where I keep my "bookmarks", links to
		local documents, links to messages from others, links to
		my "attention" links, (see below).  User documents are where
		we, as navigators of the docuverse, are defined as individuals.

		They are also where we can keep links to other user documents
		which have been permitted to view/modify my own local documents.

		Another function of the User document is to collect users into
		an abstract group. (Thus, based on my membership in user 
		document 'Research Group', I am permitted access to materials
		'owned' by that group. Of course, messages sent to an abstract
		group then become available to all members of that group.)

		(Please note that a User Document is nothing more or less than
		 a collection of links, (as all documents are).)

------------------

Now, on to my list of link types....

There are 4 'minimal' link types which, I believe, a useful h-app *must* 
support.

	1.	Replacement
			-- when activated, replaces the current document
			   with a new document.

	2.	Annotation
			-- when activated, overlays a new document on the
			   current document, partially obscuring the original.

	3.	Inclusion
			-- when the document is created, elements from other
			   documents are collected to be included in the
			   representation of the current document.  (Quotes)

	4.	Expansion
			-- when activated, new information is added to the 
			   current document, expanding the original scope.
			   (Think of outline processors, and the collapse
			    of detail.)  This is also a reflection of
			   Nelson's concept of 'stretch text'.  (Stretch text
			   is where a sentence is constructed in such a 
			   way that when it is collapsed it states its thesis 
			   in simple terms, and when expanded adds detail to
			   further express itself.)

There are 3 further types which I believe are necessary to complete the
function paradigm.


	5.	Execution

			-- when activated, some arbitrary function is performed.
			   The point that was mentioned about the lack of a
			   ubiquitous scripting language is well made.  Lisp
			   is too arcane for most.  Shell languages are too
			   platform specific.  What is needed is a simple
			   to understand, freely available scripting platform.
			   Although I hesitate to mention it, REXX might be
			   a reasonable choice due to its broad availability.

	6.	Attention   (a specialisation of the Execution type)

			-- when the current document is modified (a link is
			   added, or removed, or the document is merely read)
			   a message is sent to the 'owner' of the attention
			   link.  This message creates a new link in the 'user
			   document' of the individual who placed the attention.

			   In this way, I could place a link onto a document I 
			   had interest in, and when it was changed or accessed
			   in some manner, I would be informed.

	7.	Collection  (a non-local specialisation of the Execution type)

			-- when activated, a collection link leaves the current
			   document, and 'travels' the docuverse, in search of
			   other documents which satisfy its internal criteria.
			   This is the concept of a 'knowbot'.

			   Collection links can be activated based on day and
			   time, much like the WAIS questions in the MAC 
			   WAIS interface, WAIS-Station.  They could also be
			   activated based on external events, such as the 
			   activation of an attention link.

			   Collection links would be written in the ubiquitous
			   scripting language, and would only be allowed to 
			   operate on documents which were EXPLICITLY permitted.


----------------------------


One of the missing pieces here is the ability of creating new h-texts, and 
adding new links to old h-texts.

Hypertext, and like systems, are of limited use if they do not support 
collaboration.  I feel that this is a VERY important point.


----------------------------

So.....

	Scenario:

		I start my session with my h-app, and open my user
		document.

		I notice that 17 of my attention links have been activated 
		in the last day.

		I select the most interesting and activate the link which
		it created in my personal user document.

		I am now reading an article which I previously linked, and
		see that an annotation which I made some time ago has been
		added to, by a colleague.

		The comments are pertinent to my current work, so I create
		a new local 'transparent' document to mirror the original 
		work.

		On this new document, I make a few new annotations and decide
		to make this new work available to the research group of which
		I am leader.  I place a link to it in the user document which
		represents my working group.

		I also send a new document link to the colleague who made the
		original comments, so that he can see how I have interpreted 
		his ideas, and included them into my own research.

		I move ever onwards...

---------------------

Ok, I hope that that fuels a little discussion, and I would *love* to hear 
from others regarding these ideas.  

(Yes, some of what I have talked about here exists, in various forms, on my 
own personal system, but I would be *very* interested in hearing from 
developers who might be interested in making something real and useful from 
it.  I, sadly, have too little time to make these ideas real.  My ultimate
goal would be to make the realisation of these ideas available in the public
domain.)

Regards, /rr

"knowledge is the *only* weapon"
-- 

From emv@nigel.msen.com  Fri May 22 06:12:06 1992
Return-Path: <emv@nigel.msen.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12915; Fri, 22 May 92 06:12:06 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA10097; Fri, 22 May 92 06:10:38 +0200
Received: by nigel.msen.com (/\==/\ Smail3.1.25.1 #25.5)
	id <m0loQxU-000A6nC@nigel.msen.com>; Fri, 22 May 92 00:10 WET DST
Message-Id: <m0loQxU-000A6nC@nigel.msen.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: "(Arnold Bloemer)" <bloemer@helios.tnt.uni-hannover.dbp.de>,
        www-talk@nxoc01.cern.ch
Subject: Re: Program Links in WWW 
In-Reply-To: Your message of Thu, 21 May 92 14:21:41.
             <9205211221.AA10960@ nxoc01.cern.ch > 
Date: Fri, 22 May 92 00:10:08 EDT
From: Edward Vielmetti <emv@msen.com>

the "atomicmail" work from bellcore / nat borenstein on a programming
language to embed in active mail messages sounds like what you want.
they have a patent on it :(.  ask nsb for details.  --Ed

From timbl  Mon May 25 16:40:01 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA18996; Mon, 25 May 92 16:40:01 GMT+0200
Date: Mon, 25 May 92 16:40:01 GMT+0200
From: timbl (Tim Berners-Lee)
Message-Id: <9205251440.AA18996@ nxoc01.cern.ch >
Received: by NeXT Mailer (1.62)
To: pflynn@curia.ucc.ie (Peter Flynn)
Subject: Re: search engines & views
Cc: www-talk@nxoc01.cern.ch

>        All you do is map the parameters of the virtual search engine
>        onto a document name -- like
>
>        /INDEX/full-text/tryhard/depth=5/boolean
>
> Does this mean (at a primitive level) you could code a grep command
> as a document name?
>
	Yes -- sure.  It's a question of writing down the
	algorithm. In perl, I'm sure it's a cinch ... you could
	also do it with sh and sed :-( but basically for example
	you need to take say

	/grep/mydir/i?joe+bloggs

	and turn that into

	grep -E -l -i "(joe)|(bloggs)" mydir/* | awk -f ls2html.awk

where ls2html.awk looks something like:

	BEGIN {  print "Select one of:\n<MENU>" }
	{ printf "<LI><A HREF=./%s>   %s</A>\n", $1, $1 }
	END { print "</MENU>" }

The awk generates the HTML for a menu.  I guess you could use awk in fact to  
generate the grep command too.  But these are just ideas.  Or are you using VMS?  
Yes, you could probably do it with DCL and SEARCH.
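
The mapping from document name to grep command can also be written in plain sh. The sketch below is an illustration under assumptions, not part of any server: the function name docname_to_grep is made up, the name layout /grep/DIR/FLAGS?WORD+WORD is read off the single example above, and grep -E is assumed so that "|" means alternation. It only prints the command. A real server would have to check the directory and the words before running anything.

```shell
# docname_to_grep NAME
# Turn a name such as /grep/mydir/i?joe+bloggs into the text of a
# grep command. (Hypothetical helper; prints, does not execute.)
docname_to_grep() {
    name=${1#/grep/}        # mydir/i?joe+bloggs
    target=${name%%\?*}     # mydir/i      (everything before the "?")
    words=${name#*\?}       # joe+bloggs   (everything after the "?")
    dir=${target%/*}        # mydir
    flags=${target##*/}     # i
    # Turn "joe+bloggs" into "(joe)|(bloggs)".
    pattern="($(printf '%s' "$words" | sed 's/+/)|(/g'))"
    printf 'grep -E -l -%s "%s" %s/*\n' "$flags" "$pattern" "$dir"
}
```

Calling it with '/grep/mydir/i?joe+bloggs' prints a grep command over the files in mydir, whose output could then be piped into the awk script.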

Of course if you can handle C, then hack the sample httpd.

> ///Peter

	Tim

From kim@usceast.cs.scarolina.edu  Mon May 25 18:05:56 1992
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA19313; Mon, 25 May 92 18:05:56 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA03674; Mon, 25 May 92 18:04:35 +0200
Received: from Relay.Prime.COM by mcsun.EU.net with SMTP
	id AA28869 (5.65b/CWI-2.163); Mon, 25 May 1992 18:04:29 +0200
Received: from usceast by Relay.Prime.COM; 25 May 92 12:04:48 EST
Return-Path: <kim@usceast.cs.scarolina.edu>
Received: by usceast.cs.scarolina.edu (cs.grandpoobah.043092)
From: Su-Hee Kim <kim@usceast.cs.scarolina.edu>
Message-Id: <9205251600.AA14397@usceast.cs.scarolina.edu>
Subject: Installation of viola
To: www-talk@nxoc01.cern.ch
Date: Mon, 25 May 92 12:00:50 EDT
Cc: kim@mcsun.eu.net (Su-Hee Kim)
X-Mailer: ELM [version 2.3 PL11]

	To whom it may  concern	

	I am trying to install your viola www package on my account,
not public.  When I make the Makefile, it gives me a message which says 
" don't know to make -lXmu''. The file libXmu.a is under the /usr/lib.
In the Makefile, there are many variables such as CONFIGSRC =
$(TOP)/config,  LIBSRC = $(TOP)/lib which are not available under my 
account.  Am I missing some?  I got ViolaWWW_920515.tar.Z from your
place.  I want to make this package run as soon as possible.        
	Thank You very much.




From wei@xcf.berkeley.edu  Tue May 26 02:47:31 1992
Return-Path: <wei@xcf.berkeley.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA19834; Tue, 26 May 92 02:47:31 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA16395; Tue, 26 May 92 02:46:12 +0200
Received: by xcf.Berkeley.EDU (5.65/XCF-1.34)
	id AA21056; Mon, 25 May 92 17:46:02 -0700
Date: Mon, 25 May 92 17:46:02 -0700
From: wei@xcf.berkeley.edu (Pei Y. Wei)
Message-Id: <9205260046.AA21056@xcf.Berkeley.EDU>
To: kim@usceast.cs.scarolina.edu, www-talk@nxoc01.cern.ch
Subject: Re:  Installation of viola
Cc: kim@mcsun.eu.net

You should first run xmkmf to generate a Makefile that is specific to 
your system. 

If the resulting Makefile still gives you trouble, then manually change
the Makefile accordingly...

For example, in my particular Makefile, you see
		...
        CONFIGSRC = $(TOP)/config
       DOCUTILSRC = $(TOP)/doc/util
        CLIENTSRC = $(TOP)/clients
          DEMOSRC = $(TOP)/demos
           LIBSRC = $(TOP)/lib
          FONTSRC = $(TOP)/fonts
       INCLUDESRC = $(TOP)/X11
        SERVERSRC = $(TOP)/server
		...

> "don't know to make -lXmu".  The file libXmu.a is under the /usr/lib.

So, change the LIBSRC line to something like:

           LIBSRC = /usr/lib

> I want to make this package run as soon as possible. I sent a
> mail regarding this problem to Dan. But I found that he is not there.

Cool. Have no idea who Dan is though...

Mail me if you still have trouble setting it up.

-Pei

From timbl@www2.cern.ch  Tue May 26 16:01:05 1992
Return-Path: <timbl@www2.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21392; Tue, 26 May 92 16:01:05 GMT+0200
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA04242; Tue, 26 May 92 15:59:45 +0200
Received: by  www2.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00465; Tue, 26 May 92 15:55:11 GMT+0100
Date: Tue, 26 May 92 15:55:11 GMT+0100
From: timbl@www2.cern.ch
Message-Id: <9205261455.AA00465@ www2.cern.ch >
Received: by NeXT Mailer (1.63)
To: John O'Neall <JON@frcpn11.in2p3.fr>
Subject: Making a simple file server for unix
Cc: www-talk@nxoc01.cern.ch

> Date:         Tue, 26 May 92 09:16:33 EST
> From: John O'Neall <JON@frcpn11.in2p3.fr>

John,

Thanks for your mails asking about how to make a simple hypertext  
list of files. 


Let me take your model of the three layers. I suggest that the 1st  
layer

>helppage (1st-layer):  various info and pointers to local
>         info that won't change much as well as to WWW proper.
>         Among other things, it'll point to a [...]

you write by hand.  You can read the HTML documentation in    
http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
or you can just copy another node which looks like what you want.   
(Get its address and then read it with www -source address). Or both.

You might want to put pointers to the general subject index, and the  
index of HEP sites.  Let me know when you set up a server so that I  
can put it on the list!

For the second layer,

> 2nd-layer list:  this is the thing that will point to ALL the
>    local flat files.  Then I need an automatic (cron'd) procedure
>    to regenerate the list every time I add more flat data
>     files (or periodically).


This is easy: you generate it with a little shell script. How about  
the one appended to this message?  It generates a hypertext menu of  
files in a set of directories (passed as arguments). You can massage  
it to put in your own title, heading, etc. If you can generate some  
human-readable description of each file, so much the better.  You  
could run this as a cron job into an Overview.html file which you  
then publish with httpd, or run it from a server daemon script so  
that the list is generated fresh for every read, and always  
up-to-date.
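
If the list is regenerated by cron while the server is publishing it, a reader could catch a half-written file. A minimal sketch of one way to avoid that follows: write to a temporary file in the same directory and rename it only when the generator succeeds. The function name regenerate is hypothetical, and the generator is whatever command you pass in.

```shell
# regenerate OUTFILE COMMAND [ARGS...]
# Run COMMAND, capture its output, and replace OUTFILE only if the
# command succeeded. (Hypothetical helper.)
regenerate() {
    out=$1
    shift
    tmp="$out.tmp.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$out"
    else
        # Keep the previous list if generation failed.
        rm -f "$tmp"
        return 1
    fi
}
```

A cron job would then call something like "regenerate menu.html ./mkmenu.sh dir1 dir2", where menu.html and mkmenu.sh are placeholder names for the published file and the menu script.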

The third layer is your set of files.  If you publish these with a  
server script, you can add to each one a title (if you can generate  
it) and maybe a link back to the list of other documents before you  
just send the plain text file.
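
As a rough sketch of that third-layer wrapping, assuming nothing better than the file name is available as a title: the function name serve_plain and the wording of the link are invented for illustration, and how the function is hooked into a server script is left open.

```shell
# serve_plain FILE LIST_ADDRESS
# Write FILE to standard output as a small HTML document: a title,
# a link back to the list at LIST_ADDRESS, then the text itself.
serve_plain() {
    echo "<TITLE>$(basename "$1")</TITLE>"
    echo "<A HREF=$2>Back to the list of documents</A>"
    echo "<XMP>"
    cat "$1"
    echo "</XMP>"
}
```

A flat file that itself contains the text "</XMP>" would end the plain-text section early, so this only suits files known not to contain it.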

You could read some of the tips on the web about etiquette, etc.

>By the way, if such a list already exists in W3 somewhere, I'll be
>happy to point to it too.  And then there's no reason why our server
>shouldn't be accessible to the rest of W3.

It would be a great addition to the web!  Let me know when it's up  
and I'll put pointers to it from some overview documents. (even at the
experimental stage, I'll mark the links "experimental" if you like).

>Tim, any chance of getting the hypertext editor for something other
>than Next one day?  I realize that was the simplest thing to develop
>it on, but I suspect it's not the commonest workstation in HEP.

It's on the "hit-list" and we're looking for volunteers. Maybe the Mac
browser will be the first to be also an editor. Or maybe one of the X
browsers.  I agree the NeXT is not the most common platform! (Though  
it is neat ;-)

>Excuse me for rushing, but I'm supposed to present this project to the
>HEPIX-F meeting in Paris on Tuesday and this weekend will be a long one
> (at least, in France).

Good luck! Mail cailliau@(I'm off to the US -- so I'm rushing too!)

Mail me or www-talk@info.cern.ch if you have any problems... 


If you have smart ideas, circulate them to this list too.

	Tim

__________________

#! /bin/sh
# Generate hypertext menu from directory list
echo "<TITLE>Information under: $*</TITLE>"
echo "<H1>$*</H1>"
# If there is a README file include that as plain text
if [ -f "$1/README" ]; then
    echo "<XMP>"
    cat "$1/README"
    echo "</XMP>"
fi
# Now generate a list of links to files
echo "<DIR>"
for dir in "$@"
do (
    # Subshell, so the cd does not affect the next directory
    cd "$dir" || exit
    for file in *.html *.txt
    do
	# Skip the literal pattern when nothing matches
	[ -f "$file" ] || continue
	echo "<LI><A HREF=./$dir/$file>Title of $file</A>"
    done
    )
done
echo "</DIR>"


From jfg@dxcern.cern.ch  Wed Jun  3 10:13:46 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06062; Wed, 3 Jun 92 10:13:46 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA13620; Wed, 3 Jun 92 10:11:44 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA01209; Wed, 3 Jun 92 10:11:30 +0200
Date: Wed, 3 Jun 92 10:11:30 +0200
Message-Id: <9206030811.AA01209@dxcern.cern.ch>
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Sender: jfg@dxcern.cern.ch
To: Craig A. Summerhill <craig@cni.org>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: Just a question
References: <9205272153.AA19773@a.cni.org>

>>>>> On Wed, 27 May 92, Craig A. Summerhill <craig@cni.org> was worried:

Craig> What happened to the hacker's dictionary that used to be on WWW?
Craig> I can't seem to find it anymore.  It was a good resource... :*)

  The hacker's dictionary that once was on the default home page was a
demonstration of Hyper-G, a hypertext system developed by Frank Kappe
and others at the University of Graz, Austria. Hyper-G is now fully
operational and includes much more information than just the hacker's
jargon. We have a pointer to it on the list of W3 servers. There is
also a direct pointer to the hacker's jargon from the list of academic
information, under `Computing'. You will find that the presentation
has changed. For reference, the UDIs to Hyper-G and the Hacker's
Jargon are http://iicm.tu-graz.ac.at:80/ROOT and
http://iicm.tu-graz.ac.at:80/Cjargon

--
  Jean-Francois Groff (jfg@info.cern.ch)
  World-Wide Web initiative
  CERN, ECP division, CH-1211 Geneva 23, Switzerland
  Phone +41 22 767 3755 -- Fax +41 22 767 7155
--
"In general, it is safe and legal to kill your children and their children"
-POSIX Prg Gt, by Donald Lewine, O'Reilly & Associates, 1991, p.110
(On process termination)


From jfg@dxcern.cern.ch  Wed Jun  3 10:13:12 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06058; Wed, 3 Jun 92 10:13:12 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA13602; Wed, 3 Jun 92 10:11:16 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA01201; Wed, 3 Jun 92 10:11:11 +0200
Date: Wed, 3 Jun 92 10:11:11 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206030811.AA01201@dxcern.cern.ch>
To: www@nxoc01.cern.ch, www-talk@nxoc01.cern.ch, www-interest@nxoc01.cern.ch
Subject: Mailing list problems

  A configuration error has prevented recent messages from being
broadcast to the list. Also, many new addresses that we collected were
wrong. My apologies to those who tried to post and received loads of
bounces. If you come to Geneva, I'll owe you a drink :-) Everything
should work fine now, and I'll re-post the messages that I know of.

  I'd like to take the opportunity of this administrative message to
ask for your cooperation: if you know a valid address for one of the
people mentioned below, please mail it to me. They have been removed
for now...

kova!floss
Elizabeth Porteneuve CNET CRPE Net Mgr -- JENC92 <porteneu@frcrpe61.bitnet>
Ermanno Polli - INFN Frascati - AIHEP-92 <polli@lnf.infn.it>
Marc de Lyon -- JENC92 <Marc.deLyon@ica.Rulimberg.nl>
John McMillan - Univ. Leeds - AIHEP-92 <phybjem@sun.leeds.ac.uk>
Gottfried Mayer-Kress <gmk@pegasos.ucsc.edu>

  Your devoted list manager,

--
  Jean-Francois Groff (jfg@info.cern.ch)
  World-Wide Web initiative
  CERN, ECP division, CH-1211 Geneva 23, Switzerland
  Phone +41 22 767 3755 -- Fax +41 22 767 7155
--
"The future will be better tommorrow."  -- Dan Quayle


From jkp@lusmu.cs.hut.fi  Wed Jun  3 19:10:23 1992
Return-Path: <jkp@lusmu.cs.hut.fi>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09993; Wed, 3 Jun 92 19:10:23 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA19336; Wed, 3 Jun 92 19:08:25 +0200
Received: from lusmu.cs.hut.fi by mcsun.EU.net with SMTP
	id AA06080 (5.65b/CWI-2.165); Wed, 3 Jun 1992 17:58:45 +0200
Received: by lusmu.cs.hut.fi id AA18097
  (5.65c8/HUTCS-C-91-07 for www-talk@nxoc01.cern.ch); Wed, 3 Jun 1992 18:57:19 +0300
Date: Wed, 3 Jun 1992 18:57:19 +0300
Message-Id: <199206031557.AA18097@lusmu.cs.hut.fi>
From: Jyrki Kuoppala <jkp@cs.hut.fi>
Sender: jkp@lusmu.cs.hut.fi
To: www-talk@nxoc01.cern.ch
Subject: Virtual newsgroups and creating the memory of Usenet via www and wais 
Organization: Helsinki University of Technology, Finland.

Here's an article I sent a while ago to various Usenet newsgroups:

>With the enormous numbers of groups now existing, tracking
>through all of them to find interesting and relevant topics
>is becoming very hard, and it seems clear that the namespace
>is becoming hopelessly muddled by democracy in action, so
>a software solution might be the only way out. 

An idea just popped up in my head.

Currently we have a news gateway to www (a www client can be used as a
newsreader), which helps to organize groups better than the flat
namespace of the newsgroups most other newsreaders seem to have.  I.e. you
create www nodes which have topically similar newsgroups listed
together in a node and hypertext links to those newsgroups.  Each user
can make personal nodes having the groups or use ones someone else has
published on the net.

Now, as www works very well with dynamic node creation (which is the
way the nntp and wais gateways work), to have a "virtual newsgroup"
you would just instead of having a link to a "real" newsgroup in a www
node have a link to a FIND node with suitable keywords, and then wais
would be used to search the articles matching the find criteria and
would create a virtual node for the matches, like the nntp gateway
currently creates a virtual node with the current contents of a real
Usenet group.

As the www news client / gateway already has the capability to follow
links to earlier discussions (References: field), following the tree
from the virtual newsgroups would be easy.  For more usability, the
www / news gateway (or the news system) should also create forward
references, ie. to backtrack the References field when new articles
come in and thus keep track of articles which are comments to a
certain article.
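
A minimal sketch of the forward-reference idea, in Python and not
taken from any existing gateway: it inverts the References: headers of
a batch of articles, so that each message ID maps to the articles that
comment on it. The function name forward_references is hypothetical,
and the articles are assumed to arrive as raw header-plus-body text.

```python
from collections import defaultdict
from email import message_from_string


def forward_references(raw_articles):
    """Map each message ID to the IDs of articles that cite it in References:."""
    followups = defaultdict(list)
    for raw in raw_articles:
        article = message_from_string(raw)
        article_id = article["Message-ID"]
        # References: holds whitespace-separated message IDs of earlier articles.
        for cited in (article["References"] or "").split():
            followups[cited].append(article_id)
    return dict(followups)
```

A gateway could consult such a table when it builds the node for an
article, adding one link per follow-up.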

To apply the same principles to other newsreaders, nntp should be
extended to have the "newsgroup" instead be a "search criteria", which
would be interpreted by the wais (or some other search engine) server
and transparent to nntpd (except if it's the old-fashioned newsgroup
name).  People would have the search criteria - keywords from the
text, poster names, whatever - in their .newsrc files alongside with
the newsgroup names.

I'm not sure how to keep track of what articles the user has already
read, as there are no article numbers for the virtual groups.  Keeping
track of message IDs takes a lot of space.  But even as a browsing
tool it would be nice.

Hmm, come to think of it, combining a wais server indexing the news
database with the www / nntp gateway seems like the obvious thing to
do, so somebody must have done it already?

Also, are there any anonymous nntp servers which function as archives
for some or all of the groups, ie have long-term storage available by
nntp?  CD-ROM online or something.  With the www news gateway and some
minor modifications for some purposes saving an interesting article to
local disk would be unnecessary, instead just the message ID would be
saved (or added to a node on an organized www document so one could
find it later) and the article could later be found on a generic nntp
archive.  To save bandwidth and avoid single points of failure the
nntp archive could be a distributed database with servers on various
parts of the world and cooperation and caching between them - the
servers could compute a hash from the message ID and archive a certain
percentage of the articles.  With some coordination the burden of disk
space could be shared so there needn't be one gigantic nntp archive.
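
One way the hashing scheme might look, as a Python sketch under
assumptions that are not in the message above: SHA-256 stands in for
"a hash", every server is assumed to know the same ordered server
list, and the name archive_servers, the copies parameter and the host
names in the usage note are invented for illustration.

```python
import hashlib


def archive_servers(message_id, servers, copies=2):
    """Pick which servers archive an article, from a hash of its message ID."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    # The first 8 bytes of the digest choose a starting position in the list.
    start = int.from_bytes(digest[:8], "big") % len(servers)
    copies = min(copies, len(servers))
    # Consecutive servers hold the copies, so each server archives roughly
    # copies/len(servers) of all articles.
    return [servers[(start + i) % len(servers)] for i in range(copies)]
```

A client holding only a message ID can repeat the same computation,
for example archive_servers("<id@host>", ["nntp-a.example",
"nntp-b.example", "nntp-c.example"]), to learn which archives to ask.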

//Jyrki

From davis@willow.tc.cornell.edu  Wed Jun  3 21:56:06 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12314; Wed, 3 Jun 92 21:56:06 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA02089; Wed, 3 Jun 92 21:54:17 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA29791; Wed, 3 Jun 92 15:56:01 EDT
Date: Wed, 3 Jun 92 15:56:01 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206031956.AA29791@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: HTML and bitmaps

I would like to include non-text data (e.g. bitmaps, sound)
in my hyperdocuments.

One way to do this would be:

1) extend HTML to allow for such data, e.g.  a tag like
<BITMAP height=100 width=100 encoding=FAXG4>

(I think the WWW protocols would require that the data be uuencoded,
since I don't think they will allow 8bit data)

2) extend Viola (my WWW browser of choice) to recognize
this tag and do the right thing with it, i.e. uudecode it then
start a display program (or place the bitmap into the
document)

A related generalization would be if the browsing program were able to
choose among a set of alternative display programs based on the "type"
of the document linked to.  In this case, I could have an anchor
that pointed to a bitmap file in addition to (or instead of) placing
bitmaps within HTML files.

This sort of thing is possible in both WAIS and MetaMail.  In WAIS, each
document has a type known to the WAIS server; the client inspects this
type and selects a display program according to the values in the
Xwais.filters X resource.  In MetaMail, each piece of mail is labeled
with an explicit type; MetaMail consults the mailcap file to determine
how to display each type.
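
A simplified illustration of a mailcap-style lookup, in Python. This
is a sketch, not MetaMail's implementation: it handles only
"type/subtype; command" lines with %s standing for the file name, and
ignores escaped semicolons and the optional test fields that real
mailcap files allow. The viewer commands in the usage note are
hypothetical.

```python
import shlex


def parse_mailcap(text):
    """Parse 'type/subtype; command' lines into a list of (type, command) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        content_type, _, rest = line.partition(";")
        # Only the first field after the type is the view command.
        command = rest.split(";")[0].strip()
        entries.append((content_type.strip().lower(), command))
    return entries


def viewer_command(entries, content_type, filename):
    """Return the command for the first entry matching content_type, or None."""
    content_type = content_type.lower()
    major = content_type.split("/")[0]
    for pattern, command in entries:
        if pattern in (content_type, major + "/*"):
            return command.replace("%s", shlex.quote(filename))
    return None
```

With the entries "image/gif; showgif %s" and "audio/*; playaudio %s",
a browser would look up the type of the linked document and run the
returned command on a local copy of it.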

I see that anchors in WWW can have a TYPE attribute, but I can't tell
whether this is the type of the link or of the thing linked to.  If
the former, then I see no easy way to do it in WWW, short of adding
the facility to Viola to decide the document type by inspecting it
(e.g. looking for magic header bits which at least some data formats
support.)
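
The "magic header bits" fallback could be sketched as follows, in
Python. The table is a small illustrative sample and the type names
are assumptions of this sketch; nothing here describes what Viola
actually does.

```python
# Leading bytes that identify a few common formats.
MAGIC = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%!PS", "application/postscript"),
    (b".snd", "audio/basic"),  # Sun/NeXT audio file header
]


def sniff_type(data, default="text/plain"):
    """Guess a document type from its first bytes, or return default."""
    for magic, content_type in MAGIC:
        if data.startswith(magic):
            return content_type
    return default
```

Only the first few bytes of the document are needed, so the check can
run before the rest of the data has arrived.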

I would be grateful for your comments on this matter.

best wishes

From davis@willow.tc.cornell.edu  Wed Jun  3 21:57:49 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12337; Wed, 3 Jun 92 21:57:49 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA02217; Wed, 3 Jun 92 21:55:59 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA29814; Wed, 3 Jun 92 15:57:46 EDT
Date: Wed, 3 Jun 92 15:57:46 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206031957.AA29814@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: WWW vs DynaText

The current issue of Byte has an article on electronic books
written by two folks from the Electronic Book Technologies
company of Providence, RI, USA.  These folks sell a system
called DynaText which delivers electronic books including hypertext
using SGML.  

Can anyone compare this system with WWW?

From timbl  Wed Jun  3 22:45:52 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA13287; Wed, 3 Jun 92 22:45:52 MET DST
Date: Wed, 3 Jun 92 22:45:52 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206032045.AA13287@ nxoc01.cern.ch >
To: davis@willow.tc.cornell.edu, www-talk@nxoc01.cern.ch
Subject: Re: WWW vs DynaText
Cc: timbl

When we looked at DynaText, about a year ago, it was a system which
built (like compiled) a bunch of already written SGML into a "book"
which could be scanned on-line. The book was kept in some DynaText-special
form, with indexes etc. It did not parse SGML on the fly as W3 does.

There is no client-server protocol to DynaText, so you have to have the
"book" mounted by NFS, say, where you want to read it.

A strange thing was that if you made some incremental
change to the underlying SGML then you had to remake the
whole book. The licencing was by the number of books
you made, so every time you reran "make book" or whatever it
was called, you had your license decremented by one.

I think there is a bit of a review about it on the web... try
www   http://info.cern.ch/hypertext/Products/DynaText/Overview.html
which I found by a link from
  http://info.cern.ch/hypertext/Products/Overview.html

DynaText may have improved since we tried it. The mail
addresses of people involved are linked to those documents.

Hope this helps.

Tim BL
 PS: It was more like 2 years ago.

From jfg@dxcern.cern.ch  Fri Jun  5 19:55:55 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA06212; Fri, 5 Jun 92 19:55:55 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA12751; Fri, 5 Jun 92 19:53:57 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA03322; Fri, 5 Jun 92 19:53:52 +0200
Date: Fri, 5 Jun 92 19:53:52 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206051753.AA03322@dxcern.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: Re: HTML and bitmaps
References: <9206031956.AA29791@willow.tc.cornell.edu>

  One more comment on this: 

> (I think the WWW protocols would require that the data be uuencoded,
> since I don't think they will allow 8bit data)

  WWW can inherently use several protocols to fetch data. Some of them
may be limited to 7-bit characters, but the WWW architecture itself is
designed with full 8-bit bytes. In particular, the HTTP protocol that
we have defined supports 8-bit data. However, HTML, our language for
describing hypertext, is based on SGML which is essentially a 7-bit
construct (actually, it can only handle a predefined set of characters).

  Your impression may come from the fact that in the current
implementation, as HTTP can only return HTML documents, the
distinction between them is not clear. The new phase of HTTP that we
are currently designing will be able to handle documents of arbitrary
types. These documents are then processed by the client program
depending on its own capabilities, with the possible help of external
converters or applications, much like the Xwais.filters stuff.

--
  Jean-Francois Groff (jfg@info.cern.ch)
  World-Wide Web initiative
  CERN, ECP division, CH-1211 Geneva 23, Switzerland
  Phone +41 22 767 3755 -- Fax +41 22 767 7155
--
Engineers  think that theory  approximates reality.   |   Mathematicians never
Physicists think that reality approximates theory.    |   make the connection.

From connolly@pixel.convex.com  Sat Jun  6 07:56:12 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA07371; Sat, 6 Jun 92 07:56:12 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA16481; Sat, 6 Jun 92 07:54:08 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA24856; Sat, 6 Jun 92 00:53:24 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA23369; Sat, 6 Jun 92 00:53:21 -0500
Message-Id: <9206060553.AA23369@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=-
Subject: MIME as a hypertext architecture
Date: Sat, 06 Jun 92 00:53:20 CDT
From: Dan Connolly <connolly@pixel.convex.com>

NOTE: This message uses existing and proposed MIME structuring
conventions. Some parts of it may look strange on pre-MIME viewers.

---

The WWW project needs an architecture for interchange of structured
multimedia hypertext documents. The original architecture, HTML,
introduced some structuring conventions and a way of specifying
hypertext links.

The HTML format is under stress from several issues:
	* We need an SGML DTD so that we can parse HTML using
	something besides the public implementation of WWW, and so that
	we can verify documents converted from other authoring
	systems such as GNU info, Andrew's EZ, or FrameMaker.

	* We need to be able to distribute documents and document
	elements in other formats, including raw 8 bit data streams.
	The SGML NOTATION feature falls short of providing an
	adequate mechanism.

	* The UDI syntax doesn't match the SGML attribute syntax.
	There are problems with quoting out-of-band characters, and
	the length of complex UDI's may exceed SGML limits and/or
	line-length limits of transport mechanisms. Also, the
	terse syntax of UDI's conflicts with the goal that they
	be human-readable.

This is a proposed architecture for global hypertext, addressing
the issues raised by the WWW project, but using the MIME architecture.

We define a new subtype of the MIME multipart content type called
x-HTDOC. The syntax is the same as multipart/mixed, but the semantics
are those of a WWW client: the first part is displayed, and the rest
represent links to other documents or other elements of this document.

Then we define a new subtype of the MIME text content type called
x-HTML. This is an SGML markup language using the default SGML declaration
(i.e. the reference concrete syntax, default processing limits, etc.)
and the HTML DTD (included below).

---

<!-- This DTD was produced by DeveGram on Tue Jun  2 18:58:16 1992 -->
<!-- and hand-edited by connolly@convex.com -->

<!--     Parameter Entities       -->

<!--      Terminal symbols        -->

<!ENTITY % words "#PCDATA" >

<!--    Non-ELEMENT symbols       -->

<!ENTITY % inline	"%words | A" >
<!ENTITY % text         "%inline | P" >
<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >

<!ENTITY lt "<">
<!ENTITY gt ">">
<!ENTITY amp "&">

<!ENTITY lt. "<">
<!ENTITY gt. ">">
<!ENTITY amp. "&">

<!--     Document structure       -->

<!ELEMENT html	O O  (TITLE, NEXTID?, ISINDEX?, section+, ADDRESS?)>

<!ELEMENT TITLE	- -  (%inline)+>
<!ELEMENT ADDRESS - - (%text)+>

<!ELEMENT NEXTID - O EMPTY >
<!ATTLIST NEXTID N NUMBER #IMPLIED>

<!ELEMENT ISINDEX - O EMPTY >


<!ELEMENT section O O ((%heading)?,
			(
			%text |
			section |
			MENU |
			UL |
			OL |
			DIR |
			DL)+)>

<!ELEMENT (H1|H2|H3|H4|H5|H6)	- -  (%inline) >

<!ELEMENT P	- O  EMPTY -- paragraph SEPARATOR -->


<!ELEMENT A	- -  (%inline)+>
<!ATTLIST A
	NAME CDATA #IMPLIED
	PART ENTITY #IMPLIED >

<!ELEMENT MENU	- -  (LI+)>

<!ELEMENT UL	- -  (LI+)>

<!ELEMENT OL	- -  (LI+)>

<!ELEMENT DIR	- -  (LI+)>

<!ELEMENT LI	- O  (%text)+>

<!ELEMENT DL	- -  ((DT, DD)+)>

<!ELEMENT DT	- O  (%inline)+>

<!ELEMENT DD	- O  (%text)+>

---

An HTML document would use external entities to reference other parts
of the multipart message. The system identifier matches the
Content-Id field of the intended part. The content-type of the indicated
part could be image, audio, or video for multimedia inclusions; text for
quotes etc., or message/external-body for references to other documents.
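
To illustrate how a client might follow such a reference, here is a
Python sketch using the standard email package. It is one reading of
the proposal, not code from the WWW distribution: resolve_parts is a
hypothetical name, and the sketch assumes the entity declarations use
the SDATA "identifier" form shown in the example message.

```python
import re
from email import message_from_string

ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+SDATA\s+"([^"]*)">')


def resolve_parts(raw_message):
    """Map each entity name declared in the first part to the access
    parameters of the part whose Content-Id equals the entity's identifier."""
    message = message_from_string(raw_message)
    document, *links = message.get_payload()
    by_id = {part["Content-Id"]: part for part in links}
    resolved = {}
    for name, identifier in ENTITY_RE.findall(document.get_payload()):
        part = by_id.get(identifier)
        if part is not None:
            # get_params() yields (name, value) pairs; the first pair is the
            # content type itself, so skip it.
            resolved[name] = dict(part.get_params()[1:])
    return resolved
```

An anchor such as <A PART="part1"> is then followed by looking up
"part1" in the returned table and fetching the document that its
access-type, site and name parameters describe.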

MIME defines access-types for local-file and anon-ftp. We could define
x-HTTP, x-NEWS, x-WAIS, and the other UDI access types.
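
The mapping from a UDI to external-body parameters can be sketched in
Python for the two simplest cases, http addresses and relative names.
The function name external_body_params is hypothetical; the parameter
names follow the example message in this proposal, and other access
types are deliberately left out of the sketch.

```python
from urllib.parse import unquote, urlsplit


def external_body_params(url):
    """Split an address into (parameter, value) pairs for message/external-body."""
    parts = urlsplit(url)
    if not parts.scheme:
        return [("access-type", "x-relative"), ("name", unquote(url))]
    if parts.scheme != "http":
        raise ValueError(f"access type not handled in this sketch: {parts.scheme}")
    params = [("access-type", "x-HTTP"), ("site", parts.hostname)]
    if parts.port is not None:
        params.append(("port", str(parts.port)))
    # Keep everything after the host part, including a trailing "?".
    rest = url[len("http://") + len(parts.netloc):]
    params.append(("name", unquote(rest)))
    return params
```

Each pair would be written as one ";name=value" line after
"Content-type: message/external-body".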

Within HTML documents, SGML IDREFs and IDs are used to reference and define
elements of a document. (I think HYTIME defines a way to reference elements
without explicit IDs.)


The next part of this message is a default.html from the WWW
distribution adapted to use the conventions here.

It should interoperate with existing MIME systems,
though they will not be able to do anything intelligent with HTML.

---
Content-Type: multipart/x-HTDOC; boundary=cut-here

--cut-here
Content-Type: text/x-HTML

<!DOCTYPE HTML SYSTEM 
[
<!ENTITY part1 SDATA "QuickGuide.html">
<!ENTITY part2 SDATA "http://info.cern.ch/hypertext/WWW/TheProject.html">
<!ENTITY part3 SDATA "http://crnvmc.cern.ch./WHO">
<!ENTITY part4 SDATA "http://crnvmc.cern.ch./FIND/yellow?">
<!ENTITY part5 SDATA "http://crnvmc.cern.ch./FIND/jaune?">
<!ENTITY part6 SDATA "http://crnvmc.cern.ch./FIND">
<!ENTITY part7 SDATA "http://crnvmc.cern.ch/NEWS/?">
<!ENTITY part8 SDATA "http://crnvmc.cern.ch./NEWS/cern">
<!ENTITY part9 SDATA "http://crnvmc.cern.ch./NEWS/vmnews">
<!ENTITY part10 SDATA "http://crnvmc.cern.ch/NEWS/student">
<!ENTITY part11 SDATA "http://info.cern.ch/hypertext/DataSources/NewsFromVM/Overview.html">
<!ENTITY part12 SDATA "http://info.cern.ch/hypertext/DataSources/News/Overview.html">
<!ENTITY part13 SDATA "http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html">
<!ENTITY part14 SDATA "http://info.cern.ch./hypertext/DataSources/Overview.html">
<!ENTITY part15 SDATA "http://slacvm.slac.stanford.edu./FIND/spires">
<!ENTITY part16 SDATA "http://crnvmc.cern.ch/FIND/DESY?">
<!ENTITY part17 SDATA "http://info.cern.ch:8001/archive.orst.edu:9000/archie-orst.edu">
<!ENTITY part18 SDATA "http://iicm.tu-graz.ac.at./jargon">
<!ENTITY part19 SDATA "http://info.cern.ch./hypertext/Products/WAIS/Sources/Overview.html">
<!ENTITY part20 SDATA "http://info.cern.ch/rpc/doc/User/UserGuide.html">
<!ENTITY part21 SDATA "http://otax.tky.hut.fi/tky/default.html">
<!ENTITY part22 SDATA "gopher://gopher.micro.umn.edu:70/11/Other%20Gopher%20and%20Information%20Servers">
<!ENTITY part23 SDATA "http://info.cern.ch./hypertext/WWW/LineMode/Defaults/default.html">
]>
<TITLE>CERN Information</TITLE>
<NEXTID N=10>
<SECTION><H1>CERN Information - Select by number</H1>
<DL>
<DT><A PART="part1">Help</A>
<DD>On this program, or the
<A PART="part2">World-Wide Web project</A>.
<DT><A PART="part3" NAME=2>Phone book</A>
<DD>People, phone numbers, accounts and email addresses.
See also the analytical
<A PART="part4" NAME=yellow>Yellow Pages</A>, or
the same index in French :
<A PART="part5" NAME=jaune>Pages Jaunes</A>.
<DT><A PART="part6" NAME=1>"XFIND" index</A>
<DD>Index of computer centre documentation, newsletters, news,
help files, etc...
<DT><A PART="part7" NAME=groups>News</A>
<DD>A complete list of all public CERN news groups, such as
<A PART="part8" NAME=3>news from the CERN User's
Office</A>,<A PART="part9" NAME=4>
CERN computer center news</A>,<A PART="part10">
student news</A>. See also <A PART="part11" NAME=5>private
groups</A> and <A PART="part12" NAME=inews>Internet
news</A>.
</dl>
</section>
<section>
<SECTION><H2>From other sites</h2>
See online data by
<A PART="part13" NAME=subject>subject</A>,
pointers to
<A PART="part14">other forms of online data</a>, and the following specific databases:
<DL>
<DT><A PART="part15" NAME=spires>SLAC SPIRES</A>
<DD>The High Energy Physics preprint index at Stanford Linear Accelerator, California.
(This is the same information available via the QSPIRES facility on BITNET.
Include the word "FIND" as the first keyword, eg: K FIND AUTHOR FRED.).
<DT><A PART="part16" NAME=desy>DESY documents</a>
<DD>Documents and help files from the DESY lab in Hamburg.
<DT><A PART="part17" NAME=archie>
Archie</a>
<DD>An index of almost everything available by "anonymous FTP".
<DT><A PART="part18" NAME=7>Hacker Jargon</a>
<DD>An index to a cross-referenced set of hacker terms. A demonstration
of the WWW gateway to the Graz Technical University Hyper-G database.
<DT><A PART="part19" NAME=9>W.A.I.S.</a>
<DD>All kinds of information available from "Wide Area Information Servers".
<DT><A PART="part20" NAME=6>CERN RPC</A>
<DD>The user guide for the RPC system developed in CERN CN division
(not Sun/RPC). This is an example of documentation (partially) converted
into hypertext.
<DT><A PART="part21" NAME=hut>Helsinki</a>
<DD>Helsinki Technical University information service (Mostly Finnish).
<DT><A PART="part22" NAME=gopher>Gophers</a>
<DD>Campus-wide information systems using "Gopher" software. (Requires www version 1.1 or higher)
</DL>
(This page may be an out of date copy. See the
<A PART="part23" NAME=latest>latest version</a>.)

--cut-here
Content-id: QuickGuide.html
Content-type: message/external-body
	;access-type=x-relative
	;name="QuickGuide.html"

Content-Type: message


--cut-here
Content-id: http://info.cern.ch/hypertext/WWW/TheProject.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch
	;name=/hypertext/WWW/TheProject.html

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch./WHO
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch.
	;name=/WHO

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch./FIND/yellow?
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch.
	;name=/FIND/yellow?

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch./FIND/jaune?
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch.
	;name=/FIND/jaune?

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch./FIND
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch.
	;name=/FIND

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch/NEWS/?
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch
	;name=/NEWS/?

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch./NEWS/cern
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch.
	;name=/NEWS/cern

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch./NEWS/vmnews
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch.
	;name=/NEWS/vmnews

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch/NEWS/student
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch
	;name=/NEWS/student

Content-Type: message


--cut-here
Content-id: http://info.cern.ch/hypertext/DataSources/NewsFromVM/Overview.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch
	;name=/hypertext/DataSources/NewsFromVM/Overview.html

Content-Type: message


--cut-here
Content-id: http://info.cern.ch/hypertext/DataSources/News/Overview.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch
	;name=/hypertext/DataSources/News/Overview.html

Content-Type: message


--cut-here
Content-id: http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch
	;name=/hypertext/DataSources/bySubject/Overview.html

Content-Type: message


--cut-here
Content-id: http://info.cern.ch./hypertext/DataSources/Overview.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch.
	;name=/hypertext/DataSources/Overview.html

Content-Type: message


--cut-here
Content-id: http://slacvm.slac.stanford.edu./FIND/spires
Content-type: message/external-body
	;access-type=x-HTTP
	;site=slacvm.slac.stanford.edu.
	;name=/FIND/spires

Content-Type: message


--cut-here
Content-id: http://crnvmc.cern.ch/FIND/DESY?
Content-type: message/external-body
	;access-type=x-HTTP
	;site=crnvmc.cern.ch
	;name=/FIND/DESY?

Content-Type: message


--cut-here
Content-id: http://info.cern.ch:8001/archive.orst.edu:9000/archie-orst.edu
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch
	;port=8001
	;name=/archive.orst.edu:9000/archie-orst.edu

Content-Type: message


--cut-here
Content-id: http://iicm.tu-graz.ac.at./jargon
Content-type: message/external-body
	;access-type=x-HTTP
	;site=iicm.tu-graz.ac.at.
	;name=/jargon

Content-Type: message


--cut-here
Content-id: http://info.cern.ch./hypertext/Products/WAIS/Sources/Overview.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch.
	;name=/hypertext/Products/WAIS/Sources/Overview.html

Content-Type: message


--cut-here
Content-id: http://info.cern.ch/rpc/doc/User/UserGuide.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch
	;name=/rpc/doc/User/UserGuide.html

Content-Type: message


--cut-here
Content-id: http://otax.tky.hut.fi/tky/default.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=otax.tky.hut.fi
	;name=/tky/default.html

Content-Type: message


--cut-here
Content-id: gopher://gopher.micro.umn.edu:70/11/Other%20Gopher%20and%20Information%20Servers
Content-type: message/external-body
	;access-type=x-gopher
	;site=gopher.micro.umn.edu
	;port=70
	;type=11
	;selector="Other Gopher and Information Servers"

Content-Type: message


--cut-here
Content-id: http://info.cern.ch./hypertext/WWW/LineMode/Defaults/default.html
Content-type: message/external-body
	;access-type=x-HTTP
	;site=info.cern.ch.
	;name=/hypertext/WWW/LineMode/Defaults/default.html

Content-Type: message
--cut-here--

---

Here's the perl script I used to convert default.html into
the above message. It's full of gross hacks, but it worked
this evening.

---

#!/usr/local/bin/perl

print "Content-Type: multipart/x-HTDOC; boundary=cut-here\n\n";
print "--cut-here\n";
print "Content-Type: text/x-HTML\n\n";
print "<!DOCTYPE HTML SYSTEM \n[\n";

$o = 0;
$/ = ">";

while(<>){
    s/(<A[^>]*>)/&fix_anchor($1)/ige;
    s/<NEXTID\s*(\d*)\s*>/<NEXTID N=$1>/g;
    if(/<H(\d)/){
	local($n) = $1;
	if($n>$o) { $rep = "<SECTION>"; }
	else { $rep = "</SECTION><SECTION>"; }
        s/(<H\d)/$rep$1/g;
	$o = $n;
    }
    $doc .= $_;
}

@entities = @anchors;
while(@entities){
    local($id) = shift(@entities);
    local($_) = shift(@entities);
    local($name) = shift(@entities);
    local($type) = shift(@entities);

    print "<!ENTITY part$id SDATA \"$_\">\n";
}

print "]>\n", $doc;

while(@anchors){
    local($id) = shift(@anchors);
    local($_) = shift(@anchors);
    local($name) = shift(@anchors);
    local($type) = shift(@anchors);
    local($access_type);

    print "\n\n--cut-here\n";
    print "Content-id: $_\n";
    print "Content-type: message/external-body\n";

    $access_type = $1 if s/^(\w+)://;
    if(s/#([^#]+)$//){
	print "\t;x-element-id=\"$1\"\n";
    }

    if($access_type =~ /file/i){
	print "\t;access-type=LOCAL-FILE\n";
	print "\t;name=$_\n";
    }elsif($access_type =~ /http/i){
	print "\t;access-type=x-HTTP\n";
	if(s-//([^:/]+)--){
	    print "\t;site=$1\n";
	    print "\t;port=$1\n" if s/^:(\d+)//;
	}
	&unescape;
	print "\t;name=$_\n";
    }elsif($access_type =~ /news/i){
	print "\t;access-type=x-news\n";
	&unescape;
	if(/@/){
	    print "\t;message-id=$_\n";
	}else{
	    print "\t;group=$_\n";
	}
    }elsif($access_type =~ /telnet/i){
	print "\t;access-type=x-telnet\n";
	&unescape;
	print "\t;user=$1\n" if s/^(.*)@//;
	print "\t;port=$1\n" if s/:(.*)$//;
	print "\t;site=$_\n";
    }elsif($access_type =~ /gopher/i){
	print "\t;access-type=x-gopher\n";
	if(s-^//([^:/]+)--){
	    print "\t;site=$1\n";
	    print "\t;port=$1\n" if s/^:(\d+)//;
	}
	print "\t;type=$1\n" if s-^/(\d+)/--;
	&unescape;
	print "\t;selector=\"$_\"\n";
    }elsif($access_type =~ /wais/i){
	print "\t;access-type=x-wais\n";
	if(s-//([^:/]+)--){
	    print "\t;site=$1\n";
	    print "\t;port=$1\n" if s/^:(\d+)//;
	}
	if(m-^/-){
	    print "\t;type=$1\n" if s-^/(\w+)--;
	    print "\t;size=$1\n" if s-^/(\d+)--;
	    &unescape;
	    print "\t;path=\"$_\"\n";
	}else{
	    &unescape;
	    print "\t;words=\"$1\"\n" if /\?(.*)/;
	}
    }elsif($access_type eq ""){
	print "\t;access-type=x-relative\n";
	&unescape;
	print "\t;name=\"$_\"\n";
    }else{
	warn "unknown access type: $access_type in $_";
    }

    print "\nContent-Type: message\n";
}

print "--cut-here--\n";

sub unescape{
    # Decode %XX escapes; only hex digits are valid after the percent sign.
    s/%([0-9A-Fa-f][0-9A-Fa-f])/sprintf("%c",hex($1))/ge;
}

sub fix_anchor{
    local($_) = @_;
    local($name, $href, $type);
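    # Stop each value at whitespace or the closing ">" so the bracket is never captured.
    $href = $1 if /HREF\s*=\s*([^\s>]+)/i;
    return $_ unless $href;
    $href =~ s/^"(.*)"$/$1/;	# drop surrounding quotes; the ENTITY literal adds its own

    $name = $1 if /NAME\s*=\s*([^\s>]+)/i;
    $type = $1 if /TYPE\s*=\s*([^\s>]+)/i;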

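    # Give each distinct HREF one id (counting from 1) and record it only once,
    # so a repeated link reuses its own part instead of borrowing the newest id.
    unless($content_id{$href}){
	$content_id{$href} = ++$content_id;
	push(@anchors, $content_id{$href}, $href, $name, $type);
    }
    local($ret) = "<A PART=\"part$content_id{$href}\"";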
    $ret .= " NAME=$name" if $name;
    $ret .= ">";
    return $ret;
}

-----
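
For readers who find Perl 4 hard going, here is a rough Python sketch of
what the script does with one http link: peel off the access type, the
"#fragment", the site and the optional port, then unescape what is left.
It is an illustration only, not a port of the script. The function name
external_body_params is made up for this sketch, and only the http branch
is covered.

```python
import re
from urllib.parse import unquote


def external_body_params(udi):
    """Map an http-style UDI to (attribute, value) pairs.

    Hypothetical helper: it mirrors only the http branch of the Perl
    script and raises ValueError for every other access type.
    """
    params = []
    rest = udi

    # The script strips the access type ("http:") first.
    m = re.match(r"(\w+):", rest)
    scheme = m.group(1).lower() if m else ""
    if m:
        rest = rest[m.end():]

    # A trailing "#fragment" is reported as x-element-id.
    m = re.search(r"#([^#]+)$", rest)
    if m:
        params.append(("x-element-id", m.group(1)))
        rest = rest[:m.start()]

    if scheme != "http":
        raise ValueError("only the http branch is sketched here: %r" % udi)
    params.append(("access-type", "x-HTTP"))

    # "//site" with an optional ":port" directly after it.
    m = re.match(r"//([^:/]+)(?::(\d+))?", rest)
    if m:
        params.append(("site", m.group(1)))
        if m.group(2):
            params.append(("port", m.group(2)))
        rest = rest[m.end():]

    # Whatever is left is the document name, with %XX escapes decoded.
    params.append(("name", unquote(rest)))
    return params
```

Each pair corresponds to one ";attribute=value" continuation line that the
script prints under "Content-type: message/external-body".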

From connolly@pixel.convex.com  Sun Jun  7 06:37:07 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09218; Sun, 7 Jun 92 06:37:07 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA07900; Sun, 7 Jun 92 06:35:09 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA11346; Sat, 6 Jun 92 23:34:53 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA08877; Sat, 6 Jun 92 23:34:50 -0500
Message-Id: <9206070434.AA08877@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: <ISINDEX> and anchor types
Date: Sat, 06 Jun 92 23:34:49 CDT
From: Dan Connolly <connolly@pixel.convex.com>

Just a note: I prefer gopher's model of interaction when it
comes to index nodes. Rather than having a special <ISINDEX>
tag and a "find" command, have a different anchor type. The
user first chooses the "search this index" anchor; then, to
follow an index anchor link, the client prompts for the
query details and makes the query.

Seems cleaner, simpler, and more flexible.

Dan

From connolly@pixel.convex.com  Sun Jun  7 07:15:06 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09233; Sun, 7 Jun 92 07:15:06 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA10393; Sun, 7 Jun 92 07:13:11 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA12453; Sun, 7 Jun 92 00:12:57 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA17832; Sun, 7 Jun 92 00:12:55 -0500
Message-Id: <9206070512.AA17832@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: HTML is not SMGL
Date: Sun, 07 Jun 92 00:12:55 CDT
From: Dan Connolly <connolly@pixel.convex.com>

My grandiose scheme to convert HTML to MIME and SGML
works fine.

Now I'm going back to the idea of writing a DTD for
the existing HTML format. I can't seem to do it.
HTML has so little rigid structure that I'm running
into mixed content problems (I have to allow #PCDATA
almost anywhere, hence mixed content, which screws
up everything).

How much extant HTML is really out there? And how
much of it is generated on the fly by gateways
and servers?

This MIME/SGML stuff sure seems like the way to go.

Now if I make it possible to create such documents
with FrameMaker and a perl script, I bet it will
catch on. I suspect I'll get some resistance against
abandoning UDI's, but I don't think they work.

Dan

From connolly@pixel.convex.com  Sun Jun  7 08:20:25 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09268; Sun, 7 Jun 92 08:20:25 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA14076; Sun, 7 Jun 92 08:18:31 +0200
Received: from convex-inet.convex.com by mcsun.EU.net with SMTP
	id AA06010 (5.65b/CWI-2.167); Sun, 7 Jun 1992 08:18:19 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA14512; Sun, 7 Jun 92 01:15:38 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA15478; Sun, 7 Jun 92 01:15:36 -0500
Message-Id: <9206070615.AA15478@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: overkill on portability macros
Date: Sun, 07 Jun 92 01:15:34 CDT
From: Dan Connolly <connolly@pixel.convex.com>

Are these abuses of the preprocessor really necessary?

----
PRIVATE char from_hex ARGS1(char, c)
{
}
----
Ok, I've seen PRIVATE before (though I don't know what it's
for. Some sort of MS DOS near/far thing?)

But ANSI C and PCC share syntax for _defining_ functions.
The preprocessor dancing is necessary for _declaring_ functions
like so:

int foo __ARGS__((int x, int y, int z));

but in the .c files, you can just do the usual

int foo(x,y,z)
int x;
int y;
int z;

and ANSI and PCC compilers alike will be happy, with one
exception: varargs. Functions with

int foo(int x, ...);

style declarations need corresponding

int foo(int x, ...)
{
}

style definitions.

Dan

From jfg@dxcern.cern.ch  Mon Jun  8 00:43:17 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10106; Mon, 8 Jun 92 00:43:17 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA26188; Mon, 8 Jun 92 00:41:24 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA24685; Mon, 8 Jun 92 00:41:18 +0200
Date: Mon, 8 Jun 92 00:41:18 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206072241.AA24685@dxcern.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: Re: revised MIME architecture
References: <9206062231.AA23867@pixel.convex.com>

	Dan,

  OOPS... You sent this one to www-interest instead of www-talk...
I'm getting ready for some flamage from clueless list members :-)

  Regarding your proposal, the idea is exciting, although I find it
annoying that small documents, like most HTML pages are, will suddenly
take 4 times as much storage and bandwidth due to verbosity. When
multimedia comes into play, of course, this is a non-issue. I do agree
that the current UDI syntax is too terse, though.

  I'd like to expand on this, but I lack MIME background and, believe
it or not, I couldn't find the MIME RFC in the archive indexes; could
you tell me (and the list) its number? Also, pointers to existing MIME
readers would be welcome.

  More later...

	Jean-Francois

From jfg@dxcern.cern.ch  Mon Jun  8 01:03:03 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10129; Mon, 8 Jun 92 01:03:03 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA27827; Mon, 8 Jun 92 01:01:09 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA26164; Mon, 8 Jun 92 01:01:02 +0200
Date: Mon, 8 Jun 92 01:01:02 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206072301.AA26164@dxcern.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: Re: HTML is not SMGL
References: <9206070512.AA17832@pixel.convex.com>

Dan asked:
> How much extant HTML is really out there? And how much of it is
> generated on the fly by gateways and servers?

  Our hypertext documentation is certainly the largest quantity of
HTML you can find in the world. Besides, we know all the people who
have produced their own, so making the Big Change would be relatively
simple for them (esp. given your impressive perl script). Gateways can
be changed easily too. But all the browsers must be updated first,
and that will take more time!  (There are thousands of copies
installed...)

> I suspect I'll get some resistance against abandoning UDI's, but I
> don't think they work.

  Well, you still use them internally, don't you ? ;^)

	Jean-Francois

From emv@nigel.msen.com  Mon Jun  8 02:29:05 1992
Return-Path: <emv@nigel.msen.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10171; Mon, 8 Jun 92 02:29:05 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA03499; Mon, 8 Jun 92 02:27:12 +0200
Received: by nigel.msen.com (/\==/\ Smail3.1.25.1 #25.5)
	id <m0luXZf-0009YoC@nigel.msen.com>; Sun, 7 Jun 92 20:26 WET DST
Message-Id: <m0luXZf-0009YoC@nigel.msen.com>
To: jfg@dxcern.cern.ch (Jean Francois Groff)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: HTML is not SMGL 
In-Reply-To: Your message of Mon, 08 Jun 92 01:01:02.
             <9206072301.AA26164@dxcern.cern.ch> 
Date: Sun, 07 Jun 92 20:26:48 EDT
From: Edward Vielmetti <emv@msen.com>

The UDI vs. MIME argument is a non-argument.  MIME is sufficiently
flexible that if you construct an appropriate Content-type and define
its semantics appropriately it will accept UDI's and work accordingly.
"Simple matter of programming" :).

Explicit "attribute=value" tags are more flexible than the W3 approach
of turning the entire document ID into one long string.  I guess it
depends on whether you believe you are dealing with a big database
or a big file system.  Both approaches have their place.  Again as
a simplified case you have "udi=//host:port/path" as a MIME identifier
and all is well.

I expect that MIME will be available in many e-mail products over the next
3-5 years.  Since the only application that has anywhere near universal
appeal on the net is e-mail, it strikes me as only appropriate that 
hypertext systems try to get as much leverage from mail as they possibly
can.

--Ed

From connolly@pixel.convex.com  Mon Jun  8 05:32:01 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10306; Mon, 8 Jun 92 05:32:01 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA28567; Mon, 8 Jun 92 05:30:05 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA12532; Sun, 7 Jun 92 22:29:47 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA28300; Sun, 7 Jun 92 22:29:45 -0500
Message-Id: <9206080329.AA28300@pixel.convex.com>
To: Edward Vielmetti <emv@msen.com>
Cc: jfg@dxcern.cern.ch (Jean Francois Groff), www-talk@nxoc01.cern.ch
Subject: Re: HTML is not SMGL 
In-Reply-To: Your message of "Sun, 07 Jun 92 20:26:48 EDT."
             <m0luXZf-0009YoC@nigel.msen.com> 
Date: Sun, 07 Jun 92 22:29:44 CDT
From: Dan Connolly <connolly@pixel.convex.com>


>The UDI vs. MIME argument is a non-argument.  MIME is sufficiently
>flexible that if you construct an appropriate Content-type and define
>its semantics appropriately it will accept UDI's and work accordingly.
>"Simple matter of programming" :).
>
>Explicit "attribute=value" tags are more flexible than the W3 approach
>to turn the entire document ID into a big long string.  I guess it 
>depends on whether you believe you are dealing with a big database
>or a big file system.  Both approaches have their place.  Again as
>a simplified case you have "udi=//host:port/path" as a MIME identifier
>and all is well.
>
The problem is that the syntax of a UDI doesn't fit into the syntax
of a MIME parameter (or an SGML attribute value...) because a UDI
might be arbitrarily long, and it cannot contain any whitespace (so
it can't be split across lines).

So these 200+ character UDI's for WAIS documents can't be
mailed around safely (even SGML has limits on the length of an
attribute value).

Heck, my WWW client (perhaps it's not the latest version, but still...)
can't even retrieve WAIS documents due to these problems.

Dan

From connolly@pixel.convex.com  Mon Jun  8 05:39:39 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10313; Mon, 8 Jun 92 05:39:39 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA00160; Mon, 8 Jun 92 05:37:46 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA12756; Sun, 7 Jun 92 22:37:31 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA28980; Sun, 7 Jun 92 22:37:29 -0500
Message-Id: <9206080337.AA28980@pixel.convex.com>
To: jfg@dxcern.cern.ch (Jean Francois Groff)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: revised MIME architecture 
In-Reply-To: Your message of "Mon, 08 Jun 92 00:41:18 +0200."
             <9206072241.AA24685@dxcern.cern.ch> 
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=cut-here
Date: Sun, 07 Jun 92 22:37:29 CDT
From: Dan Connolly <connolly@pixel.convex.com>


--cut-here

>	Dan,
>
>  OOPS... You sent this one to www-interest instead of www-talk...
>I'm getting ready for some flamage from clueless list members :-)
>
Did I? I sent several messages, and I thought I was careful to
send them all to www-talk. Oh well... Sorry!

>  Regarding your proposal, the idea is exciting, although I find it
>annoying that small documents, like most HTML pages are, will suddenly
>take 4 times as much storage and bandwidth due to verbosity.
>
They won't necessarily take 4 times as much storage: the verbose part
_could_ be generated on the fly by the server.

On the other hand, it _is_ simpler to store things in the format that
they'll be used. This brings up the issue of authoring documents in
this format. I'm developing some ideas on interactive composition
of MIME messages (using EMACS, FrameMaker, or perhaps a Tk application)
but for now, it's somewhat tedious.

>  I'd like to develop on this, but I lack MIME background and, believe
>it or not, I couldn't find the MIME RFC in the archive indexes ; could
>you tell me (and the list) its number? Also, pointers to existing MIME
>readers would be welcome.
>
I've gotten several "nifty... so what is MIME?" responses. So I'll
tell you what I told them...

--cut-here

>Where can I get more information about MIME?
>
>			Thanks.

The author of the RFC is Nathaniel Borenstein <nsb@thumper.bellcore.com>.
Most of the publicly available material is on thumper.bellcore.com in
pub/nsb.

--cut-here
Content-Description: RFC-XXXX (MIME)
Content-Type: multipart/alternative; boundary=alt

--alt
Content-Type: message/external-body;
	access-type=ANON-FTP;
	site=thumper.bellcore.com;
	directory="pub/nsb";
	name="BodyFormats.txt"

Content-Type: text/plain

--alt
Content-Type: message/external-body;
	access-type=ANON-FTP;
	site=thumper.bellcore.com;
	directory="pub/nsb";
	name="BodyFormats.ps"

Content-Type: application/postscript

--alt--

--cut-here--


From connolly@pixel.convex.com  Mon Jun  8 05:52:19 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10330; Mon, 8 Jun 92 05:52:19 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA00534; Mon, 8 Jun 92 05:50:21 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA13067; Sun, 7 Jun 92 22:49:54 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA00415; Sun, 7 Jun 92 22:49:51 -0500
Message-Id: <9206080349.AA00415@pixel.convex.com>
Subject: MIME for global hypertext
To: www-talk@nxoc01.cern.ch, wais-talk@think.com
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Date: Sun, 07 Jun 92 22:49:51 CDT
From: Dan Connolly <connolly@pixel.convex.com>


[This was posted to several newsgroups, but someone from wais-talk
suggested I forward it there also.]


The WAIS, gopher, and world-wide-web projects are all client/server
information retrieval systems. All three deliver plain text information
quite well, and they each have evolving mechanisms for delivering
other forms of information.

The MIME RFC defines a system for processing multi-part, multimedia
messages on the internet. I would like to see these systems, along
with USENET news and internet mail, interoperate with MIME as the substrate.

The clients for these systems go something like this:
0	user invokes client (and chooses a starting point)
1	client retrieves and displays the requested page
2	user reads page, chooses a reference to more info
3	user informs client of choice
		 (e.g. "show me item #1," or "search for googoo")
4	go to step 1
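
As a toy illustration of that loop, here is a minimal Python sketch. The
"web" is an in-memory dictionary and the user is replaced by a scripted
list of choices; both are assumptions made for the example, as is the
function name browse. No real client is structured exactly like this.

```python
def browse(pages, start, choices):
    """Walk a tiny in-memory web and return the texts shown, in order.

    pages:   dict mapping a page id to (text, list of linked page ids)
    start:   id of the starting page (step 0)
    choices: scripted 1-based selections standing in for the user
    """
    shown = []
    current = start
    for choice in choices:
        text, links = pages[current]
        shown.append(text)            # step 1: display the requested page
        current = links[choice - 1]   # steps 2-3: user picks a reference
        # step 4: loop back to step 1 with the newly chosen page
    shown.append(pages[current][0])   # display the last page chosen
    return shown


pages = {
    "menu": ("Menu", ["a", "b"]),
    "a": ("Doc A", []),
    "b": ("Doc B", ["menu"]),
}
history = browse(pages, "menu", [2, 1, 1])
```

Note that nothing in the loop cares whether a page is a "menu" or a "leaf",
which is the point made about hypertext a few paragraphs further on.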

These systems often consist of a hierarchy of menus with text files at
the leaf nodes. The system allows the user to interactively navigate
the menus and browse leaf nodes. But 1) the format of the menus is
particular to the system (USENET newsgroups/articles, unix
directories/files, WAIS source/database/document). And 2) once a user
is at a leaf node, the system can no longer interactively follow
references.

The novel aspect of hypertext is that the distinction between the
menu pages and the text pages disappears. In the world-wide-web,
text documents have machine-readable links inside them, and all
menus are represented as hypertext documents.

The WWW format works well, but it would benefit from use of MIME's
features.

For a common hypertext document format, I propose we define a
subtype of the MIME multipart message: X-HYPERTEXT. The first
part of a multipart/X-HYPERTEXT message is the content of
the document, and the remaining parts are multimedia attachments
and links to other documents.

The content part contains references (by Content-ID) to the
attachments and links. The client software allows the user
to interactively choose references to display/follow.

The remaining parts may be attached image/audio/video using
MIME's various types and transfer encodings (text attachments
would work too) or they may be references to information
accessible elsewhere using MIME's message/external-body type.
The parameters to the external-body content-type provide the
same information as WWW's Universal Document Identifier.
(MIME only defines ANON-FTP, FTP, TFTP, LOCAL-FILE and AFS.
The remaining access-types (WAIS, gopher, etc) would be
experimental (X-WAIS, X-GOPHER) until standardized.)
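
To make the shape of such a message concrete, here is a small Python
sketch that assembles one with the standard email package: a content part
first, then one message/external-body part per link, each labelled with a
Content-ID. The helper name build_hypertext, the boundary string and the
"<part1>" id are illustrative assumptions, and the multipart subtype is
the experimental one proposed above, not a registered type.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_hypertext(content, links):
    """Assemble a multipart/X-HYPERTEXT message (hypothetical helper).

    content: text of the first (content) part
    links:   dict mapping a Content-ID to (params, inner_type), where
             params are the external-body parameters; underscores in
             parameter names are written out as dashes by add_header
    """
    doc = MIMEMultipart("X-HYPERTEXT", boundary="cut-here")
    doc.attach(MIMEText(content, "plain"))
    for content_id, (params, inner_type) in links.items():
        part = Message()
        part.add_header("Content-Type", "message/external-body", **params)
        part.add_header("Content-ID", content_id)
        # The body of an external-body part holds the headers of the
        # document it points at.
        inner = Message()
        inner.add_header("Content-Type", inner_type)
        part.attach(inner)
        doc.attach(part)
    return doc


doc = build_hypertext(
    "See the MIME draft (part1).",
    {"<part1>": ({"access_type": "ANON-FTP",
                  "site": "thumper.bellcore.com",
                  "directory": "pub/nsb",
                  "name": "BodyFormats.txt"}, "text/plain")},
)
```

A client would display the first part and keep the remaining parts as a
table of link targets keyed by Content-ID.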

The emerging standard for structured, platform-independent text
is SGML. The WWW project defines an SGML document type with
traditional elements (title, heading, paragraph, list) and
new hypertext elements (anchor). Soon it will have multimedia
elements (image, audio).

The current design places external document references (to files,
WWW servers, WAIS documents, gophers, etc.) inside the SGML as
attributes. There are lexical incompatibilities, and the design
is under strain. I suggest that we implement references as
as SGML entities that identify message/external-body parts
by content-id.

Representing document content in SGML allows the same information
to be accessed using different user interface paradigms (e.g. dumb
terminals vs. curses style vs. X Windows point-and-click).

Short of full SGML parsing, we could adopt the MIME text/richtext
format, with the addition of a <REF ID="xxx">...</REF> tag.
In fact, any representation that allows the user to interactively indicate
one of the attached body parts by content-id will do. For example,
plain text with one-line descriptions would do. The Andrew ez
data stream would also work, but only Andrew sites could parse it.
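
Here is a minimal Python sketch of how a client might resolve such
references, assuming the <REF ID="xxx"> syntax suggested above and a
plain dictionary from content-id to attached part. The function name
list_references and the dictionary layout are assumptions for the
example; a real client would parse the markup properly rather than use
a regular expression.

```python
import re

# Matches <REF ID="xxx">label</REF>, case-insensitively, across line breaks.
REF = re.compile(r'<REF\s+ID="([^"]+)">(.*?)</REF>', re.IGNORECASE | re.DOTALL)


def list_references(content, parts):
    """Return (label, content_id, part) for every REF in the content.

    parts maps a content-id to whatever represents the attached body
    part; a reference to an unknown id yields None for the part.
    """
    found = []
    for match in REF.finditer(content):
        content_id = match.group(1)
        found.append((match.group(2), content_id, parts.get(content_id)))
    return found


content = 'Read <REF ID="part1">the MIME draft</REF> or <ref id="part2">the FAQ</ref>.'
refs = list_references(content, {"part1": "BodyFormats.txt"})
```

The user interface then only has to present the labels and hand back the
chosen content-id.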

This brings up the issue of format negotiation. No one format is
optimal for all information. Clients are likely to be able to process
information in several formats, and servers are likely to be able
to provide different representations.

The various formats can be enclosed in a MIME multipart/alternative
message. And rather than including the data for all formats in
the message, the data could be in message/external-body parts. The
client chooses the type of data it likes and retrieves the corresponding
external-body. This (modified) example from the MIME RFC may help explain:

MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body;
	name="BodyFormats.ps";
	site="thumper.bellcore.com";
	access-type=ANON-FTP;
	directory="pub";
	mode="image"

Content-type: application/postscript

--42
Content-Type: message/external-body;
	name="/u/nsb/writing/rfcs/RFC-XXXX.ez";
	site="thumper.bellcore.com";
	access-type=AFS

Content-type: application/x-ez

--42
Content-Type: message/external-body;
	name="BodyFormats.txt";
	site="thumper.bellcore.com";
	access-type=ANON-FTP;
	directory="pub"

Content-type: text/plain

--42--

The client can choose between postscript, ez, and plain text, and
retrieve the corresponding message body.
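
The selection step can be sketched in Python with the standard email
package. The sample below is a trimmed copy of the example (two of the
three alternatives), and choose_alternative is a hypothetical helper
name. The sketch only picks the external-body parameters for the first
acceptable type; actually fetching the file is left out.

```python
import email

SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body;
    name="BodyFormats.ps";
    site="thumper.bellcore.com";
    access-type=ANON-FTP;
    directory="pub";
    mode="image"

Content-type: application/postscript

--42
Content-Type: message/external-body;
    name="BodyFormats.txt";
    site="thumper.bellcore.com";
    access-type=ANON-FTP;
    directory="pub"

Content-type: text/plain

--42--
"""


def choose_alternative(message_text, preferred_types):
    """Return the external-body parameters of the best alternative.

    preferred_types lists content types the client can display, most
    preferred first. Returns None when no alternative is acceptable.
    """
    msg = email.message_from_string(message_text)
    for wanted in preferred_types:
        for part in msg.get_payload():
            if part.get_content_type() != "message/external-body":
                continue
            # The enclosed headers describe the document being pointed at.
            inner = part.get_payload(0)
            if inner.get_content_type() == wanted:
                # Skip the first entry, which is the content type itself.
                return dict(part.get_params()[1:])
    return None


choice = choose_alternative(SAMPLE, ["text/plain", "application/postscript"])
```

A dumb-terminal client would put text/plain first in its preference list;
a workstation with a PostScript previewer might reverse the order.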


The question then becomes: how do these systems interoperate?
By making information available as multipart/X-HYPERTEXT MIME
messages.

The WWW client interfaces to the other systems by defining
"addressing schemes" and implementing the various protocols
and translating the data into HTML. Gopher has a similar
typing scheme -- one character is reserved to indicate
the access type and the data type. WAIS clients have yet
another method of resolving types, though they only support
one protocol. The NewsGrazer application has its own
encapsulation mechanism. This is becoming a mess.

In the short term, global hypertext viewers will have to support
the access-type and content-type of each system with which they
interoperate (so we have X-WAIS, X-HTTP, X-GOPHER, X-NNTP, as well as
X-WAIS-SRC, X-HTML, X-GOPHER-1 thru X-GOPHER-9).

Some of the access types will become standard, and some will die out.
But all the data types should be encapsulated in MIME messages. Any
data that has machine-readable pointers to other data should be made
into a multipart/X-HYPERTEXT message. For example, a WAIS question
should have attachments for each of the result documents (the content
part can stay application/x-wais-question, or it could be converted to
a text type, or both), at least in the case where those documents are
available by some standard access method.  [I wrote a perl script that
will change an HTML document into a MIME message with attachments.]

Leaf documents, i.e. documents with no external links, can stay in
single part types. e.g. Plain text files become MIME messages by simply
adding a blank line at the beginning (to separate the headers (none)
from the body).
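
The blank-line trick is easy to check with Python's standard email
parser. This is only a demonstration of the parsing behaviour; the
helper name as_mime is made up for the example.

```python
import email


def as_mime(plain_text):
    """Turn a plain text file into a header-less message.

    The leading blank line ends the (empty) header section, so even a
    first line that looks like "Name: value" stays in the body.
    """
    return email.message_from_string("\n" + plain_text)


msg = as_mime("Note: this first line is body text, not a header.\n")
headers = list(msg.keys())          # no headers at all
default_type = msg.get_content_type()  # parser falls back to text/plain
body = msg.get_payload()
```

With no Content-Type header present, the parser falls back to the default
type for message bodies, which is what a leaf document of plain text wants.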

Under this model, a mail message can point to a news article
which references a WAIS document which contains several drawings
and pointers to several more available by FTP, and a user could
just point-and-click between them. The only need for
protocols like gopher and HTTP is to encapsulate data that's not
already MIME compliant.

This is clearly a pipe dream, but it's the kind of thing we can work
towards today.

Dan


From connolly@pixel.convex.com  Mon Jun  8 07:20:13 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10475; Mon, 8 Jun 92 07:20:13 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA07520; Mon, 8 Jun 92 07:18:19 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA16259; Mon, 8 Jun 92 00:17:52 -0500
Received: by pixel.convex.com (5.64/1.28)
	id AA20948; Mon, 8 Jun 92 00:17:48 -0500
Date: Mon, 8 Jun 92 00:17:48 -0500
From: connolly@pixel.convex.com (Dan Connolly)
Message-Id: <9206080517.AA20948@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Cc: enag@ifi.uio.no
Subject: Re: using NOTATIONs inline
Newsgroups: comp.text.sgml
In-Reply-To: <23177A@erik.naggum.no>
References: <1992Jun5.205639.21823@news.eng.convex.com>
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Cc: 

In article <23177A@erik.naggum.no> you write:
>Dan Connolly <connolly@convex.com> writes:
>|
>|   The WWW group is attempting to define a multimedia interchange
>|   format called HTML.  . . .
>
>Why not use HyTime?
>
Erik:
Partly because of ignorance (we've heard of HyTime, but we don't
know the details). I'd expect a HyTime engine to be quite a bit
of work to implement. And partly because, as I understand it, HyTime
doesn't go as far as to prescribe a DTD. The WWW project needs
one particular language, not a whole architecture.

I'd certainly like to know more about HyTime's techniques for addressing
documents, esp. elements of documents.

Now for the WWW gang:
>:
>|   That is, is it possible to put an arbitrary 8 bit binary stream
>|   _inside_ an SGML document? My guess is: no. But if we use
>|   CDATA, can we include anything that doesn't contain the closing
>|   tag in full?
>
>If you by "the closing tag in full" mean the entire end-tag, complete
>with etago, generic identifier, and tagc, as in "</image>", this is not
>the way SGML does it.  CDATA and SDATA are terminated by a etago
>"delimiter-in-context", which is an etago (end-tag open, "</") delimiter
>followed by a name start character, or a grpo (group open, "(")
>delimiter if concurrent document types are allowed.  In the reference
>concrete syntax, this means that the regular expression "</[(a-z]"
>matches the end of CDATA and SDATA elements.
>
>You can also use marked sections, with a CDATA status keyword, in which
>case the CDATA is terminated by the mse delimiter (marked section end,
>"]]>").
>
>:
>|   Someone made the point that an SGML document is only allowed to
>|   include SGML characters as specified by the SGML declaration, and if
>|   we're going to use the default SGML declaration, we have to stick to
>|   the characters blessed by it.
>
>Blessed and blessed.  The SGML declaration is supposed to reflect the
>reality of the document, not enforce arbitrary limits on them.  So you
>write an SGML declaration which fits the document.
>
>|   That's not my understanding. I thought that inside CDATA (or SDATA,
>|   I think) you could put _anything_ but the closing tag in full.
>
>As said above, the etago delimiter-in-context terminates the data,
>regardless of whether it's a legal end-tag in that context.
>
>You should be aware that the SGML parser will parse the contents of the
>"binary" content, and ignore record start, and treat record ends
>different from other characters.  In addition, it's an error for an SGML
>entity to contain characters with any of the numbers listed in the
>SHUNCHAR part of the SYNTAX declaration.  This is _not_ what you want
>with binary data.
>
>|   What's the scoop? Do we have to use external entities for raw data?
>
>Yes.  An external entity that is not an SGML text entity requires a
>notation identifier, so you only need to list the entities in the DTD,
>with notation, and refer to them by name in the document instance.
>
>If this is not satisfactory, you should declare the objects to be CDATA,
>and use a binary to text-only transformation scheme.  There are several
>such schemes.  Among them, base64 is the preferred encoding in my view,
>since it's available as part of the new Multipurpose Internet Mail
>Extensions (MIME) RFC-to-be.  (The latest draft is available for
>anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for
>two weeks from today.  Section 5.2 which concerns the base64 encoding is
>also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.)  Transformation
>back to the binary form from the text-only form may be done on the fly
>by the application before sending the data to the notation interpreter.
>
My idea is to use MIME encodings, but put these attachments _outside_
the SGML text, in an attached (or external) body part.
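
A quick Python sketch shows why raw bytes are risky inside CDATA and why
a base64 body part is not. The pattern is the "</[(a-z]" expression
quoted above, widened to both letter cases on the assumption that names
are case-insensitive in the reference concrete syntax. The helper names
and the sample bytes are made up for the example.

```python
import base64
import re

# etago ("</") followed by a name start character or group open "(".
ETAGO = re.compile(r"</[(A-Za-z]")


def contains_etago(text):
    """True if the text would end a CDATA element prematurely."""
    return ETAGO.search(text) is not None


def encode_attachment(data):
    """Base64-encode bytes into short text lines suitable for a body part."""
    return base64.encodebytes(data).decode("ascii")


raw = b"\x00\x01</a\xff binary"
encoded = encode_attachment(raw)
raw_is_risky = contains_etago(raw.decode("latin-1"))
encoded_is_risky = contains_etago(encoded)
```

The base64 alphabet has no "<", so the encoded form can never contain the
delimiter, whichever side of the SGML boundary it ends up on.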

>In addition to being much easier to deal with in SGML, this also makes
>SGML documents containing such content robust with respect to file
>transfer, etc.
>
>Hope this helps,
></Erik>

Thanks. Mostly it confirms my suspicions, but it should also provide
a somewhat authoritative answer (no references to ISO 8879 here :-)
to the WWW project.

>--
>Erik Naggum       |  +47-295-0313     |  ISO 8879 SGML     |  Memento,
>Naggum Software   |   "fuzzface"      |  ISO 10744 HyTime  |  terrigena.
>Boks 1570, Vika   | <erik@naggum.no>  |  JTC 1/SC 18/WG 8  |  Memento,
>0118 OSLO, NORWAY | <enag@ifi.uio.no> |  SGML UG SIGhyper  |  vita brevis.



From davis@willow.tc.cornell.edu  Mon Jun  8 15:28:24 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA11057; Mon, 8 Jun 92 15:28:24 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA20617; Mon, 8 Jun 92 15:26:33 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA17632; Mon, 8 Jun 92 09:28:20 EDT
Date: Mon, 8 Jun 92 09:28:20 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206081328.AA17632@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: HTML terseness/verbosity

Re the recent comments on terseness of UDIs and the
extra verbosity in Dan Connolly's proposal to
use MIME for WWW documents:

My understanding is that nobody should have to type
"naked" SGML (or HTML or MIME-language) anyway.
We should have programs like WYSIWYG editors
manipulating the markup for us.  (Now of course
at present we do have to type HTML, at least I do
here, but hopefully this will not persist).  If
that's right, then the more explicit and simple
the document structure is, and the easier it is for
programs to parse and manipulate, the better off we are.

One thing I like about Dan's proposal - it makes
it possible to collect a hyperdocument into a single
file (by embedding the docs within one MIME file),
which will make transporting easier.

From jfg@dxcern.cern.ch  Mon Jun  8 17:34:55 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA11440; Mon, 8 Jun 92 17:34:55 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA01254; Mon, 8 Jun 92 17:32:28 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA21936; Mon, 8 Jun 92 17:32:23 +0200
Date: Mon, 8 Jun 92 17:32:23 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206081532.AA21936@dxcern.cern.ch>
To: Dan Connolly <connolly@pixel.convex.com>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: overkill on portability macros
References: <9206070615.AA15478@pixel.convex.com>

> But ANSI C and PCC share syntax for _defining_ functions.
> The preprocessor dancing is necessary for _declaring_ functions
> like so:
>
> int foo __ARGS__((int x, int y, int z));
>
> but in the .c files, you can just do the usual
>
> int foo(x,y,z)
> int x;
> int y;
> int z;

  True, but in the latter case you don't get any type checking of the
parameters in functions that happen NOT to be declared before their
definition. I agree that the syntax with extra commas is ugly,
but there was no better way.

	JF

From mitra%pandora@fernwood.mpk.ca.us  Mon Jun  8 22:23:03 1992
Return-Path: <mitra%pandora@fernwood.mpk.ca.us>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12313; Mon, 8 Jun 92 22:23:03 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA24770; Mon, 8 Jun 92 22:21:08 +0200
Received: by fernwood.mpk.ca.us; id AA13324; Mon, 8 Jun 92 13:16:50 -0700
From: mitra@pandora.sf.ca.us ()
X-Mailer: SCO System V Mail (version 3.2)
To: connolly@pixel.convex.com, wais-talk@think.com, www-talk@nxoc01.cern.ch
Subject: MIME for global hypertext
Date: Mon, 8 Jun 92 13:11:15 PDT
Message-Id:  <9206081311.aa26440@pandora.sf.ca.us>

Dan,

Thanks for that proposal. I must admit to not having read the MIME RFC,
being mostly concerned with text rather than multimedia, so I wasnt
aware of the hypertext implications of it.

My question is on a fairly minor point of your document, you mention that 
a MIME document typically consists of a content and then the pointers, 
with the hypertext links being references to the pointers.  In Wais, it 
is quite possible to return part of a document (by byte position), and 
if the pointers are part of the document itself then they may not be 
returned at the time the user chooses to try and follow a link? 

My concerns are around doing these things for users on low-speed (2400 baud)
modems. For them, protocols need to be easy to handle at slow speed, and 
need to be meaningful BEFORE the whole document has been received. As the
Internet extends out to more and more users beyond the high-speed links
currently assumed, the need for protocol designers to consider those users
becomes more important. 

- Mitra
------------------------------------------------------------------
Mitra - technical director, Pandora Systems
mitra@pandora.sf.ca.us


From connolly@pixel.convex.com  Mon Jun  8 22:52:36 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA12426; Mon, 8 Jun 92 22:52:36 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA27128; Mon, 8 Jun 92 22:50:38 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA03927; Mon, 8 Jun 92 15:50:19 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA15921; Mon, 8 Jun 92 15:50:17 -0500
Message-Id: <9206082050.AA15921@pixel.convex.com>
To: mitra@pandora.sf.ca.us ()
Cc: wais-talk@think.com, www-talk@nxoc01.cern.ch
Subject: Re: MIME for global hypertext 
In-Reply-To: Your message of "Mon, 08 Jun 92 13:11:15 PDT."
             <9206081311.aa26440@pandora.sf.ca.us> 
Date: Mon, 08 Jun 92 15:50:17 CDT
From: Dan Connolly <connolly@pixel.convex.com>



>My question is on a fairly minor point of your document, you mention that 
>a MIME document typically consists of a content and then the pointers, 
>with the hypertext links being references to the pointers.

Well, this is not typical, but it's the model I'm proposing for
hypertext. Typically MIME message bodies are either single part
text/image/audio, or multipart. The standard multipart types
are mixed, meaning "show these one after the other," parallel,
meaning "show these at the same time," or alternative, meaning
"these all represent the same info. Take your pick."

The "content and then list of pointers [or attachments]" model
is my own proposed format for hypertext.

>  In Wais, it 
>is quite possible to return part of a document (by byte position), and 
>if the pointers are part of the document itself then they may not be 
>returned at the time the user chooses to try and follow a link? 
>
I would suggest that the WAIS server interpret the byte positions
as offsets into the content part of the hypertext. So the structure
remains intact. Byte offsets into a MIME multipart message
don't mean much. Transport systems may mess with the headers and
trailing whitespace on body lines. Line offsets may be meaningful
inside text body parts, as long as none of the lines have to be
split due to line length constraints.

Keep in mind that this multipart structure is only necessary for
hypertext (i.e. contains links) and hypermedia (i.e. contains
multimedia attachments) documents.

Traditional documents can be simple single part bodies. For example,
a plain text file starting with a new-line will be interpreted as
a body part with no headers, which defaults to the type
"text/plain; charset=US-ASCII", i.e. plain old text.

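A minimal sketch of that default, assuming Python's standard `email` package as the parser; the helper name `describe` and the sample text are made up for illustration:

```python
from email import message_from_string


def describe(raw: str) -> tuple:
    """Parse raw message text and return (header names, content type)."""
    msg = message_from_string(raw)
    return msg.keys(), msg.get_content_type()


# A plain text file that starts with a new-line: the header block is empty.
headers, content_type = describe("\nJust some plain old text.\n")
print(headers, content_type)
```

With no Content-Type header present, the parser falls back to the MIME default type for the body.
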
>My concerns are around doing these things for users on low-speed (2400 baud)
>modems....

From connolly@pixel.convex.com  Wed Jun 10 01:11:39 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA15414; Wed, 10 Jun 92 01:11:39 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA25378; Wed, 10 Jun 92 01:11:32 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA29288; Tue, 9 Jun 92 18:10:43 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA20418; Tue, 9 Jun 92 18:10:31 -0500
Message-Id: <9206092310.AA20418@pixel.convex.com>
To: mitra@pandora.sf.ca.us ()
Cc: emv@msen.com, wais-talk@quake.think.com, www-talk@nxoc01.cern.ch,
        gopher-news@boombox.micro.umn.edu
Subject: Re: WAIS APIs 
In-Reply-To: Your message of "Tue, 09 Jun 92 10:08:51 PDT."
             <9206091008.aa08978@pandora.sf.ca.us> 
Date: Tue, 09 Jun 92 18:10:29 CDT
From: Dan Connolly <connolly@pixel.convex.com>


There are several issues that WAIS and MIME both face. For some
issues, the systems have different requirements, so different
solutions make sense. But for the issue of typing document
content, I feel strongly that WAIS should adopt MIME semantics.

WAIS defines a :type field in the :document structure. Right
now, it's a string with loosely defined semantics. It's a
simple matter to obsolete the :type field with a :content-type
field with MIME semantics as follows:

	obsolete	MIME compliant
	:type "TEXT"	:content-type "text"
	:type "GIF"	:content-type "image/gif"
	:type "TIFF"	:content-type "image/x-tiff"
	:type "PS"	:content-type "application/postscript"
	:type "WSRC"	:content-type "application/x-wais-source"
	:type "MIME"	:content-type "message"
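
The table above amounts to a simple lookup. A rough sketch, in which the names `WAIS_TO_MIME` and `content_type_for` are hypothetical and the `application/octet-stream` fallback for unknown types is an assumption of this example, not part of the proposal:

```python
# Mapping from obsolete WAIS :type strings to MIME content types,
# copied from the table in the message.
WAIS_TO_MIME = {
    "TEXT": "text",
    "GIF": "image/gif",
    "TIFF": "image/x-tiff",
    "PS": "application/postscript",
    "WSRC": "application/x-wais-source",
    "MIME": "message",
}


def content_type_for(wais_type: str,
                     default: str = "application/octet-stream") -> str:
    """Translate a WAIS :type value into a :content-type value.

    The fallback for unlisted types is an assumption made for this sketch.
    """
    return WAIS_TO_MIME.get(wais_type.upper(), default)


print(content_type_for("PS"))
```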

I believe data served up by existing WAIS servers fits the MIME
content typing system as is. A WAIS client already implements the
semantics of the text, image, audio, video, and application types:
Either you present it, hand it off to something that can,
or punt to a file.

There are other semantics defined by MIME that would be nice
in WAIS clients. But these require that you handle pretty much
the whole MIME syntax. It's not clear that WAIS should adopt
the MIME solutions in these cases.

[I would like to see the world-wide web adopt MIME solutions
for these issues, though.]

In <9206081956.AA03908@cmns.think.com.Think.COM>, Mr. Spero suggests
that WAIS might be expanded to offer data in a number of types. The MIME
solution to this problem is the "multipart/alternative" body type.
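
As an illustration only, here is how one body offered in two representations might be assembled with Python's standard `email` package. The helper name `build_alternatives` and the sample content are assumptions of this sketch; nothing here is WAIS code:

```python
from email.message import EmailMessage


def build_alternatives(plain: str, html: str) -> EmailMessage:
    """Wrap two renderings of the same information in multipart/alternative."""
    msg = EmailMessage()
    msg.set_content(plain)                     # first alternative: text/plain
    msg.add_alternative(html, subtype="html")  # second alternative: text/html
    return msg


doc = build_alternatives("Hello, web.", "<p>Hello, web.</p>")
print(doc.get_content_type())
print([part.get_content_type() for part in doc.iter_parts()])
```

The receiving client picks whichever part it is able to present.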

Someone else suggested that the WAIS server might return only
a pointer to the data, and the client could retrieve the actual
content. MIME defines a "message/external-body" type for this.

And I suggested that the server might return a complex document
with text, graphics, and pointers to other data. (World Wide
Web servers return text with pointers, and they're trying to
deal with graphics). I suggested a new "multipart/hypertext"
type to handle this.

Resolving these issues is going to take a lot more thought.
Let's keep interoperability in mind while we do it.

Dan

From morris@quake.think.com  Wed Jun 10 07:55:32 1992
Return-Path: <morris@quake.think.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA15757; Wed, 10 Jun 92 07:55:32 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA28784; Wed, 10 Jun 92 07:55:34 +0200
Received: by quake.think.com (4.1/SMI-4.0)
	id AA01284; Tue, 9 Jun 92 22:56:59 PDT
Date: Tue, 9 Jun 92 22:56:59 PDT
Message-Id: <9206100556.AA01284@quake.think.com>
From: Harry Morris <morris@think.com>
Sender: morris@quake.think.com
To: connolly@pixel.convex.com
Cc: mitra@pandora.sf.ca.us, emv@msen.com, wais-talk@quake.think.com,
        www-talk@nxoc01.cern.ch, gopher-news@boombox.micro.umn.edu
In-Reply-To: Dan Connolly's message of Tue, 09 Jun 92 18:10:29 CDT <9206092310.AA20418@pixel.convex.com>
Subject: WAIS APIs 


   Date: Tue, 09 Jun 92 18:10:29 CDT
   From: Dan Connolly <connolly@pixel.convex.com>

	   obsolete	MIME compliant
	   :type "TEXT"	:content-type "text"
	   :type "GIF"	:content-type "image/gif"
	   :type "TIFF"	:content-type "image/x-tiff"
	   :type "PS"	:content-type "application/postscript"
	   :type "WSRC"	:content-type "application/x-wais-source"
	   :type "MIME"	:content-type "message"

excuse the novice question, but don't x-tiff and x-wais-source imply that
a particular program (running on an X Windows display) is to be called?

how do you support other platforms?


From burchard@horizon.math.utah.edu  Wed Jun 10 08:09:00 1992
Return-Path: <burchard@horizon.math.utah.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA15796; Wed, 10 Jun 92 08:09:00 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA05143; Wed, 10 Jun 92 08:09:02 +0200
Received: from horizon.math.utah.edu by math.utah.edu (4.1/SMI-4.1-utah-csc-server)
	id AA09200; Wed, 10 Jun 92 00:08:51 MDT
Received: by horizon.math.utah.edu (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA01628; Wed, 10 Jun 92 00:09:54 MDT
Date: Wed, 10 Jun 92 00:09:54 MDT
From: burchard@horizon.math.utah.edu (Paul Burchard)
Message-Id: <9206100609.AA01628@horizon.math.utah.edu>
Received: by NeXT Mailer (1.63)
To: Harry Morris <morris@think.com>
Subject: Re: WAIS APIs 
Cc: connolly@pixel.convex.com, mitra@pandora.sf.ca.us, emv@msen.com,
        wais-talk@quake.think.com, www-talk@nxoc01.cern.ch,
        gopher-news@boombox.micro.umn.edu

> excuse the novice question, but doesn't x-tiff and x-wais-source 

> imply that a particular program (running on an X Windows display) 

> is to be called?

If I'm not mistaken the "x-" refers to "experimental".  I.e., these  
are not officially registered MIME types.  (There is no X Windows  
program called x-tiff, by the way.)
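
A small sketch of that naming convention, assuming the rule is simply "the subtype begins with x-"; the function name `is_experimental` is hypothetical:

```python
def is_experimental(content_type: str) -> bool:
    """Return True if the subtype carries the experimental "x-" prefix."""
    # Drop any parameters such as "; charset=US-ASCII" before inspecting.
    essence = content_type.split(";", 1)[0].strip().lower()
    _, _, subtype = essence.partition("/")
    return subtype.startswith("x-")


for name in ("image/x-tiff", "application/x-wais-source", "image/gif"):
    print(name, is_experimental(name))
```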

Since these naming schemes serve the same purpose there is no reason  
not to have them interoperate; this could be fitted nicely into an  
OOP API.

PB



From timbl@zippy.lcs.mit.edu  Thu Jun 11 18:21:47 1992
Return-Path: <timbl@zippy.lcs.mit.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20470; Thu, 11 Jun 92 18:21:47 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA16880; Thu, 11 Jun 92 18:21:50 +0200
Received: by zippy.lcs.mit.edu 
	id AA03819; Thu, 11 Jun 92 12:22:56 -0400
Date: Thu, 11 Jun 92 12:22:56 -0400
From: timbl@zippy.lcs.mit.edu (Tim Berners-Lee)
Message-Id: <9206111622.AA03819@zippy.lcs.mit.edu>
To: connolly@pixel.convex.com, enag@ifi.uio.no, www-talk@nxoc01.cern.ch
Subject: MIME, SGML, UDIs, HTML and W3
Cc: timbl@zippy.lcs.mit.edu


I have printed off the recent discussion on the new
HTTP, HTML and MIME and UDIs and done what I can
to disentangle it all in my mind.  I will reply
in one message, because many of the points are linked.
I know this should be hypertext, with references but
(a) I am away from home and (b) we don't yet have a
universal mail/news archive server running to link to.

	HTTP and HTML

First of all, Jean-Francois <jfg@dxcern.cern.ch>
points out very properly that the enhanced HTTP
protocol and the enhanced HTML spec are quite
separate things, and should be specified separately.
I agree wholeheartedly about all this, and
I apologize for muddling the levels up till now.

(As a small aside, I would point out that whereas an
HTERR file is not very useful, an HTFWD file IS.
It is like a hypertext soft link. But I am happy to
leave that as a separate type of file. It should
certainly get a different extension so that it gets a
different icon)

        HTTP: SGML vs ASN/1

Let's look at the HTTP protocol first. Carl <barker@cernnext.cern.ch>
is mapping out  the requirements for this, and assuming that SGML
would be a reasonable representation for it in practice.
And so it is.  When the requirements are clear,
it would certainly be interesting to look at mapping them
onto a z39.50 - style ASN/1 implementation. This would
be useful for two reasons. First, the comparison would
point out to us things in z39.50 which we might not have thought of
which would be useful for HTTP. Second, the comparison might give
a nice short or at least well-defined list of things which the WAIS
guys might like to take into account in the next version
of their protocol.  (I demoed W3 to Brewster, who hadn't
seen it live before, and was very keen that WAIS and W3
should merge, changing the WAIS protocol if necessary.)

There is no reason why we shouldn't try both protocols.
If they map well onto each other, it's just a question
of having two separate parsers at the low level, building
the same internal structures.

When we're talking about an SGML representation,
and describe a file to come later down the link,
I don't think we have to use the NOTATION= attribute with a notation
type, because we won't in fact be talking about
the notation of an SGML element.
The format in this case is not something which the SGML
parser is aware of.

I must admit I was disappointed to learn that SGML
didn't allow for any way of including 8 bit data. Thanks Erik
<enag@ifi.uio.no> for your explanations.


	MIME and SGML

Dan <connolly@pixel.convex.com> rightly points out
the relevance of the coming MIME standards. There
are several things which we must separate here, though:

   1. The MIME classification of data formats
   2. The MIME format for multi-part messages
   3. The MIME format for rich text.
   4. The MIME format for external document addresses (MIME UDIs)

1. MIME classification of data formats

	We must do the same disentangling job which JF did
	on HTML to MIME.

	First of all, the MIME job of classifying data formats
	is a useful job which is ideally done by just one
	bunch of people. There has been some suggestion that
	the MIME classifications are not well enough defined,
	but they seem to be the best effort yet and one can only
	assume they will evolve in the right direction. So I'd
	back the use of these for W3.


2. The MIME format for multi-part messages

	This is necessary for sending a multi-part
	document over a mail link.  We have to ask ourselves
	whether it is reasonable to use over a binary link.
	Personally, my initial impression is that the MIME
	stuff, using as it does terminators such as
	--xxx-- separated by blank lines, looks more horrible
	to work with in this respect than SGML! Still we have
	the problem of restrictions on the content:
	Must not contain delimiters, limited 7 bit character set,
	line orientation, in fact all the things which email
	carries as a restriction.  This is really taking on board
	a legacy of all the mail which has evolved over the years.
	Do we need that for our new ultra-fast hypertext access
	protocol?

	[Compare the MIME format with the rather cleaner NeXT
	Mail format which is as far as I understand simply
	a uuencoded compressed tar file of all the bits, where
	uuencoding is designed as an optimal way of getting over
	mail transport restrictions, compress does what it says
	and tar is a multipart wrapper designed for that only. Not
	standard outside unix, perhaps, but cleaner in that the
	mail formatting is done at the last minute and doesn't
	affect the other operations]

	Of course, with HTTP2, multipart/alternative shouldn't
	be needed.

  Multipart for hypertext?

	Now, Dan not only suggests the use of this for
	multipart messages, but also suggests that a hypertext
	document should necessarily contain many parts,
	one in SGML and one for each link as a MIME external document.
	This means that an SGML hypertext document can never stand
	on its own! An SGML parser will always need to have
	a MIME parser sitting just outside.  I don't like
	this: I feel we have to separate these two things.

	Suppose that an SGML document does want to
	be sent in a MIME message and does want to
	refer to other parts of that MIME message. In that case,
	it seems reasonable to have a format for that.
	However, when an SGML document is seen by itself, and
	refers to a news message for example, then there is
	no reason for it not to be able to contain a
	complete reference within itself.

	When SGML documents include other files, then
	the SYSTEM value is typically a file name.
	It is a reference to something outside. The
	precedent is set that SGML documents are allowed
	to refer to things outside.

	I think part of your objection, Dan, is based on
	a dislike of the UDI syntax -- which I'll come to later.
  
3. The MIME format for rich text.

	Here, I am not so impressed.  Basically, the MIME
	people are at the same level that we were before we started
	this cleanup, in that they have SGML-LIKE stuff which isn't SGML.
	As it's not difficult to make it SGML, they should do that.
	Comparing MIME's rich text and HTML, I see that
	we lack the character formatting attributes BOLD and ITALIC
	but on the other hand I feel that our treatment of
	logical heading levels and other structures is much more powerful
	and has turned out to provide more flexible formatting	
	on different platforms than explicit semi-references
	to font sizes.  This is borne out by all the systems which
	use named styles in preference to explicit formatting,
	LaTeX or other macros instead of TeX, etc etc.

	So technically, HTML has some things to give MIME's rich
	text. Are the MIME people still open to additions?
	If not, I would suggest we add BOLD and ITALIC (or
	two emphasis styles for characters), and keep HTML
	separate from MIME's rich text, proposing it as a
	MIME text standard.
	(HP0 and HP1 were in the HTML spec but as unimplemented)
  
4. The MIME format for external document addresses (MIME UDIs)

	As Ed <emv@msen.com> says, this is a bit of a non-issue,
	as MIME addresses and current style UDIs map onto
	each other. However, we have to agree on a "concrete
	syntax" (or two... :-) in the end.

	It's like the difference between an x400 style mail address
	generated from an internet address, and that internet address.
	Which do you prefer

		timbl@zippy.lcs.mit.edu

	where the sections of the domain name are defined
	to have no semantics at all, or

		S=timbl; HO=zippy; OU=lcs; O=MIT; SECTOR=edu

	(this is not real x400 - don't use it!) or

		user=timbl
		host=zippy
		group=lcs
		organization=mit
		sector=education

	You say, Dan, that you "don't think [UDIs] work".
	Do you mean people don't use them in all correspondence?
	Well, what DO they use? They use ange-ftp addresses
	for FTP (like info.cern.ch:/pub/www/doc/*.ps),
	which are even more terse than UDIs! They use news
	message-ids which are UDIs.

	Let me say that I personally don't much care about the
	arbitrary punctuation. There are a few things, though,
	which are important:

	-  The thing should be printable 7-bit ASCII.

	   Unlike arbitrary document formats,
	   UDIs must be sendable in the mail

	- White space should not be significant. I would
	  accept the presence of some arbitrary white space
	  as a delimiter, but one cannot distinguish between
	  different forms and quantities of white space.
	  This is because things get wrapped and unwrapped.

	  Dan, you object to UDIs because they don't
	  contain white space. But that is purely so that
	  you CAN wrap them onto several lines and still
	  recover them.  You can put white space
	  in but it shouldn't mean anything. (This is not possible
	  in W3 as is but it is in the UDI document)

	  I don't see why you say they
	  can't be put as an SGML attribute. They are just
	  text strings. They will be quoted of course
	  (Yes, I know the old NeXT browser doesn't quote them)
	  Is that not allowed? What are the problem characters?
	  If there are SGML problem characters in the UDI spec, they
	  probably are ruled out of SGML for a reason.

	  (I recently saw a galley proof of an article in which
	  our mail address had been hyphenated! UDIs must be
          squeezable into 2 inch columns.)
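
The white space rule above can be sketched as follows. This assumes the rule is "all white space inside a UDI is insignificant"; the function name `unwrap_udi` and the sample address are made up for the example:

```python
def unwrap_udi(text: str) -> str:
    """Recover a UDI that was wrapped across lines or padded with blanks.

    Every run of white space (spaces, tabs, line breaks) is dropped,
    so the amount and kind of white space cannot change the result.
    """
    return "".join(text.split())


wrapped = """http://info.cern.ch/hyper
        text/WWW/TheProject.html"""
print(unwrap_udi(wrapped))
```

Because the white space carries no meaning, a UDI that has been wrapped into a narrow column and one typed on a single line recover to the same string.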

        There is a semantic difference between a tagged
	list and a punctuation-divided set, and that is that
	the former has defined semantics but the latter doesn't and
	can therefore be extended more easily.  I suggest that tagging
	could be used for the four bits of an address
	that must be separable by all sides, which are
	limited in number (4). Within those bits, the string should
	be transparent as the protocol does not require
	every party to understand the innards. 

	The bits are
			MIME		Used by

	name space:	ACCESS		Used by client

	server details:	HOST, PORT	used by client, protocol-dependent
	
	local doc id:	PATH		used by server only

	anchor id: 	(none)		used by presentation application only

	It seems useful to maintain the ability to work out which
	bits are seen by whom.
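
A rough sketch of separating those four bits, using Python's `urllib.parse.urlsplit` on a modern-style address as a stand-in for the punctuation-based W3 UDI syntax. The function name `udi_parts`, the dictionary keys, and the sample address are all assumptions of this example:

```python
from urllib.parse import urlsplit


def udi_parts(udi: str) -> dict:
    """Split an address into the four parts named in the table above."""
    pieces = urlsplit(udi)
    return {
        "name_space": pieces.scheme,      # used by client
        "host": pieces.hostname,          # used by client, protocol-dependent
        "port": pieces.port,              # used by client, protocol-dependent
        "local_doc_id": pieces.path,      # used by server only
        "anchor_id": pieces.fragment,     # used by presentation application only
    }


print(udi_parts("http://info.cern.ch:80/hypertext/WWW/TheProject.html#intro"))
```

Only the split points need to be understood by every party; the contents of each part can stay opaque to everyone except its consumer.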

	I only used punctuation to separate these parts in the W3 UDI
	because people like internet addresses and mail addresses
	and filenames and telephone numbers and message-ids and
	room numbers and zip codes which don't have tags and
	do make do with punctuation.  If the groundswell of
	opinion on this list is that tags are better, then
	let's use tags!

	Whatever we use, it should be as quotable in an SGML
	attribute as in a MIME external reference as in a
	scribbled note or a link-pasteboard or whatever.
	(The U is for Universal, NOT Unique!)

PHILOSOPHY

	In the W3 world, the model is of a dynamic world of
	documents which generally have some "home"
	(or several), which can be found using sufficient
	intelligence and the help of one's friends given the UDI.

	A mail message has no home, and so in principle the parts
	of it have no home. When a hypertext multipart message
	(really consisting of multiple hypertext documents)
	has links between its parts they refer to each other
	within a completely isolated context.

	There are now two possibilities when the message is in fact
	archived and made readable. One is we say that the parts
	are then addressed as parts of the message, wherever it
	may be. The other is to say that the parts of the message
	are very likely things which had some original home.
	In that case, the message is just giving the receiver
	a copy to save him the (perhaps insurmountable) trouble
	of retrieving it.  In this case the parts should be
	identified with their original UDIs so that the
	receiver is not confused with multiple documents which
	are in fact the same thing. 
	

I think that's all the comments I have on what I've read so far..

	Tim
________________________________________________________________
Tim Berners-Lee
World-Wide Web initiative
CERN, 1211 Geneva 23, Switzerland        timbl@info.cern.ch
Visiting MIT: NE43-513, (617)234 6016    timbl@zippy.lcs.mit.edu














From wathu@lanka.ccit.arizona.edu  Thu Jun 11 19:58:10 1992
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20867; Thu, 11 Jun 92 19:58:10 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA21933; Thu, 11 Jun 92 19:58:18 +0200
Return-Path: wathu@lanka.ccit.arizona.edu
Received: from Lanka.CCIT.Arizona.EDU by Arizona.edu (PMDF #12663) id
 <01GL32SFED9CAL44H9@Arizona.edu>; Thu, 11 Jun 1992 10:36 MST
Received: by lanka.ccit.arizona.edu; Thu, 11 Jun 92 10:37:01 MST
Date: Thu, 11 Jun 92 10:37:01 MST
From: wathu@lanka.ccit.arizona.edu (Wije Wathugala)
Subject: SGML Converters
To: www-talk@nxoc01.cern.ch
Cc: wathu@arizona.edu
Message-Id: <9206111737.AA05287@lanka.ccit.arizona.edu>
X-Envelope-To: www-talk@info.cern.ch

SGML Converters
===============

We are in the process of setting up a WorldWideWeb (WWW) server for computer
center documentation.  Our current documents are in many different 
word processor formats, such as Ventura, WordPerfect (DOS), MS-Word (Mac),
TeX, LaTeX, PostScript, RTF, etc.  We would like to convert them to SGML
so that we can link them to WWW.

We would like recommendations for good products (commercial or free).
If any of you have tried this type of conversion, please comment on
your experiences.

Thank you

Wije Wathugala
wathu@arizona.edu

Note: This was also posted to comp.text.sgml


From timbl  Thu Jun 11 20:19:28 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20943; Thu, 11 Jun 92 20:19:28 MET DST
Date: Thu, 11 Jun 92 20:19:28 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206111819.AA20943@ nxoc01.cern.ch >
To: www-talk@nxoc01.cern.ch
Subject: Paper on use of W3, anyone? NSC-92 Pisa

From nir-request@kona.cc.mcgill.ca Mon Jun  8 14:03:43 1992
Return-Path: <nir-request@kona.cc.mcgill.ca>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA10963; Mon, 8 Jun 92 14:03:39 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA14481; Mon, 8 Jun 92 14:01:45 +0200
Received: by kona.cc.mcgill.ca (5.65a/IDA-1.4.2b/CC-Guru-2b)
        id AA00594  on Mon, 8 Jun 92 07:42:58 -0400
Received: from relay.cdnnet.ca by kona.cc.mcgill.ca with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b)
        id AA00590  (mail destined for /usr/lib/sendmail -odq -oi -fnir-request nir-out) on Mon, 8 Jun 92 07:42:50 -0400
From: Jill.Foster@newcastle.ac.uk
Received: by relay.CDNnet.CA (4.1/1.14)
	id AA11925; Mon, 8 Jun 92 04:42:40 PDT
Date:  7 Jun 92 17:24 +0100
To: nir@cc.mcgill.ca
Reply-To: Jill.Foster@newcastle.ac.uk
Message-Id: <emu-ov07.1992.0607.172441.cl54(a)uk.ac.ncl.mts>
Subject: PISA conference: paper deadline extended
Status: RO

 This is being posted to several lists.
 
 The deadline for papers for the Network Services Conference has been
 extended to June 21st.  I'm particularly keen to see papers on
 information services and perhaps someone could do a "consumer report"
 on some of the applications such as WWW, WAIS, Gopher, archie etc.  -
 with particular emphasis on the information management tools (for the
 information providers).
 
 I'm also interested in papers for the user support sections, and would
 like to see case studies presented in building "electronic
 communities".
 
 I'll forward any papers sent to me - or you can forward them to the
 conf.  secretariat as indicated below.
 
 Thanks,
 
 Jill Foster
--------------------------- cut here --------------------------------
 
              First Call for Participation / Call for Papers
                                  NSC'92
                  The Network Services Conference 1992
                     Pisa, Italy, November 3-5, 1992
 
 
                Network Services Conference 1992
 
                            Overview
 
 
 The world of  academic and research networking has evolved  to the point
 where  the  protocol  wars  have  become  largely  irrelevant.  This  is
 demonstrated  by the  recent appearance  of high-level  networking tools
 which  are worldwide  in scope  and which  run simultaneously  over many
 different lower layers.
 
 NSC 92  will focus on  issues in  providing services to  customers, with
 special attention  paid to the  recent and exciting developments  in new
 global high-level tools such as  World-Wide Web, Prospero, Archie, Alex,
 Gopher, and WAIS. We will address the  impact of the new global tools on
 service development and  support, the  changing function  of traditional
 tools and services  (such as archives), upcoming  specific services such
 as new  databases, and the future  role of the library.  User support at
 the campus level, and the role  of support in accessing global services,
 will be addressed.
 
 The conference will be of greatest interest to network service providers
 and sophisticated users  who are changing their focus  from providing or
 obtaining  bandwidth  to  offering,  supporting, and  using  varied  and
 powerful services.  Talks and  other conference activities  will address
 the  needs   of  the  research,  academic,   educational,  governmental,
 industrial, and commercial network communities.
 
 NSC 92  is being  organized by EARN  in conjunction  with EUnet/EurOpen,
 NORDUnet, RARE, and RIPE.
 
     Conference Venue
 
 Pisa is situated in Tuscany on the Arno river. The Italian poet Gabriele
 D'ANNUNZIO named  Pisa's Piazza della  Torre: "The Square  of Miracles",
 and yet the definition could be extended with equal justice to the whole
 city. Pisa is not  only an art center with few rivals;  it is steeped in
 culture  and  science  and  offers  an  up-to-date  infrastructure.  The
 conference will  be held  at the  Palazzo dei  Congressi, near  the city
 center and at walking distance to the Hotels.
 
     Program and Registration Information
 
 The conference program with information on how to register will be
 distributed with the second announcement around 1 August 1992.
 
     Papers
 
 Papers for presentation at the conference are solicited in the following
 areas:
 
 -  Dealing with the Information Explosion
    . New Global Information Access Tools
    . Utilizing Established Information Access and Distribution Tools
 
 -  Managing Global Network Information Services
    . Coordination/Duplication, Security, Privacy, Authentication
    . Closed Group Applications
 
 -  The Electronic Library
    . Local Databases, Remote Databases, OPACS, CD/ROMS
    . Inter-Library Cooperation
 
 -  User / Customer Support
    . Help Desks, Documentation, Reaching the Customers
 
 -  Assessing Customer Needs
 
 -  Special Interest Communities
 
 -  Group Communication Technologies & Services - "Groupware"
 
 -  Networking for Schools
 
 -  Delivering Messaging to the Desktop
    . Practical Experiences, Products, Security, Interface issues
 
 -  Beyond ASCII
    . Character sets, Multimedia
    . Creating, Encoding, Receiving
 
 -  Economic Aspects of Networking
    . Bandwidth, E-mail Access, Efficiency, Control
 
 -  Recent European Networking Developments
 
 Please submit  title and abstract,  by mail, fax or  PREFERABLY e-mail,
 not later than 31 May 1992 to:
 
   Hans Deckers (DECK@FRORS12.BITNET)
   EARN Office
   c/o CIRCE
   BP 167
   F91403 Orsay France
   Tel: +33 1 6982 3973
   Fax: +33 1 6928 5273
 
     Posters and Demonstrations
 
  A poster  wall will  be available  to participants  for the  display of
  their posters and  projects. A terminal room with  connectivity to EARN
  and the Internet will be available to delegates.
 
  A  room will  be available  for  workstations and  PCs to  be used  for
  demonstrations. An Ethernet connected to the Internet will be available
  in the room. Connectivity to the Internet  will be via a 64Kbps line to
  CNUCE. The minimum bandwidth between  CNUCE and CERN is 512Kbps. People
  interested in  setting up  demonstrations may  send their  questions to
  NSCINFO@FRORS12.BITNET
 
     Further Information and General Inquiry
 
  Further  information will  be available  through an  ad hoc  conference
  mailing list.  If you want to  make sure you receive  the invitation as
  well  as the  preliminary program  please ask  for subscription  to the
  conference mailing  list ( NSC92@FRORS12.BITNET )  sending mail, e-mail
  or fax specifying your e-mail address to:
 
   Nadine Grange  (GRANGE@FRORS12.BITNET)
   EARN Office
   c/o CIRCE
   BP 167
   F91403 Orsay France
   Tel: +33 1 6982 3973
   Fax: +33 1 6928 5273
 
  General inquiries can be made at NSCINFO@FRORS12.
 
 
     Program and Organizing Committee
 
     Program Committee
 
    Dennis  Jennings, Ireland  (Chair);  Rob  Blokzijl, the  Netherlands;
    Daniele  Bovio,  France;  Paul  Bryant, United  Kingdom;  Avi  Cohen,
    Israel;  Laszlo  Csaba,  Hungary;  Hans  Deckers,  France;  Jean-Loic
    Delhaye, France; Jill Foster, United Kingdom; Frode Greisen, Denmark;
    Glenn Kowack,  the Netherlands; Stelios Orphanoudakis,  Greece; David
    Sitman, Israel; Stefano Trumpy, Italy
 
     Organizing Committee
 
    Frode  Greisen,  Denmark  (Chair  );  Hans  Deckers,  France;  Dennis
    Jennings,  Ireland; Glenn  Kowack,  the  Netherlands; Marco  Sommani,
    Italy
 
 
     Corporate Sponsors (Preliminary list )
 
    IBM


From connolly@pixel.convex.com  Fri Jun 12 03:31:28 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21811; Fri, 12 Jun 92 03:31:28 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA14764; Fri, 12 Jun 92 03:31:33 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA12883; Thu, 11 Jun 92 20:31:12 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA29502; Thu, 11 Jun 92 20:31:09 -0500
Message-Id: <9206120131.AA29502@pixel.convex.com>
To: timbl@zippy.lcs.mit.edu (Tim Berners-Lee)
Cc: enag@ifi.uio.no, www-talk@nxoc01.cern.ch
Subject: Re: MIME, SGML, UDIs, HTML and W3 
In-Reply-To: Your message of "Thu, 11 Jun 92 12:22:56 EDT."
             <9206111622.AA03819@zippy.lcs.mit.edu> 
Date: Thu, 11 Jun 92 20:31:08 CDT
From: Dan Connolly <connolly@pixel.convex.com>


Now my comments on your comments:

>There is no reason why we shouldn't try both protocols.
>If they map well onto each other, it's just a question
>of having two separate parsers at the low level, building
>the same internal structures.
>
On the other hand, I'd like to keep a telnet based protocol
around -- maybe gopher is good enough.

>When we're talking about an SGML representation,
>and describe a file to come later down the link,
>I don't think we have to use the NOTATION= attribute with a notation
>type, because we won't in fact be talking about
>the notation of an SGML element.
>The format in this case is not something which the SGML
>parser is aware of.
>
I don't believe this is true. From the horse's mouth (Erik Naggum, that is):
----
|   What's the scoop? Do we have to use external entities for raw data?

Yes.  An external entity that is not an SGML text entity requires a
notation identifier, so you only need to list the entities in the DTD,
with notation, and refer to them by name in the document instance.

----

>1. MIME classification of data formats
>
>	 So I'd
>	back the use of these for W3.
>
Yeah!!

>
>2. The MIME format for multi-part messages
>
>	This is necessary for sending a multi-part
>	document over a mail link.  We have to ask ourselves
>	whether it is reasonable to use over a binary link.
>	Personally, my initial impression is that the MIME
>	stuff, using as it does terminators such as
>	--xxx-- separated by blank lines, looks more horrible
>	to work with in this respect than SGML!

The algorithm to separate a MIME multipart message into its
parts is simply: search the data stream for CRLF--boundary--CRLF.
It can be done by a finite state machine. Even the simplest
SGML documents require a pushdown automaton to parse.
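
A minimal sketch of that scan, in Python. The function name
split_multipart and the sample boundary are illustrative assumptions.
It treats a delimiter as CRLF, two hyphens and the boundary, with the
closing delimiter carrying two further hyphens, and it ignores details
such as trailing white space on delimiter lines or a content line that
merely begins with the boundary text.

```python
def split_multipart(body: bytes, boundary: bytes) -> list:
    """Split a multipart body into its parts by scanning for delimiters."""
    delimiter = b"\r\n--" + boundary
    # Prepend CRLF so a delimiter on the very first line is found too.
    data = b"\r\n" + body
    parts = []
    pos = data.find(delimiter)
    while pos != -1:
        after = pos + len(delimiter)
        if data[after:after + 2] == b"--":
            break  # closing delimiter: --boundary--
        line_end = data.find(b"\r\n", after)
        if line_end == -1:
            break  # truncated input; give up in this sketch
        start = line_end + 2
        # The CRLF ending the delimiter line may also begin the next
        # delimiter (an empty part), so resume the search at line_end.
        nxt = data.find(delimiter, line_end)
        if nxt == -1:
            break  # unterminated part; ignored in this sketch
        parts.append(data[start:nxt])
        pos = nxt
    return parts


message = b"--xxx\r\nfirst part\r\n--xxx\r\nsecond part\r\n--xxx--\r\n"
parts = split_multipart(message, b"xxx")
```

The loop keeps no stack and never looks back further than the current
delimiter, which is the sense in which a finite state machine suffices.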

> Still we have
>	the problem of restrictions on the content:
>	Must not contain delimiters, limited 7 bit character set,
>	line orientation, in fact all the things which email
>	carries as a restriction.  This is really taking on board
>	a legacy of all the mail which has evolved over the years.
>	Do we need that for our new ultra-fast hypertext access
>	protocol?
>

No, we don't. MIME _allows_ transfer of data over 7 bit ASCII
channels, but it hardly requires it. The Content-transfer-encoding
can be:
	7 bit (default): line oriented 7 bit data
	8 bit : line oriented 8 bit data
	binary : raw 8 bit data, no CRLF's required
	base64: uuencode standardized
	quoted-printable: text with escape sequences

The MIME standard explicitly supports expansion to 8 bit transport
mechanisms.
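
To make the two 7 bit safe encodings concrete, here is a small sketch
using Python's standard base64 and quopri modules. The sample data are
made up for illustration; the point is only that both encodings
round-trip 8 bit data through a 7 bit channel, while "binary" would
send the raw bytes untouched.

```python
import base64
import quopri

raw = bytes([0x00, 0xFF, 0x10, 0x80])          # arbitrary binary data
text = "caf\u00e9 = coffee".encode("latin-1")  # text with one 8 bit byte

b64 = base64.b64encode(raw)        # suited to arbitrary binary data
qp = quopri.encodestring(text)     # suited to mostly-ASCII text

# Both encodings are reversible ...
decoded_raw = base64.b64decode(b64)
decoded_text = quopri.decodestring(qp)

# ... and both produce output that fits in 7 bits.
seven_bit_clean = all(byte < 128 for byte in b64 + qp)
```

Quoted-printable leaves ordinary ASCII readable and escapes only the
awkward bytes, whereas base64 expands everything by about a third.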

>	[Compare the MIME format with the rather cleaner NeXT
>	Mail format which is as far as I understand simply
>	a uuencoded compressed tar file of all the bits, where
>	uuencoding is designed as an optimal way of getting over
>	mail transport restrictions, compress does what it says
>	and tar is a multipart wrapper designed for that only. Not
>	standard outside unix, perhaps, but cleaner in that the
>	mail formatting is done at the last minute and doesn't
>	affect the other operations]
>
It was a requirement of MIME that the structure of the document
be accessible without decoding or uncompressing data, especially
since MIME messages are recursive and complex messages might
otherwise go through more than one encoding.

Compression was not addressed by the MIME standard, and uuencode
doesn't make it through some gateways.

>	Of course, with HTTP2, multipart/alternative shouldn't
>	be needed.
>
What does HTTP2 define that obviates the multipart/alternative
type?


>  Multipart for hypertext?
>
>	Now, Dan not only suggests the use of this for
>	multipart messages, but also suggests that a hypertext
>	document should necessarily contain many parts,
>	one in SGML and one for each link as a MIME external document.
>	This means that an SGML hypertext document can never stand
>	on its own!

That's exactly the point. Anything besides text should be handled
as an external entity to be resolved by the parsing system. I just
suggested that a portable way to resolve SGML external entities
is to refer to MIME attachments.

> An SGML parser will always need to have
>	a MIME parser sitting just outside.  I don't like
>	this: I feel we have to separate these two things.
>
Well, it has to have something sitting outside. The SGML parsers
I've seen resolve system entities using the file system. I proposed
we use a MIME message like a mini file system, with links to
other file systems.

>	Suppose that an SGML document does want to
>	be sent in a MIME message and does want to
>	refer to other parts of that MIME message. In that case,
>	it seems reasonable to have a format for that.
>	However, when an SGML document is seen by itself, and
>	refers to a news message for example, then there is
>	no reason for it not to be able to contain a
>	complete reference within itself.
>
OK, I can see that we should be able to resolve the lexical
issues and put the whole UDI/MIME access specification inside
the SGML document.

But what about multimedia web nodes?

SGML describes text and references to other texts just fine.
But if we want a format that can include more than just
text, I don't think we should try to fit it _inside_ SGML.

I think SGML should be used to convey text and document
structure. But I still like the idea of wrapping it in
a MIME message for multimedia interoperability.


>3. The MIME format for rich text.
>
>	Here, I am not so impressed.
Nor am I.


>4. The MIME format for external document addresses (MIME UDIs)
>
>	As Ed <emv@msen.com> says, this is a bit of a non-issue,
>	as MIME addresses and current style UDIs map onto
>	each other. However, we have to agree on a "concrete
>	syntax" (or two... :-) in the end.
>
Exactly. And why not the MIME concrete syntax?

>	Let me say that I personally don't much care about the
>	arbitrary punctuation. There are a few things, though,
>	which are important:
>
>	-  The thing should be printable 7-bit ASCII.
>
MIME: check.

>	   Unlike arbitrary document formats,
>	   UDIs must be sendable in the mail
>
MIME: check.

>	- White space should not be significant. I would
>	  accept the presence of some arbitrary white space
>	  as a delimiter, but one cannot distinguish between
>	  different forms and quantities of white space.
>	  This is because things get wrapped and unwrapped.
>
MIME: check.

>	  Dan, you object to UDIs because they don't
>	  contain white space. But that is purely so that
>	  you CAN wrap them onto several lines and still
>	  recuperate them.  You can put white space
>	  in but it shouldn't mean anything. (This is not possible
>	  in W3 as is but it is in the UDI document)
>
I must not have read the UDI document closely. I certainly
got the impression that a UDI should look like one word
when "written on the back of an envelope."

>	  I don't see why you say they
>	  can't be put as an SGML attribute. They are just
>	  text strings.

The WAIS UDIs are huge. An SGML declaration defines a maximum
for the length of an attribute value. The default value is ...
oh. ahem. it's 960. I think the MIME 72 character line length
is a little more restrictive than that :-)

> They will be quoted of course
>	  (Yes, I know the old NeXT browser doesn't quote them)
>	  Is that not allowed? What are the problem characters?
>	  If there are SGML problem characters in the UDI spec, they
>	  probably are ruled out of SGML for a reason.
>
Good question. These are the things we should research before
we go _any_ further implementing this stuff.

>	Whatever we use, it should be as quotable in an SGML
>	attribute as in a MIME external reference as in a
>	scribbled note or a link-pasteboard or whatever.
>	(The U is for Universal, NOT Unique!)
>
Here's an idea for a quoting strategy for the four parts: Either
	a) it's a quoted string delimited by "" with \" allowed
	in the middle, or
	b) it's a base-64 representation of an arbitrary
	binary stream.
Just an idea.
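
A rough sketch of how that two-way quoting could work, in Python. The
names quote_part and unquote_part are hypothetical, and two details are
assumptions not stated above: a backslash is itself escaped as \\, and
a reader tells the two forms apart by the leading double quote (which
never occurs in base-64 output).

```python
import base64


def quote_part(value: bytes) -> str:
    """Encode one part as a quoted string if printable, else as base-64."""
    if all(0x20 <= b < 0x7F for b in value):
        text = value.decode("ascii")
        # Escape backslash first (an assumption), then the double quote.
        text = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + text + '"'
    return base64.b64encode(value).decode("ascii")


def unquote_part(encoded: str) -> bytes:
    """Reverse quote_part."""
    if encoded.startswith('"'):
        inner = encoded[1:-1]
        out = []
        i = 0
        while i < len(inner):
            if inner[i] == "\\" and i + 1 < len(inner):
                i += 1  # drop the escape, keep the escaped character
            out.append(inner[i])
            i += 1
        return "".join(out).encode("ascii")
    return base64.b64decode(encoded)


quoted = quote_part(b'say "hi"')
binary = quote_part(b"\x00\xff")
```

Either form is printable 7 bit ASCII, so it can sit in an SGML
attribute or a mail header without further treatment.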

I'm late for an appointment. Gotta go.

Dan

From davis@willow.tc.cornell.edu  Fri Jun 12 13:51:03 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA22825; Fri, 12 Jun 92 13:51:03 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA06860; Fri, 12 Jun 92 13:51:11 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA27811; Fri, 12 Jun 92 07:52:59 EDT
Date: Fri, 12 Jun 92 07:52:59 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206121152.AA27811@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: file: access does not use RCP?

It would be a good idea if WWW tried to use RCP for
file access in addition to using anonymous FTP.

(I think it does not use RCP because I made a link to a file
on another computer and WWW was not able to access it.  But
I was able to RCP the file.  I set up a link to a different
file to which I verified anonymous FTP access, and that worked
fine.)

I would guess that RCP copying would be faster than FTP, so
maybe it should be tried first.

Best wishes... Viola and WWW are excellent.


From timbl  Fri Jun 12 18:11:08 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA23957; Fri, 12 Jun 92 18:11:08 MET DST
Date: Fri, 12 Jun 92 18:11:08 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206121611.AA23957@ nxoc01.cern.ch >
To: davis@willow.tc.cornell.edu, www-talk@nxoc01.cern.ch
Subject: Re: file: access does not use RCP?

Good idea -- I guess RCP would be faster, though I don't
know the protocol. The client code would have to keep a
list of hosts for which it had succeeded with one or other method,
to avoid wasting time always trying rcp first on distant sites.
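
One way such a list could look, as a Python sketch. AccessCache and
the two stand-in fetchers are hypothetical names; real rcp and FTP
transfers are deliberately left out, so the stand-ins only record
which method was attempted.

```python
from typing import Callable, Dict, List, Optional

# A fetcher takes (host, path) and returns the contents, or None on failure.
Fetcher = Callable[[str, str], Optional[bytes]]


class AccessCache:
    """Remembers, per host, which access method last succeeded."""

    def __init__(self, methods: Dict[str, Fetcher], default_order: List[str]):
        self.methods = methods
        self.default_order = default_order
        self.known_good: Dict[str, str] = {}

    def fetch(self, host: str, path: str) -> Optional[bytes]:
        preferred = self.known_good.get(host)
        order = [preferred] if preferred else []
        order += [name for name in self.default_order if name != preferred]
        for name in order:
            data = self.methods[name](host, path)
            if data is not None:
                self.known_good[host] = name  # try this one first next time
                return data
        return None


calls = []


def fake_rcp(host: str, path: str) -> Optional[bytes]:
    calls.append(("rcp", host))
    return None  # pretend rcp is refused by this host


def fake_ftp(host: str, path: str) -> Optional[bytes]:
    calls.append(("ftp", host))
    return b"file contents"


cache = AccessCache({"rcp": fake_rcp, "ftp": fake_ftp}, ["rcp", "ftp"])
cache.fetch("far.example.org", "/pub/a")  # tries rcp, then ftp
cache.fetch("far.example.org", "/pub/b")  # goes straight to ftp
```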

Something for the agenda...
Where is the rcp protocol defined?

Tim BL

From peterson@choctaw.csc.ti.com  Fri Jun 19 20:30:42 1992
Return-Path: <peterson@choctaw.csc.ti.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA16044; Fri, 19 Jun 92 20:30:42 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA05105; Fri, 19 Jun 92 20:30:59 +0200
Received: from tilde.csc.ti.com ([128.247.160.56]) by ti.com with SMTP 
	(5.59/LAI-3.2) id AA26952; Fri, 19 Jun 92 13:30:22 CDT
Received: from choctaw.csc.ti.com (choctaw) by tilde.csc.ti.com id AA29287; Fri, 19 Jun 1992 13:29:16 -0500
Received: by choctaw.csc.ti.com (4.1/SMI-4.1)
	id AA23472; Fri, 19 Jun 92 13:29:15 CDT
From: peterson@choctaw.csc.ti.com (Bob Peterson)
Message-Id: <9206191829.AA23472@choctaw.csc.ti.com>
Subject: Re: MIME, SGML, UDIs, HTML and W3
To: timbl@zippy.lcs.mit.edu (Tim Berners-Lee)
Date: Fri, 19 Jun 92 13:29:14 CDT
Cc: connolly@pixel.convex.com, enag@ifi.uio.no, www-talk@nxoc01.cern.ch,
        timbl@zippy.lcs.mit.edu
In-Reply-To: <9206111622.AA03819@zippy.lcs.mit.edu>; from "Tim Berners-Lee" at Jun 11, 92 12:22 pm
Reply-To: peterson@choctaw.csc.ti.com
Mime-Version: 1.0
X-Mailer: ELM [version 2.3 PL11]

You said:
> 
> 	...
> PHILOSOPHY
> 
> 	In the W3 world, the model is of a dynamic world of
> 	documents which generally have some "home"
> 	(or several), which can be found using sufficient
> 	intelligence and the help of one's friends given the UDI.

  My group has thought about the identity issue in the context of
object identifiers in a distributed object-oriented database.

> 	A mail message has no home, and so in principle the parts
> 	of it have no home. When a hypertext multipart message
> 	(really consisting of multiple hypertext documents)
> 	has links between its parts they refer to each other
> 	within a completely isolated context.

  In the OODB we think of an address (UDI or object identifier) as
relative to some enclosing context.  Different parts of an address make
sense only in the correct context.  For example, the mail system
accesses several address contexts to resolve a mail address such as
peterson@csc.ti.com: .com, ti.com, csc.ti.com, and the email address
namespace.  Each context understands its part and returns a reference
to the next, usually more specific, context.  The program(s) attempting
to resolve the address understand the result of an address lookup, and
use each result appropriately.
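
  A toy sketch of this kind of step-by-step resolution, in Python.
The nested dictionary stands in for the chain of contexts and is
purely illustrative; it is not how the mail system or the DNS is
actually implemented, and the "mailboxes" key is an invented name.

```python
def resolve(address: str, root: dict) -> str:
    """Resolve user@a.b.c by walking contexts c, then b.c, then a.b.c."""
    user, domain = address.split("@")
    context = root
    # Most global context first: each lookup returns the next context.
    for label in reversed(domain.split(".")):
        context = context[label]
    # The innermost context understands the mailbox namespace.
    return context["mailboxes"][user]


ROOT = {
    "com": {
        "ti": {
            "csc": {
                "mailboxes": {"peterson": "mailbox for peterson"},
            },
        },
    },
}

result = resolve("peterson@csc.ti.com", ROOT)
```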

  I claim a UDI makes sense only in a particular context.  If a UDI
makes explicit all contexts except the most global, then a UDI easily
refers to a different part of the same multipart message.

> 	There are now two possibilities when the message is in fact
> 	archived and made readable. One is we say that the parts
> 	are then addressed as parts of the message, wherever it
> 	may be.

  This might enable operating on a message when the "home" is the
process' address space, i.e., before the message is placed into a file
system or other addressing context.  In effect the context is the
machine and the process' address space, but these can be, and generally
are, defaulted or assumed rather than explicitly stated.

>	         The other is to say that the parts of the message
> 	are very likely things which had some original home.
> 	In that case, the message is just giving the receiver
> 	a copy to save him the (perhaps insurmountable) trouble
> 	of retrieving it.  In this case the parts should be
> 	identified with their original UDIs so that the
> 	receiver is not confused with multiple documents which
> 	are in fact the same thing. 

  I wonder about attaching two UDI's to a message: a (required)
absolute UDI, referring to the original home, and a second (optional)
UDI referring to a "less expensive" copy.  ("Less expensive" is, of
course, arbitrarily defined.)  Think of the latter as a hint, i.e., if
the user attempts to resolve the UDI the system first looks for the
hint and, if found, uses it.  If the hint is absent or fails, then the
system tries to use the (more expensive) required UDI.
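
  The lookup order just described can be sketched in a few lines of
Python. Reference, dereference and the example UDI strings are
hypothetical; fetch stands for whatever retrieval mechanism is in use
and is assumed to return None on failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reference:
    original: str               # required: UDI of the original home
    hint: Optional[str] = None  # optional: UDI of a "less expensive" copy


def dereference(ref: Reference,
                fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Try the hint first; fall back to the original UDI."""
    if ref.hint is not None:
        doc = fetch(ref.hint)
        if doc is not None:
            return doc
    return fetch(ref.original)


# A stand-in store: only the original is retrievable here.
store = {"udi:original/doc": b"the document"}
ref = Reference(original="udi:original/doc", hint="udi:local/copy")
doc = dereference(ref, store.get)
```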

  Of course thinking about this might be simpler if we refer to one UDI
with two parts: one required, the other optional.

  Benefits of this approach include retaining the reference to the
original site while, at the same time, supporting replication of the
document in an arbitrary number of locations.  If the optional UDI is
relative to the containing message then (1) the reference never fails,
and (2) performance is excellent.  Retaining the original UDI should
help some applications monitor the original for revisions, e.g., an
archive site could cache a document but check periodically with the
original site for an updated version.  Retaining the original can also
help resolve the validity of a document, e.g., by enabling comparison
of the original and cached copies.

  One could implement the optional UDI as a table external to the
document.  When dereferencing a UDI the table is checked first and, if
the UDI is found, the associated optional UDI is used.  This has the
advantage of not modifying the original document, including not
changing the result of any error detection arithmetic, e.g., checksums.
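
  A sketch of the external-table variant, again in Python with
invented names (dereference_via_table, the sample UDIs). The document
bytes are never touched, so a checksum taken before and after is
identical; hashlib is used here only to demonstrate that.

```python
import hashlib
from typing import Callable, Dict, Optional


def dereference_via_table(udi: str,
                          hints: Dict[str, str],
                          fetch: Callable[[str], Optional[bytes]]
                          ) -> Optional[bytes]:
    """Consult the external table first, then the UDI as written."""
    local = hints.get(udi)
    if local is not None:
        doc = fetch(local)
        if doc is not None:
            return doc
    return fetch(udi)


document = b"see udi:original/report for details"
checksum_before = hashlib.sha256(document).hexdigest()

store = {"udi:original/report": b"original", "udi:cache/report": b"cached"}
hints = {"udi:original/report": "udi:cache/report"}  # kept outside the document
fetched = dereference_via_table("udi:original/report", hints, store.get)

checksum_after = hashlib.sha256(document).hexdigest()
```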

> 
> I think that's all the comments I have on what I've read so far..
> 
> 	Tim

    Bob

-- 
Bob Peterson             Work: peterson@csc.ti.com              Expressway Site
Texas Instruments        Home: peterson@zgnews.lonestar.org     North Building
P.O. Box 655474, MS238   TIMSG: RWP  Landline: +1 214 995 6080  Aisle A4
Dallas, Tx USA 75265                 FAX line: +1 214 995 0304  2-88V97

From raisch@cthulhu.control.com  Mon Jun 22 19:27:14 1992
Return-Path: <raisch@cthulhu.control.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20886; Mon, 22 Jun 92 19:27:14 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA14855; Mon, 22 Jun 92 19:27:25 +0200
Received: by control.com (4.1/Spike-2.0)
	id AA04177; Mon, 22 Jun 92 13:25:37 EDT
From: raisch@cthulhu.control.com (Robert Raisch)
Message-Id: <9206221725.AA04177@control.com>
Subject: Links, Type and Documents (Third time's a charm)
To: www-talk@nxoc01.cern.ch
Date: Mon, 22 Jun 92 13:25:35 EDT
Cc: One@cthulhu.control.com, of@cthulhu.control.com, the@cthulhu.control.com,
        missing@cthulhu.control.com, pieces@cthulhu.control.com,
        here@cthulhu.control.com, is@cthulhu.control.com,
        the@cthulhu.control.com, ability@cthulhu.control.com,
        of@cthulhu.control.com, creating@cthulhu.control.com,
        new@cthulhu.control.com, h-texts@cthulhu.control.com,
        and@cthulhu.control.com
Action: dding new links to old h-texts.
X-Mailer: ELM [version 2.3 PL11]

[ This is the third time I have tried to post this.  Drop me a line,
  Francois, if you see this on the list.  Thanks.  </rr> ]

<!-- 

  I first posted this some weeks ago on the 'www-interest' list, and 
  received only one reply, (complimenting me on my reference to Rexx.)

  I really had hoped that this post would start an interesting discussion
  on the topics I address, specifically the ideas of 'attention links',
  'user documents' and 'transparent documents'.

  Are these ideas so obvious that they merit no discussion whatsoever?

  Always interested in replies, </rr>

 -->

<Preface>
First of all, hearty congrats to the WWW people.  It's a great tool, and
since it is based on SGML, it has the broadest scope of any solution I have
yet seen.

To others, see the current issue of Byte magazine regarding "Info-Glut" and
SGML.  Interesting.


<Query>
One of the missing pieces here is the ability of creating new h-texts, and
adding new links to old h-texts.

Hypertext, and like systems, are of limited use if they do not support
collaboration.  I feel that this is a VERY important point.

When might we expect extensions to WWW that support collaboration?
</Query>
</Preface>

I have a few recommendations regarding new link types in WWW.  This is based
on thinking about hyper-applications for almost 15 years, (ever since I 
first had the pleasure of hearing Ted Nelson speak in 1977.)

Please keep in mind that these are 'front end' issues.  They should not
affect the manner in which documents are stored.

------------------

There are 4 'minimal' link types which, I believe, a true hypertext application
*must* support.

	1.	Replacement
			-- when activated, replaces the current document
			   with a new document, (this is what WWW offers
			   today).

	2.	Annotation
			-- when activated, overlays a new document on the
			   current document, partially obscuring the original.
			   (An annotation must be dismissed by the reader.)

	3.	Inclusion
			-- when the document is created, elements from other
			   documents are collected to be included in the
			   representation of the current document.  (Quotes)
			   (This is a non-interactive link.  The user does
			    not activate this link. It is activated before 
			    the document is presented to the user.)

	4.	Expansion
			-- when activated, new information is added to the 
			   current document, expanding the original scope.

			   (Think of outline processors, and the collapse
			    of detail.)  

			   This is also a reflection of Nelson's concept of 
			   'stretch text'.  

			   A stretch-text definition of "stretch text" might be:

{Collapsed}		   "Stretch text is{}a sentence{}that when{}collapsed
			    states its thesis{}and when expanded adds detail." 

{Expanded}		   "Stretch text is where a sentence is constructed in 
			    such a way that when it is collapsed it states its 
			    thesis in simple terms, and when expanded adds 
			    detail to further express itself."

There are 3 further types which I believe are necessary to complete the
function paradigm.  (Of particular interest is the 'attention link'.)


	5.	Execution

			-- when activated, some arbitrary function is performed.
			   The point that was mentioned about the lack of a
			   ubiquitous scripting language is well made.  Lisp
			   is too arcane for most.  Shell languages are too
			   platform specific.  What is needed is a simple
			   to understand, freely available scripting platform.
			   Although I hesitate to mention it, REXX might be
			   a reasonable choice due to its broad availability.

	6.	Attention   (a specialisation of the Execution type)

			-- when the current document is modified (a link is
			   added, or removed, or the document is merely read)
			   a message is sent to the 'owner' of the attention
			   link.  This message creates a new link in the 'user
			   document' of the individual who placed the attention.
			   (See definition of 'user document' below.)

			   In this way, I could place a link onto a document I 
			   had interest in, and when it was changed or accessed
			   in some manner, I would be informed.

	7.	Collection  (a non-local specialisation of the Execution type)

			-- when activated, a collection link leaves the current
			   document, and 'travels' the docuverse, in search of
			   other documents which satisfy its internal criteria.
			   This is the concept of a 'knowbot'.

			   Collection links can be activated based on day and
			   time, much like the WAIS questions in the MAC 
			   WAIS interface, WAIS-Station.  They could also be
			   activated based on external events, such as the 
			   activation of an attention link.

			   Collection links would be written in the ubiquitous
			   scripting language, and would only be allowed to 
			   operate on documents which were EXPLICITLY permitted.


Along with the various links presented, two new varieties of document would
be used.

	Transparent Documents  --

		a transparent document is one which a user creates locally,
		and that is a new representation of an existing document.

		Transparent documents are used to create new local links on
		a document which I do not have permission to modify.

		Transparent documents can then be made available to others,
		(published) just as a "regular" document is, thus facilitating
		the creation of new works from old.

	User Documents --

		a user document is where I keep my "bookmarks", links to
		local documents, links to messages from others, links to
		my "attention" links, (see above).  User documents are where
		we, as navigators of the docuverse, are defined as individuals.

		They are also where we can keep links to other user documents
		which have been permitted to view/modify my own local documents.

		Another function of the User document is to collect users into
		an abstract group. (Thus, based on my membership in user 
		document 'Research Group', I am permitted access to materials
		'owned' by that group. Of course, messages sent to an abstract
		group then become available to all members of that group.)

		(Please note that a User Document is nothing more or less than
		 a collection of links, (as all documents are).)

----------------------------

So.....

	Scenario:

		I start my session with my hypertext-application, and open 
		my user document.

		I notice that 17 of my attention links have been activated 
		in the last day.

		I select the most interesting and activate the link which
		it created in my personal user document.

		I am now reading an article which I previously linked, and
		see that an annotation which I made some time ago has been
		added to, by a colleague.

		The comments are pertinent to my current work, so I create
		a new local 'transparent' document to mirror the original 
		work.  (Or use the 'transparent' document I may have created
		previously.)

		On this new document, I make a few new annotations and decide
		to make this new work available to the research group of which
		I am leader.  I place a link to it in the user document which
		represents my working group.

		I also send a new document link to the colleague who made the
		original comments, so that he can see how I have interpreted 
		his ideas, and included them into my own research.

		I move ever onwards...

---------------------

Ok, I hope that that fuels a little discussion, and I would *love* to hear 
from others regarding these ideas.  

Regards, </rr>

"knowledge is the *only* weapon"
-- 

From raisch@cthulhu.control.com  Mon Jun 22 19:32:37 1992
Return-Path: <raisch@cthulhu.control.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA20902; Mon, 22 Jun 92 19:32:37 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA15255; Mon, 22 Jun 92 19:32:51 +0200
Received: by control.com (4.1/Spike-2.0)
	id AA04308; Mon, 22 Jun 92 13:30:55 EDT
From: raisch@cthulhu.control.com (Robert Raisch)
Message-Id: <9206221730.AA04308@control.com>
Subject: Links, Types and Documents (Third time's a charm)
To: jfg@dxcern.cern.ch
Date: Mon, 22 Jun 92 13:30:55 EDT
Cc: www-talk@nxoc01.cern.ch
X-Mailer: ELM [version 2.3 PL11]

[ 
  ARRRGGGGHHHHHH!!!! I goofed.  This is the full message, please 
  disregard the previous one, since I am not sure that it got out
  properly.

  This is the third time I have tried to post this.  Drop me a line,
  Jean Francois, if you see this on the list.  Thanks.  </rr> ]

<!-- 

  I first posted this some weeks ago on the 'www-interest' list, and 
  received only one reply, (complimenting me on my reference to Rexx.)

  I really had hoped that this post would start an interesting discussion
  on the topics I address, specifically the ideas of 'attention links',
  'user documents' and 'transparent documents'.

  Are these ideas so obvious that they merit no discussion whatsoever?

  Always interested in replies, </rr>

 -->

<Preface>
First of all, hearty congrats to the WWW people.  It's a great tool, and
since it is based on SGML, it has the broadest scope of any solution I have
yet seen.

To others, see the current issue of Byte magazine regarding "Info-Glut" and
SGML.  Interesting.


<Query>
One of the missing pieces here is the ability of creating new h-texts, and
adding new links to old h-texts.

Hypertext, and like systems, are of limited use if they do not support
collaboration.  I feel that this is a VERY important point.

When might we expect extensions to WWW that support collaboration?
</Query>
</Preface>

I have a few recommendations regarding new link types in WWW.  This is based
on thinking about hyper-applications for almost 15 years, (ever since I 
first had the pleasure of hearing Ted Nelson speak in 1977.)

Please keep in mind that these are 'front end' issues.  They should not
affect the manner in which documents are stored.

------------------

There are 4 'minimal' link types which, I believe, a true hypertext application
*must* support.

	1.	Replacement
			-- when activated, replaces the current document
			   with a new document, (this is what WWW offers
			   today).

	2.	Annotation
			-- when activated, overlays a new document on the
			   current document, partially obscuring the original.
			   (An annotation must be dismissed by the reader.)

	3.	Inclusion
			-- when the document is created, elements from other
			   documents are collected to be included in the
			   representation of the current document.  (Quotes)
			   (This is a non-interactive link.  The user does
			    not activate this link. It is activated before 
			    the document is presented to the user.)

	4.	Expansion
			-- when activated, new information is added to the 
			   current document, expanding the original scope.

			   (Think of outline processors, and the collapse
			    of detail.)  

			   This is also a reflection of Nelson's concept of 
			   'stretch text'.  

			   A stretch-text definition of "stretch text" might be:

{Collapsed}		   "Stretch text is{}a sentence{}that when{}collapsed
			    states its thesis{}and when expanded adds detail." 

{Expanded}		   "Stretch text is where a sentence is constructed in 
			    such a way that when it is collapsed it states its 
			    thesis in simple terms, and when expanded adds 
			    detail to further express itself."
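
The stretch-text idea can be sketched with a very small renderer, here
in Python. The brace syntax is an assumption made for this sketch (text
inside braces appears only in the expanded view), a slight variation on
the {} markers in the example above, and the function name render is
hypothetical.

```python
import re

SOURCE = (
    "Stretch text is {where }a sentence{ is constructed in such a way}"
    " that when{ it is} collapsed{ it} states its thesis{ in simple terms,}"
    " and when expanded adds detail{ to further express itself}."
)


def render(source: str, expanded: bool) -> str:
    """Show or hide the brace-delimited detail spans."""
    if expanded:
        return re.sub(r"\{([^{}]*)\}", r"\1", source)  # keep detail, drop braces
    return re.sub(r"\{[^{}]*\}", "", source)           # drop detail entirely


collapsed_view = render(SOURCE, expanded=False)
expanded_view = render(SOURCE, expanded=True)
```

A real implementation would presumably expand one span at a time
under reader control; this sketch only toggles the whole sentence.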

There are 3 further types which I believe are necessary to complete the
function paradigm.  (Of particular interest is the 'attention link'.)


	5.	Execution

			-- when activated, some arbitrary function is performed.
			   The point that was mentioned about the lack of a
			   ubiquitous scripting language is well made.  Lisp
			   is too arcane for most.  Shell languages are too
			   platform specific.  What is needed is a simple
			   to understand, freely available scripting platform.
			   Although I hesitate to mention it, REXX might be
			   a reasonable choice due to its broad availability.

	6.	Attention   (a specialisation of the Execution type)

			-- when the current document is modified (a link is
			   added, or removed, or the document is merely read)
			   a message is sent to the 'owner' of the attention
			   link.  This message creates a new link in the 'user
			   document' of the individual who placed the attention.
			   (See definition of 'user document' below.)

			   In this way, I could place a link onto a document I 
			   had interest in, and when it was changed or accessed
			   in some manner, I would be informed.

	7.	Collection  (a non-local specialisation of the Execution type)

			-- when activated, a collection link leaves the current
			   document, and 'travels' the docuverse, in search of
			   other documents which satisfy its internal criteria.
			   This is the concept of a 'knowbot'.

			   Collection links can be activated based on day and
			   time, much like the WAIS questions in the MAC 
			   WAIS interface, WAIS-Station.  They could also be
			   activated based on external events, such as the 
			   activation of an attention link.

			   Collection links would be written in the ubiquitous
			   scripting language, and would only be allowed to 
			   operate on documents which were EXPLICITLY permitted.
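
As a rough illustration of the link types and of the attention link in
particular, here is a Python sketch. All class and method names are
invented for this sketch; it models only the "notify on modification"
case, and it records the notification as a Replacement link back to the
changed document, which is one possible reading of the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class LinkType(Enum):
    REPLACEMENT = 1
    ANNOTATION = 2
    INCLUSION = 3
    EXPANSION = 4
    EXECUTION = 5
    ATTENTION = 6
    COLLECTION = 7


@dataclass
class Link:
    kind: LinkType
    target: str


@dataclass
class Document:
    name: str
    links: List[Link] = field(default_factory=list)
    # User documents of everyone who placed an attention link here.
    watchers: List["Document"] = field(default_factory=list)

    def place_attention(self, user_doc: "Document") -> None:
        self.watchers.append(user_doc)

    def add_link(self, link: Link) -> None:
        self.links.append(link)
        # The modification fires every attention link: each owner gets
        # a new link, in their user document, back to this document.
        for user_doc in self.watchers:
            user_doc.links.append(Link(LinkType.REPLACEMENT, self.name))


article = Document("article")
my_user_doc = Document("user:me")
article.place_attention(my_user_doc)
article.add_link(Link(LinkType.ANNOTATION, "colleague-note"))
```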


Along with the various links presented, two new varieties of document would
be used.

	Transparent Documents  --

		a transparent document is one which a user creates locally,
		and that is a new representation of an existing document.

		Transparent documents are used to create new local links on
		a document which I do not have permission to modify.

		Transparent documents can then be made available to others,
		(published) just as a "regular" document is, thus facilitating
		the creation of new works from old.

	User Documents --

		a user document is where I keep my "bookmarks", links to
		local documents, links to messages from others, links to
		my "attention" links, (see above).  User documents are where
		we, as navigators of the docuverse, are defined as individuals.

		They are also where we can keep links to other user documents
		which have been permitted to view/modify my own local documents.

		Another function of the User document is to collect users into
		an abstract group. (Thus, based on my membership in user 
		document 'Research Group', I am permitted access to materials
		'owned' by that group. Of course, messages sent to an abstract
		group then become available to all members of that group.)

		(Please note that a User Document is nothing more or less than
		 a collection of links, (as all documents are).)
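
A transparent document can be pictured as an overlay: it holds a
reference to the original plus the reader's own links, and never
writes to the original. The Python sketch below uses invented names
(Document, TransparentDocument, all_links) and represents links as
plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    name: str
    links: List[str] = field(default_factory=list)


@dataclass
class TransparentDocument:
    base: Document                       # the document I may not modify
    local_links: List[str] = field(default_factory=list)

    def all_links(self) -> List[str]:
        """Links of the original followed by my local additions."""
        return self.base.links + self.local_links


original = Document("paper", links=["reference-1"])
overlay = TransparentDocument(original)
overlay.local_links.append("my-annotation")
combined = overlay.all_links()
```

Publishing the overlay then amounts to making this small object
available to others, who see the original with the extra links on top.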

----------------------------

So.....

	Scenario:

		I start my session with my hypertext-application, and open 
		my user document.

		I notice that 17 of my attention links have been activated 
		in the last day.

		I select the most interesting and activate the link which
		it created in my personal user document.

		I am now reading an article which I previously linked, and
		see that an annotation which I made some time ago has been
		added to, by a colleague.

		The comments are pertinent to my current work, so I create
		a new local 'transparent' document to mirror the original 
		work.  (Or use the 'transparent' document I may have created
		previously.)

		On this new document, I make a few new annotations and decide
		to make this new work available to the research group of which
		I am leader.  I place a link to it in the user document which
		represents my working group.

		I also send a new document link to the colleague who made the
		original comments, so that he can see how I have interpreted 
		his ideas, and included them into my own research.

		I move ever onwards...

---------------------

Ok, I hope that that fuels a little discussion, and I would *love* to hear 
from others regarding these ideas.  

Regards, </rr>

"knowledge is the *only* weapon"
-- 

From M.Hu@cs.ucl.ac.uk  Mon Jun 22 20:56:31 1992
Return-Path: <M.Hu@cs.ucl.ac.uk>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21169; Mon, 22 Jun 92 20:56:31 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA18429; Mon, 22 Jun 92 20:56:54 +0200
Message-Id: <9206221856.AA18429@dxmint.cern.ch>
Received: from tooting.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.19271-0@bells.cs.ucl.ac.uk>; Mon, 22 Jun 1992 19:53:34 +0100
To: raisch@cthulhu.control.com
Cc: jfg@dxcern.cern.ch, www-talk@nxoc01.cern.ch, M.Hu@cs.ucl.ac.uk
Subject: Re: Links, Types and Documents (Third time's a charm)
In-Reply-To: Your message of Mon, 22 Jun 92 13:30:55 -0500. <9206221730.AA04308@control.com>
Date: Mon, 22 Jun 92 19:55:27 +0100
From: M.Hu@cs.ucl.ac.uk


On Mon, 22 Jun 92 13:30:55 EDT ,
raisch@COM.CONTROL.CTHULHU  wrote:

> 
><Query>
>One of the missing pieces here is the ability of creating new h-texts, and
>adding new links to old h-texts.

I agree with you that it is important for a hypertext system to create new
h-texts automatically and to add new links to old ones. In my view, the first
step towards this is to separate the links from the texts. In the WWW model,
links are stored inside the hypertexts, which makes this very difficult.


> 
>Hypertext, and like systems, are of limited use if they do not support
>collaboration.  I feel that this is a VERY important point.


As for collaboration between different hypertexts (IT IS VERY IMPORTANT.),
I feel it is more complicated. Interchanging hypertext and link formats
among different systems needs a better-structured hypertext model as a core.
WWW cannot do it and, furthermore, it is not a good model for doing it...

> 
>When might we expect extensions to WWW that support collaboration?
></Query>
></Preface>
> 
>I have a few recommendations regarding new link types in WWW.  This is based
>on thinking about hyper-applications for almost 15 years, (ever since I
>first had the pleasure of hearing Ted Nelson speak in 1977.)
> 
>Please keep in mind that these are 'front end' issues.  They should not
>affect the manner in which documents are stored.
> 
>------------------
>--


Mike

From connolly@pixel.convex.com  Mon Jun 22 21:39:12 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA21341; Mon, 22 Jun 92 21:39:12 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA19580; Mon, 22 Jun 92 21:39:28 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA12765; Mon, 22 Jun 92 14:38:42 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA17202; Mon, 22 Jun 92 14:38:27 -0500
Message-Id: <9206221938.AA17202@pixel.convex.com>
To: raisch@cthulhu.control.com (Robert Raisch)
Cc: jfg@dxcern.cern.ch, www-talk@nxoc01.cern.ch
Subject: Re: Links, Types and Documents (Third time's a charm) 
In-Reply-To: Your message of "Mon, 22 Jun 92 13:30:55 EDT."
             <9206221730.AA04308@control.com> 
Date: Mon, 22 Jun 92 14:38:26 CDT
From: Dan Connolly <connolly@pixel.convex.com>


><!-- 
>
>  I first posted this some weeks ago on the 'www-interest' list, and 
>  received only one reply, (complimenting me on my reference to Rexx.)
>
I keep my copy of this posting handy and I check lots of ideas against it.

>  I really had hoped that this post would start an interesting discussion
>  on the topics I address, specifically the ideas of 'attention links' 
>  'user documents' and 'transparent documents'.
>
I don't have any well defined systems (implementations or just specs.)
that address these issues. WWW is headed in this direction, but it's
a long way off.
 
>  Are these ideas so obvious that they merit no discussion whatsoever?
>
No, it's more like they're so novel we haven't thought seriously enough
to comment yet.

><Query>
>One of the missing pieces here is the ability of creating new h-texts, and
>adding new links to old h-texts.
>
>Hypertext, and like systems, are of limited use if they do not support
>collaboration.  I feel that this is a VERY important point.
>
>When might we expect extensions to WWW that support collaboration?
></Query>

You might look to Andrew for a more mature system in these regards.
While I think Andrew is a great breeding ground for ideas, I think
the resulting technology is too off-beat (for example, they implemented
their own object-oriented C preprocessor, and now C++ has come along
and writing Andrew code looks like a pain in comparison).

I'm toying with the idea of a FrameMaker inset editor to interface
Frame's direct-manipulation editing capabilities with global
hypertext on the internet. I shouldn't even mention it until I have
some sort of implementation, but it's an idea...

>I have a few recommendations regarding new link types in WWW.  This is based
>on thinking about hyper-applications for almost 15 years, (ever since I 
>first had the pleasure of hearing Ted Nelson speak in 1977.)
>
>Please keep in mind that these are 'front end' issues.  They should not
>affect the manner in which documents are stored.
>
Well, we should be careful not to store documents in a way that
conflicts with these useful concepts.


>------------------
>
>There are 4 'minimal' link types which, I believe, a true hypertext application
>*must* support.
>
>	1.	Replacement
>			-- when activated, replaces the current document
>			   with a new document, (this is what WWW offers
>			   today).
>
FrameMaker: gotolink
GNU Info: menus, notes
WWW: <A HREF=...>
EBT: link

>	2.	Annotation
>			-- when activated, overlays a new document on the
>			   current document, partially obscuring the original.
>			   (An annotation must be dismissed by the reader.)
>
FrameMaker: openlink
GNU Info: n/a
WWW: n/a
EBT: link window=new

>	3.	Inclusion
>			-- when the document is created, elements from other
>			   documents are collection to be included in the
>			   representation of the current document.  (Quotes)
>			   (This is a non-interactive link.  The user does
>			    not activate this link. It is activated before 
>			    the document is presented to the user.)
>
FrameMaker: import by reference (bitmapped graphics ONLY)
GNU Info: n/a
WWW: n/a
EBT: n/a

>	4.	Expansion
>			-- when activated, new information is added to the 
>			   current document, expanding the original scope.
>
FrameMaker: conditional text
GNU Info: n/a
WWW: n/a
EBT: change stylesheets so that HIDE property changes

>
>There are 3 further types which I believe are necessary to complete the
>function paradigm.  (Of particular interest is the 'attention link'.)
>
>
>	6.	Execution
>
>			-- when activated, some arbitrary function is performed.
>			   The point that was mentioned about the lack of an
>			   ubiquitious scripting language is well made.  Lisp
>			   is too arcane for most.  Shell languages are too
>			   platform specific.  What is needed is a simple
>			   to understand, freely available scripting platform.
>			   Although I hesitate to mention it, REXX might be
>			   a reasonable choice due to it's broad availability.
>
Ah... if you want commentary, state an arguable thesis. No one can argue
against a platitude like "What is needed is a simple to understand,
freely available scripting platform." I vote for some brand of Lisp, perhaps
XLisp or ELK.

But there's a larger issue: should documents be Turing machines? Using SGML,
it is a well defined problem to determine whether a document is valid. As
soon as we allow documents to be programs (like TeX, nroff, or Lisp), we
run into the halting problem and we lose any hope of converting documents
from one representation to another. If a document is a program that, when run,
conveys its content, then we lose the ability to use that content in any
other way than the author originally intended.

>	5.	Attention   (a specialisation of the Execution type)
>
>			-- when the current document is modified (a link is
>			   added, or removed, or the document is merely read)
>			   a message is sent to the 'owner' of the attention
>			   link.  This message creates a new link in the 'user
>			   document' of the individual who placed the attention.

Hmmm... I need a clear explanation of the underlying model here. In the
model in my mind, a "document" is never modified. But the functionality
you describe is interesting. Certainly we want to be able to collect
usage statistics.

>	7.	Collection  (a non-local specialisation of the Execution type)
>
>			-- when activated, a collection link leaves the current
>			   document, and 'travels' the docuverse, in search of
>			   other documents which satisfy it's internal criteria.
>
>			   This is the concept of a 'knowbot'.
>
It looks like a query to me. I need either 1) a good definition of the
capabilities of a knowbot, or 2) an implementation of a knowbot (any
sort of hack will do) to get a feel for the functionality. Until
then, it's just a very vague idea. Fortunately, there are some
implementations of this idea: cron/find, WAIS, USENET news (kill files, etc.)

>
>	Transparent Documents  --
>
>		a transparent document is one which a user creates locally,
>		and that is a new representation of an existing document.
>
>		Transparent documents are used to create new local links on
>		a document which I do not have permission to modify.
>
>		Transparent documents can then be made available to others,
>		(published) just as a "regular" document is, thus facilitating
>		the creation of new works from old.
>
This looks like a local copy of a document to me. No?

>	User Documents --
>
>		a user document is where I keep my "bookmarks", links to
>		local documents, links to messages from others, links to
>		my "attention" links, (see below).  User documents are where
>		we, as navigators of the docuverse, are defined as individuals.
>
>		They are also where we can keep links to other user documents
>		which have been permitted to view/modify my own local documents.
>
>		Another function of the User document is to collect users into
>		an abstract group. (Thus, based on my membership in user 
>		document 'Research Group', I am permitted access to materials
>		'owned' by that group. Of course, messages sent to an abstract
>		group then become available to all members of that group.)
>
>		(Please note that a User Document is nothing more or less than
>		 a collection of links, (as all documents are).)
>
Now we've opened up the whole PIM can of worms. Current implementations
include mail user agents (MH, Elm), news readers (with their .newsrc and
kill files, etc.) wais-questions, WWW home documents. I haven't looked
at the hyperbole model, but I understand it addresses this issue at length.

>So.....
>
>	Scenerio:
>
I'd like to see how a MIME user agent would satisfy this scenario...

>		I start my session with my hypertext-application, and open 
>		my user document.
>
I start my MIME UA.

>		I notice that 17 of my attention links have been activated 
>		in the last day.
>
There are 17 mail messages (with certain tell-tale headers) in my inbox.

>		I select the most interesting and activate the link which
>		it created in my personal user document.
>
I read the message. It's a message/external-body type message that points
to an article in a USENET database at this site.

>		I am now reading an article which I previously linked
that is, I had saved the article by creating a message/external-body
type message in my mail box.

>		, and
>		see that an annotation which I made some time ago
i.e. my followup article

>		 has been
>		added to, by a colleague.
>
i.e. has been followed-up.

>		The comments are pertinant to my current work, so I create
>		a new local 'transparent' document to mirror the original 
>		work.  (Or use the 'transparent' document I may have created
>		previously.)
>
I just save a reference to the news article, as above, in a message/external-body
type message.

>		On this new document, I make a few new annotations

i.e. I follow up to the document. It would be nice to be able to do
some direct-manipulation style annotation to articles, ala FrameMaker.

>		and decide
>		to made this new work available to the research group of which
>		I am leader.  I place a link to it in the user document which
>		represents my working group.
>
I mail a message/external-body style reference to the thread to the
alias that represents my working group.

I really think that Internet Mail, Usenet News, and WAIS could be
a great platform for CSCW. A MIME user agent that could make
WAIS and NNTP queries and act as a FrameServer client would
be a great start. If I have time, I'll try to cook something up.
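
As a rough illustration of the "save a reference" step above, here is a
minimal sketch in Python that builds a message/external-body part with the
standard email package. The function name, the subject line, and the spool
path are hypothetical, and the "local-file" access type is only an
assumption about how an article held at this site might be addressed.

```python
from email.message import Message

def external_body_reference(subject, path):
    """Build a message whose body lives elsewhere (hypothetical helper)."""
    ref = Message()
    ref["Subject"] = subject
    ref["MIME-Version"] = "1.0"
    # add_header turns access_type into the "access-type" parameter.
    ref.add_header("Content-Type", "message/external-body",
                   access_type="local-file", name=path)
    # The body of an external-body part carries only the headers of the
    # referenced item, not its content.
    ref.set_payload("Content-Type: text/plain\n\n")
    return ref

# Hypothetical spool path, for illustration only.
ref = external_body_reference("Saved: attention links thread",
                              "/usr/spool/news/example/123")
```

Mailing such a part to the working-group alias would then correspond to the
last step of the scenario.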

Dan

From HARMO@valt.helsinki.fi  Tue Jun 23 10:01:10 1992
Return-Path: <HARMO@valt.helsinki.fi>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA22326; Tue, 23 Jun 92 10:01:10 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA29759; Tue, 23 Jun 92 10:01:30 +0200
Received: from charon-gw.pc.Helsinki.FI by kruuna.helsinki.fi with SMTP id AA04453
  (5.65c/IDA-1.4.4 for <www-talk@nxoc01.cern.ch>); Tue, 23 Jun 1992 11:00:35 +0300
Received: From HYLKN1/WORKQUEUE by charon-gw.pc.Helsinki.FI
          via Charon 3.4 with IPX id 100.920623110021.448;
          23 Jun 92 11:00:49 +0200
Message-Id: <MAILQUEUE-101.920623105426.28@valt.Helsinki.FI>
To: jfg@dxcern.cern.ch, www-talk@nxoc01.cern.ch
From: "Timo Harmo - SocSci U of Helsinki"  <HARMO@valt.helsinki.fi>
Date:     23 Jun 92 10:54:27 EET
Subject:  Re: Links, Types and Documents (Third time's a charm) 
X-Pmrqc:  1
X-Mailer: Pegasus Mail v2.2 (R3).

> > 3.  Inclusion
> >             not activate this link. It is activated before
> >             the document is presented to the user.)

Is this something that could be done by the server?

> > 6.  Execution
> freely available scripting platform." I vote for some brand of
>Lisp, perhaps  XLisp or ELK.
How about a simple Prolog? Prolog is quite hypertextish and relatively
easy to implement (I suppose).
 -Timo


From bridges@nas.nasa.gov  Tue Jun 23 19:03:22 1992
Return-Path: <bridges@nas.nasa.gov>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25071; Tue, 23 Jun 92 19:03:22 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA07321; Tue, 23 Jun 92 19:03:46 +0200
Received: by orville.nas.nasa.gov (5.61.a/1.34)
	id AA21186; Tue, 23 Jun 92 10:03:15 -0700
Date: Tue, 23 Jun 92 10:03:15 -0700
From: bridges@nas.nasa.gov (Michael D. Bridges)
Message-Id: <9206231703.AA21186@orville.nas.nasa.gov>
To: www-talk@nxoc01.cern.ch
Subject: add me to mailing list


Please add me to your www-talk mailing list.

                                Thank you,
                                                Mike Bridges
                                                NASA/AMES
                                                bridges@nas.nasa.gov


From koellner@lbl.bitnet  Tue Jun 23 19:13:52 1992
Return-Path: <koellner@lbl.bitnet>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25120; Tue, 23 Jun 92 19:13:52 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA07876; Tue, 23 Jun 92 19:14:18 +0200
Received: from CEARN.cern.ch by CEARN.cern.ch (IBM VM SMTP V2R1)
   with BSMTP id 2560; Tue, 23 Jun 92 19:13:30 SET
Received: from Lbl.Bitnet by CEARN.cern.ch (Mailer R2.07B) with BSMTP id 7452;
 Tue, 23 Jun 92 19:13:30 SET
Date:    Tue, 23 Jun 92 10:13:02 PDT
From: koellner@lbl.bitnet (Werner Koellner, LBL)
Message-Id: <920623101302.23e16263@csa3.lbl.gov>
Subject: WWW (Viola) question
To: www-talk@nxoc01.cern.ch
X-St-Vmsmail-To: ST%"www-talk@info.cern.ch"

                                         LBL, Physics Division, 23-JUN-1992

    Hi,
         what might be the cause of the following error message when I
         start up WWW?

    Unknown character: 43WWW: home = /home/ux5/ux5c/phyd/cern/WWW/lbl.html

                  Best regards,

                                   Werner

From jfg@dxcern.cern.ch  Tue Jun 23 20:34:17 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA25415; Tue, 23 Jun 92 20:34:17 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA10610; Tue, 23 Jun 92 20:34:39 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA25813; Tue, 23 Jun 92 20:34:11 +0200
Date: Tue, 23 Jun 92 20:34:11 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206231834.AA25813@dxcern.cern.ch>
To: koellner@lbl.bitnet (Werner Koellner, LBL)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: WWW (Viola) question
References: <920623101302.23e16263@csa3.lbl.gov>

	Werner,

  This may be a problem either with Viola, or with the home document
you're trying to read. Try it with the line-mode browser:

	www /home/ux5/ux5c/phyd/cern/WWW/lbl.html

and see if it works. Did you create the lbl.html yourself ? In any
case, you can mail it to me (privately to not bother the list) and
I'll tell you if there's something wrong with it. Are you trying to
set up a WWW server about LBL ? That would be great !

--
  Jean-Francois Groff (jfg@info.cern.ch)
  World-Wide Web initiative
  CERN, ECP division, CH-1211 Geneva 23, Switzerland
  Phone +41 22 767 3755 -- Fax +41 22 767 7155
--
"Life may at times be boring, but is it more fun to be dead ?"
                                                  -- Alcor


From davis@willow.tc.cornell.edu  Wed Jun 24 17:47:51 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28638; Wed, 24 Jun 92 17:47:51 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA02889; Wed, 24 Jun 92 17:48:17 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA13062; Wed, 24 Jun 92 11:49:48 EDT
Date: Wed, 24 Jun 92 11:49:48 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206241549.AA13062@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: logical types of links (Re:  Links, Types and Documents)

I want to take issue with the set of links proposed by R Raisch.  They
are very intriguing and I praise him for bringing them up, while also
apologizing for not responding to them sooner.  This is the first
of several letters addressing some problems I see in his proposal
and some general issues for WWW and hypertext which are raised
by his list.

In this message, I want to discuss the logical types of links.  It
seems to me that hyperlinks could encode knowledge of several
different types.  Three that come to mind are:

1 structural - relations between documents, e.g. document X cites
document Y; word X is defined in document Y; possibly extending to
the kind of argumentation links used in e.g. gIBIS, NoteCards (e.g.
claim X is refuted by argument Y)

2 presentation - the user should see X and Y at same time if possible.

3 construction - to build document X, include files Y and Z, and
run procedure W at display time.

It is not obvious to me that these three should be treated in the
same way.  In particular, I doubt the need for presentation information.
The lesson of SGML is that documents should be marked with logical
structure, leaving presentation out.

It also seems to me that Raisch's list mixes several types, and
this is undesirable.

For example, I see the "annotation" link as encoding presentation
information.  It differs from "replacement" only in that you see the
original and the target at the same time.  (At least, that's how he
describes it.  One could also define it as a kind of logical relation
- it is a comment on the target.  But that's a different subject.)

A second example is that one might want to use an "Execution" link (or
its subtype "collection") to implement any of the first four types
(replacement, annotation, inclusion, expansion).

To continue the discussion, we might consider whether there are
other logical types for links, how to decompose Raisch's links 
into these categories, and whether to encode presentation in links.

Best wishes

From davis@willow.tc.cornell.edu  Wed Jun 24 17:53:39 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28667; Wed, 24 Jun 92 17:53:39 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA03527; Wed, 24 Jun 92 17:54:04 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA13109; Wed, 24 Jun 92 11:55:33 EDT
Date: Wed, 24 Jun 92 11:55:33 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206241555.AA13109@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: Re:  Links, Types and Documents

In considering R. Raisch's link types, it seems to me that the
expansion and replacement can be combined and generalized.  In both
cases a portion of the document is replaced.  In the expansion case, the
condensed text is replaced (perhaps temporarily) with the enlarged
text, in replacement the entire document is replaced with some other.
So the more general case is a link which labels a range in both the
source and target document.  The replacement link labels the entire
document, the expansion labels only a portion.
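
To make the generalization concrete, here is a minimal sketch in Python,
assuming documents are plain strings and ranges are half-open character
offsets. The names Span, RangeLink and follow are hypothetical and are not
part of WWW or of Raisch's proposal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) within a document."""
    start: int
    end: int

@dataclass(frozen=True)
class RangeLink:
    source_doc: str
    source_span: Optional[Span]  # None labels the entire source document
    target_doc: str
    target_span: Optional[Span]  # None labels the entire target document

def follow(link, documents):
    """Return the source text with the labelled range replaced by the target."""
    source = documents[link.source_doc]
    target = documents[link.target_doc]
    if link.target_span is not None:
        target = target[link.target_span.start:link.target_span.end]
    if link.source_span is None:
        return target  # replacement: the whole document is swapped out
    s = link.source_span
    return source[:s.start] + target + source[s.end:]  # expansion

docs = {
    "summary": "WWW uses [anchors].",
    "detail": "anchors, which mark the start and end of a link's origin",
}
expansion = RangeLink("summary", Span(9, 18), "detail", None)
replacement = RangeLink("summary", None, "detail", None)
```

In this sketch the only difference between the two link types is whether the
source range is the whole document or a part of it.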

As for execution links in general, I see Dan's point that adding
procedures to documents makes them difficult to manipulate.  And N
Borenstein has pointed out (in Mime) the dangers of executing
arbitrary code.  But I still see that there are functions which only
code can provide, and so we just face the challenge of designing a
language that is both powerful and safe, perhaps sacrificing Turing
completeness.  Is Atomicmail [tm?] such a language?


From davis@willow.tc.cornell.edu  Wed Jun 24 17:58:51 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28710; Wed, 24 Jun 92 17:58:51 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA03786; Wed, 24 Jun 92 17:59:18 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA13167; Wed, 24 Jun 92 12:00:50 EDT
Date: Wed, 24 Jun 92 12:00:50 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206241600.AA13167@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject: Links that refer to a range of text, not just a point.

In my last piece of mail (I hope they arrive in order) I tried to
generalize replacement and expansion links.  But this way of
describing links presupposes that links have a range in the source
document.  You need this (or is this obvious?) e.g. because you must
know whether an annotation applies to the entire document, a section,
or just one word.  Likewise for replacements, you need to know how
much to replace.

But as far as I know, in most (all?) hypertext systems, the origin and
destination of links are points.  Certainly this is true in WWW, where
the destination of a link is a position in a document, never a section
of a document.  On the other hand, in some sense WWW's Anchors do label
a region of the origin, since the anchor has an explicit beginning
and end.

But then this raises another issue: does WWW allow anchors within
anchors?  I think not - in which case I could not use WWW anchors to
both label a paragraph (e.g. for attaching an annotation) and a word
within it (e.g. for definition).  This worries me quite a bit.  Nor
can I attach multiple links to the same point (e.g. definitions of a
word in multiple languages).
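
One way to find out whether a given file relies on anchors within anchors is
to measure the nesting depth of its A elements. The sketch below uses
Python's standard html.parser module, which is not the WWW browser's parser;
it only reports what the markup contains and says nothing about what WWW
permits. The class and function names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class AnchorNestingChecker(HTMLParser):
    """Track how deeply A elements are nested in the markup fed to it."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "a":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1

def max_anchor_depth(markup):
    checker = AnchorNestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.max_depth

# A paragraph-level anchor containing a word-level anchor.
nested = '<A NAME=p1>A paragraph with a <A HREF="def.html">word</A> in it.</A>'
# Two sibling anchors, never one inside the other.
flat = '<A HREF="x.html">one</A> <A HREF="y.html">two</A>'
```

A depth greater than 1 means the file labels a region and something inside
it, which is exactly the case in question.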


From davis@willow.tc.cornell.edu  Wed Jun 24 18:00:30 1992
Return-Path: <davis@willow.tc.cornell.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28723; Wed, 24 Jun 92 18:00:30 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA03892; Wed, 24 Jun 92 18:00:54 +0200
Received: by willow.tc.cornell.edu (4.1/SMI-4.1)
	id AA13181; Wed, 24 Jun 92 12:02:25 EDT
Date: Wed, 24 Jun 92 12:02:25 EDT
From: davis@willow.tc.cornell.edu (Jim Davis)
Message-Id: <9206241602.AA13181@willow.tc.cornell.edu>
To: www-talk@nxoc01.cern.ch
Subject:  Raisch's Attention link
Cc: 
Mime-Version: 1.0
Content-Type:  text/richtext
Content-Transfer-Encoding:  7bit

<paragraph>
In a previous message, R. Raisch proposed a number of interesting
link types.  One of them was the attention link, which I want to
discuss here.  I am uncertain about the need for this link, the
technical ability to provide it, and the definition.

<nl>
<paragraph>
Let me discuss the function first.  The function seems to be
providing a means of notifying an "owner" of a document when certain
conditions obtain.  The conditions Raisch mentions are:<nl>
<indent>
 1) someone has read the document<nl>
 2) someone has modified the document<nl>
</indent>

<nl><paragraph>
There is some doubt in my mind about read notifications.  I am not sure I
want other people to know what I read.  I can see only two reasons
for this feature:<nl>
<indent>
 1) as a form of "proof of delivery".  Some email systems provide
this, but I don't like it.<nl>
 2) as a means of collecting revenue (pay-per-read)<nl>
</indent>

<nl><paragraph>
As for charging, fairness requires that the link not be activated
until I have seen a warning (otherwise I might get charged a zillion
dollars to read the document - just like 900 phone numbers in the
USA).  So this will add complexity to the client.

<nl><paragraph>
Also, attention links are not sufficient for charging.  They
support a model where I am charged once per read, no matter how much
of the document I read.  But it seems likely that there might be need
for other charging models.  It is also unclear that they are technically
sufficient.  For economic purposes, you would want the message to be sent
by the server, not the client.  But this would cease working if the
document were copied.  (But maybe this is not a fair objection, since
I know of no scheme that can preserve property rights given the
possibility of perfect digital  copying.)

<nl><paragraph>
Continuing on the more general question of read notification,
regardless of purpose, it is possible that one might desire
notification on a finer grain than the entire document.  But this, I
think, requires the cooperation of the client.  Indeed, the client
can tell the server that a given piece of text has been displayed,
but not whether the user actually read it (unless we go further, and
implement it with an executable function which requires the user to
click on a button)

<nl><paragraph>
As for the second form (modification notification), it seems to me
that there is a need to inform not just the owner, but also other
people.  There are two reasons for this:  

<nl><paragraph>
First, as owner of a document, I am not likely to allow other people
to modify my document at all.  On the other hand, I might be
interested in notifications when someone adds or deletes a link <bold>to</bold>
my document.  But attention links don't address this problem.

<nl><paragraph>
Second, as a reader of a document I don't own, I might want to be
notified when the owner modifies it, since I might wish to re-read it
(or at least the changed sections.)  Let's call these "monitor"
links.  Monitor links might be a useful means of reducing effort
required for some kinds of network retrievals - those where I am
interested in new developments in certain areas.  Now, instead of
polling documents to see whether they've changed, I can just leave an
"attention link" and get a notification.

<nl><paragraph>
On the other hand, this has some problems.  One of them is the
question of who pays the cost of sending all these notifications.
Can you imagine the load on your workstation as it sends out 10,000
monitor notifications?  Perhaps this can be answered by bringing in
more economics - that is, to attach a monitor link I need to set up
an account such that I can be charged for the delivery.  Or maybe the
notifications are sent by the document server, so as an author I am
not affected.

<nl><paragraph>
A second problem (or at least issue) is that monitor links require
finer grain of size and time.  Some users will want to monitor only
select portions of a document.  Likewise, we may not want
notifications sent when <italic>any</italic> editing is made, but
rather only when the author completes a session.  That is, if I edit
the document for a day, saving changes six times, you don't want six
notifications.  This might require some notion of "transactions" such
as used in data bases.  
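
A minimal sketch of that idea in Python, assuming monitors are plain
callables and that an editing session is bracketed explicitly. The class
MonitoredDocument and its method names are hypothetical.

```python
class MonitoredDocument:
    """Coalesce all saves made during one editing session into one notice."""

    def __init__(self, name):
        self.name = name
        self.monitors = []        # callables taking (name, number_of_saves)
        self._pending_saves = 0
        self._in_session = False

    def begin_session(self):
        self._in_session = True
        self._pending_saves = 0

    def save(self):
        if self._in_session:
            self._pending_saves += 1   # defer until the session ends
        else:
            self._notify(1)            # a save outside a session notifies at once

    def end_session(self):
        self._in_session = False
        if self._pending_saves:
            self._notify(self._pending_saves)
        self._pending_saves = 0

    def _notify(self, saves):
        for monitor in self.monitors:
            monitor(self.name, saves)

received = []
doc = MonitoredDocument("paper.html")
doc.monitors.append(lambda name, saves: received.append((name, saves)))

doc.begin_session()
for _ in range(6):
    doc.save()
doc.end_session()
```

Six saves inside one session produce a single entry in received, which is
the behaviour asked for above.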

<nl><paragraph>
Finally though, I don't think it is correct to call these things
links.  They are not related to logical structure, nor are they
explicitly activated by the reader (or writer), indeed that person
may not even be aware that it was activated.  Consider the more general
question - if attention links are a subcase of execution links, and
attention links can be activated without knowledge of the user,
should all  execution links be capable of such activation?  Do you
want to read documents which can cause arbitrary computations to
occur without your choice, or even knowledge?


From connolly@pixel.convex.com  Wed Jun 24 19:16:14 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA28939; Wed, 24 Jun 92 19:16:14 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA09163; Wed, 24 Jun 92 19:16:36 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA29352; Wed, 24 Jun 92 12:15:55 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA10088; Wed, 24 Jun 92 12:15:53 -0500
Message-Id: <9206241715.AA10088@pixel.convex.com>
To: davis@willow.tc.cornell.edu (Jim Davis)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: Links that refer to a range of text, not just a point. 
In-Reply-To: Your message of "Wed, 24 Jun 92 12:00:50 EDT."
             <9206241600.AA13167@willow.tc.cornell.edu> 
Content-Type: multipart/mixed; boundary="8<--"
Mime-Version: 1.0
Date: Wed, 24 Jun 92 12:15:52 CDT
From: Dan Connolly <connolly@pixel.convex.com>

--8<--

>But then this raises another issue: does WWW allow anchors within
>anchors?  I think not - in which case I could not use WWW anchors to
>both label a paragraph (e.g. for attaching an annotation) and a word
>within it (e.g. for definition).  This worries me quite a bit.  Nor
>can I attach multiple links to the same point (e.g. definitions of a
>word in multiple languages).
>
This and other related questions (can I have lists within lists?)
are precisely the reason for using a well-defined structural markup
language governed by SGML processing rules.

Right now we have no DTD for HTML, and the only answers lie in
the browser source code. The documentation "in the web" is too
vague. But I hardly think we want the browser source code to
be the definition of HTML.

On the other hand, I tried and failed to come up with a DTD which
described HTML in such a way that the existing documents are legal.

Before this situation gets out of hand, we need to establish a
(possibly evolving, but at least existing) SGML DTD for HTML.
This will require incompatible changes to the definition of HTML
(for instance, the PLAINTEXT, XMP, and LISTING features of HTML
don't quite fit into SGML).

Enclosed* is a proposed DTD, a perl hack to patch existing files,
and a sample patched file. I invite:

= SGML experts to round out the DTD (should we include
  stuff from the ISO General DTD? the AAP article DTD?
  ISOnum, ISOpub etc. standard entities? How about
  using the QWERTZ LaTeX-like DTD?)

= SGML non-experts to become conversant in SGML (it's
  coming whether you like it or not)

= HyTime experts to add to the confusion. Seriously, I'd
  like to know what HyTime has to offer.

= DSSSSSSSSSSL experts to do the same

= the WWW team to adapt existing code to match this DTD
  (or some real DTD)

= HTTP server sites to then update their HTML files


	*enclosed in the MIME multipart/mixed enclosure sense

--8<--

<!-- This DTD was produced by DeveGram on Tue Jun  2 18:58:16 1992 -->
<!-- and hand-edited by connolly@convex.com -->

<!-- typical usage:

  <!DOCTYPE web-node SYSTEM 
    [
    <!ENTITY UDI011 SDATA
      "http://info.cern.ch/hypertext/DataSources/NewsFromVM/Overview.html">
    ]>
 -->

<!--     Parameter Entities       -->

<!--      Terminal symbols        -->

<!ENTITY % words "#PCDATA" >

<!--    Non-ELEMENT symbols       -->

<!ENTITY % inline       "%words | A" >
<!ENTITY % text         "%inline | P | IMAGE" >
<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >

<!ENTITY lt "<">
<!ENTITY gt ">">
<!ENTITY amp "&">

<!ENTITY lt. "<">
<!ENTITY gt. ">">
<!ENTITY amp. "&">

<!--     Document structure       -->

<!ELEMENT WEB-NODE      O O  (TITLE, NEXTID?, ISINDEX?, section+, ADDRESS?)>

<!ELEMENT TITLE - -  (%inline)+>
<!ELEMENT ADDRESS - - (%text)+>

<!ELEMENT NEXTID - O EMPTY >
<!ATTLIST NEXTID N NUMBER #IMPLIED>

<!ELEMENT ISINDEX - O EMPTY >


<!ELEMENT section O O ((%heading)?,
                        (
                        %text |
                        section |
                        MENU |
                        UL |
                        OL |
                        DIR |
                        DL)+)>

<!ELEMENT (H1|H2|H3|H4|H5|H6)   - -  (%inline) >

<!ELEMENT P     - O  EMPTY -- paragraph SEPARATOR -->

<!ELEMENT IMAGE - O EMPTY>
<!ATTLIST IMAGE DATA ENTITY #REQUIRED>

<!ELEMENT A     - -  (%inline)+>
<!ATTLIST A
        NAME CDATA #IMPLIED
        HREF ENTITY #IMPLIED
        TYPE CDATA #IMPLIED --@@-- >

<!ELEMENT MENU  - -  (LI+)>

<!ELEMENT UL    - -  (LI+)>

<!ELEMENT OL    - -  (LI+)>

<!ELEMENT DIR   - -  (LI+)>

<!ELEMENT LI    - O  (%text)+>

<!ELEMENT DL    - -  ((DT, DD)+)>

<!ELEMENT DT    - O  (%inline)+>

<!ELEMENT DD    - O  (%text)+>

--8<--

And here's a perl script that attempts to patch up existing
HTML files:

--8<--

#!/usr/local/bin/perl
#
# USE
#   fix-html.pl <W3-file.html >W3-file.sgml
#
# SEE ALSO
#   the web-node.dtd.
#

print "<!DOCTYPE WEB-NODE SYSTEM \n[\n";

@html = <>;                     # read whole file
$_ = join('', @html);
$out = '';

$header = 0;
$anchor = "UDI000";
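# Scan for each '<': copy the text before it, then rewrite the tags we know.
# Tag names are matched case-insensitively, since SGML names are not
# case-sensitive and existing files mix upper and lower case.
while(/</){
    $out .= $`;
    $_ = $';
    if(s/^A\s+//i){
        &fix_anchor;
    }elsif(s/^NEXTID\s+(\d+)\s*>//i){
        $out .= "<NEXTID N=$1>";
    }elsif(s/^H(\d)>//i){
        # Close sections at this level or deeper, then open down to level $n.
        local($n) = $1;
        while($n<=$header){ $out .= "</SECTION>"; $header--; }
        while($n>$header){ $out .= "<SECTION>"; $header++; }
        $out .= "<H$n>";
    }else{
        $out .= '<';            # any other tag passes through unchanged
    }
}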

$out .= $_;

foreach(keys %anchor){
    local($ent) = $anchor{$_};

    print "<!ENTITY $ent SDATA \"$_\">\n";
}

print "]>\n", $out;

sub fix_anchor{
    local($name, $href, $type);

    # What exactly is the syntax of an SGML attribute value?
    # Here: name="quoted value" ($3, quotes dropped) or name=token ($4).
    while(s/^(\w+)\s*=\s*(\"([^\"]*)\"|([^\s>]+))\s*//){
        local($v) = ($4 ne '' ? $4 : $3);
        local($a) = $1;
        $href = $v if $a =~ /^href$/i;
        $name = $v if $a =~ /^name$/i;
        $type = $v if $a =~ /^type$/i;
    }
    s/[^>]*>//;

    $out .= "<A";
    $out .= " NAME=\"$name\"" if $name ne '';
    $out .= " TYPE=\"$type\"" if $type ne '';
    if($href ne ''){
        if(!defined($anchor{$href})){
            $anchor{$href} = ++$anchor;
        }
        $out .= " HREF=" . $anchor{$href};
    }
    $out .= ">";
}

--8<--

Here's my default.html run through the above script:

--8<--

<!DOCTYPE WEB-NODE SYSTEM 
[
<!ENTITY UDI011 SDATA
"http://info.cern.ch/hypertext/DataSources/NewsFromVM/Overview.html">
<!ENTITY UDI006 SDATA "http://crnvmc.cern.ch./FIND">
<!ENTITY UDI020 SDATA "http://info.cern.ch/rpc/doc/User/UserGuide.html">
<!ENTITY UDI013 SDATA
"http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html">
<!ENTITY UDI021 SDATA "http://otax.tky.hut.fi/tky/default.html">
<!ENTITY UDI017 SDATA
"http://info.cern.ch:8001/archive.orst.edu:9000/archie-orst.edu">
<!ENTITY UDI010 SDATA "http://crnvmc.cern.ch/NEWS/student">
<!ENTITY UDI019 SDATA
"http://info.cern.ch./hypertext/Products/WAIS/Sources/Overview.html">
<!ENTITY UDI002 SDATA "http://info.cern.ch/hypertext/WWW/TheProject.html">
<!ENTITY UDI007 SDATA "http://crnvmc.cern.ch/NEWS/?">
<!ENTITY UDI001 SDATA "QuickGuide.html">
<!ENTITY UDI012 SDATA
"http://info.cern.ch/hypertext/DataSources/News/Overview.html">
<!ENTITY UDI003 SDATA "http://crnvmc.cern.ch./WHO">
<!ENTITY UDI022 SDATA
"gopher://gopher.micro.umn.edu:70/11/Other%20Gopher%20and%20Information%20Servers">
<!ENTITY UDI005 SDATA "http://crnvmc.cern.ch./FIND/jaune?">
<!ENTITY UDI004 SDATA "http://crnvmc.cern.ch./FIND/yellow?">
<!ENTITY UDI016 SDATA "http://crnvmc.cern.ch/FIND/DESY?">
<!ENTITY UDI009 SDATA "http://crnvmc.cern.ch./NEWS/vmnews">
<!ENTITY UDI008 SDATA "http://crnvmc.cern.ch./NEWS/cern">
<!ENTITY UDI023 SDATA
"http://info.cern.ch./hypertext/WWW/LineMode/Defaults/default.html">
<!ENTITY UDI015 SDATA "http://slacvm.slac.stanford.edu./FIND/spires">
<!ENTITY UDI018 SDATA "http://iicm.tu-graz.ac.at./jargon">
<!ENTITY UDI014 SDATA "http://info.cern.ch./hypertext/DataSources/Overview.html">
]>
<TITLE>CERN Information</TITLE>
<NEXTID N=10>
<SECTION><H1>CERN Information - Select by number</H1>
<DL>
<DT><A NAME="0" HREF=UDI001>Help</A>
<DD>On this program, or the
<A HREF=UDI002>World-Wide Web project</A>.
<DT><A NAME="2" HREF=UDI003>Phone book</A>
<DD>People, phone numbers, accounts and email addresses.
See also the analytical
<A NAME="yellow" HREF=UDI004>Yellow Pages</A>, or
the same index in French :
<A NAME="jaune" HREF=UDI005>Pages Jaunes</A>.
<DT><A NAME="1" HREF=UDI006>"XFIND" index</A>
<DD>Index of computer centre documentation, newsletters, news,
help files, etc...
<DT><A NAME="groups" HREF=UDI007>News</A>
<DD>A complete list of all public CERN news groups, such as
<A NAME="3" HREF=UDI008>news from the CERN User's
Office</A>,<A NAME="4" HREF=UDI009>
CERN computer center news</A>,<A HREF=UDI010>
student news</A>. See also <A NAME="5" HREF=UDI011>private
groups</A> and <A NAME="inews" HREF=UDI012>Internet
news</A>.
</dl>
<SECTION><H2>From other sites</h2>
See online data by
<A NAME="subject" HREF=UDI013>subject</A>,
pointers to
<A HREF=UDI014>other forms of online data</a>, and the following specific databases:
<DL>
<DT><A NAME="spires" HREF=UDI015>SLAC SPIRES</A>
<DD>The High Energy Physics preprint index at Stanford Linear
Accelerator, California.
(This is the same information avialable via the QSPIRES facility on BITNET.
Include the word "FIND" as the first keyword, eg: K FIND AUTHOR FRED.).
<DT><A NAME="desy" HREF=UDI016>DESY documents</a>
<DD>Documents and help files from the DESY lab in Hamburg.
<DT><A NAME="archie" HREF=UDI017>
Archie</a>
<DD>An index of almost everything available by "anonymous FTP".
<DT><A NAME="7" HREF=UDI018>Hacker Jargon</a>
<DD>An index to a cross-referenced set of hacker terms. A demonstration
of the WWW gateway to the Graz Technical University Hyper-G database.
<DT><A NAME="9" HREF=UDI019>W.A.I.S.</a>
<DD>All kinds of information available from "Wide Area Information Servers".
<DT><A NAME="6" HREF=UDI020>CERN RPC</A>
<DD>The user guide for the RPC system developed in CERN CN division
(not Sun/RPC). This is an example of documentation (partially) converted
into hypertext.
<DT><A NAME="hut" HREF=UDI021>Helsinki</a>
<DD>Helsinki Technical University information service (Mostly Finnish).
<DT><A NAME="gopher" HREF=UDI022>Gophers</a>
<DD>Campus-wide information systems using "Gopher" software. (Requires
www version 1.1 or higher)
</DL>
(This page may be an out of date copy. See the
<A NAME="latest" HREF=UDI023>latest version</a>.)

--8<----
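
The heading-handling loop in fix-html.pl above closes and opens SECTION
elements so that each heading level nests inside the previous one. A
minimal Python sketch of the same idea follows. The function name
wrap_sections is hypothetical, and unlike the perl script this version
also closes any sections still open at the end of the input.

```python
import re

# Matches an opening heading tag such as <H1> or <h2> (no attributes).
HEADING = re.compile(r"<H([1-6])>", re.IGNORECASE)

def wrap_sections(html):
    """Wrap each heading and the text after it in nested SECTION elements."""
    out = []
    depth = 0          # number of SECTION elements currently open
    pos = 0
    for m in HEADING.finditer(html):
        out.append(html[pos:m.start()])
        level = int(m.group(1))
        # Close sections at this level or deeper.
        while level <= depth:
            out.append("</SECTION>")
            depth -= 1
        # Open sections until we are at the heading's level.
        while level > depth:
            out.append("<SECTION>")
            depth += 1
        out.append("<H%d>" % level)
        pos = m.end()
    out.append(html[pos:])
    out.append("</SECTION>" * depth)   # assumption: close what is left open
    return "".join(out)

print(wrap_sections("<H1>A</H1>x<H2>B</H2>y<H2>C</H2>z"))
```

A heading that skips a level (H1 followed directly by H3) opens two
sections at once, which matches what the perl loop does.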



From connolly@pixel.convex.com  Thu Jun 25 01:04:53 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA29544; Thu, 25 Jun 92 01:04:53 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA28907; Thu, 25 Jun 92 01:05:12 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA15721; Wed, 24 Jun 92 18:04:30 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA04497; Wed, 24 Jun 92 18:04:27 -0500
Message-Id: <9206242304.AA04497@pixel.convex.com>
To: davis@willow.tc.cornell.edu (Jim Davis)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: Raisch's Attention link 
In-Reply-To: Your message of "Wed, 24 Jun 92 12:02:25 EDT."
             <9206241602.AA13181@willow.tc.cornell.edu> 
Date: Wed, 24 Jun 92 18:04:27 CDT
From: Dan Connolly <connolly@pixel.convex.com>


><paragraph>
>In a previous message, R. Raisch proposed a number of interesting
>link types.  One of them was the attention link, which I want to
>discuss here.  I am uncertain about the need for this link, the
>technical ability to provide it, and the definition.
>
[side note: Jim: what did you use to compose richtext? Ez?]

While all these ideas are very interesting, I sometimes get
lost in a haze of ideas and scenarios. I'd like to get a handle
on just what application we're developing here.

The pie-in-the-sky seems to be a distributed, hypermedia CSCW
(computer supported collaborative work) platform, where multiple authors
discover, research, read, reply, refute, annotate, author, and otherwise
exchange information.

Features of such a system might include:
- distributed access to documents, i.e. a document can be processed
	by a client on another host
- fulltext searching of large bodies of information
- direct manipulation query, i.e. point-and-click at the information
	that you're interested in
- hypertext, i.e. random-access both within documents
	and between documents
- direct manipulation editing, i.e. editing a representation of
	the end product rather than the source format (WYSIWYG)
- multimedia documents (formatted text, raster images, line drawings,
	audio, video, structured enclosures)
- hypermedia (just a term for multimedia hypertext)

In order to refine the model behind such an application and explore
the functionality different models support, let's have a look at some
existing applications and an abstract of the model they present.

(terms introduced by _underscores_ are objects defined by the model.)


Internet mail: distributed text message interchange

With a _user agent_, a user composes a _message_, which
is delivered by a _transfer agent_ to a _mailbox_, possibly in
another _domain_, where another user can read the message.
	MIME defines various _types_ of messages, including
	text, images, audio, video, included messages, and
	multipart (complex) types.
The receiving user may forward the message or reply to the message
by composing a new message including part or all of the original.


USENET news: distributed bulletin board

An _article_ is a special kind of message, addressed to
a _newsgroup_ rather than a _mailbox_. _Posting software_ enters the
article into a _repository_. A _news reader_ is a variant of a
mail user agent that helps a user navigate the articles in the
repository. It maintains a record of which articles the user
has read. (Transfer agents fit in somehow, and NNTP adds a client/server
separation between the news reader/poster and the repository).


WAIS: distributed fulltext search and multimedia retrieval

With a _wais client_, a user composes a _question_ consisting of
some relevant text and some _sources_ to consult. The wais client
contacts the _server_ for each source and makes a _query_ using the
relevant text. The servers respond with a scored list of _document
identifiers_. The user can select a document identifier and instruct
the client to _retrieve_ the corresponding _document_. The user
can then refine the question by adding relevant documents or _chunks_
of relevant documents.


FTP: distributed file exchange

A user invokes a _client_ on one _host filesystem_. The client
logs into a _server_ on another _host filesystem_ using a _username_
and a _password_. The user indicates a _mode_ and a _directory_
to the server through the client. The user can request that the
client display a _directory listing_ from the server, or transfer
a _named_ _file_ from the server's filesystem to the client's.


archie (prospero, really): a distributed file system

A user indicates a _host_, _search term_,
and _search mode_ to a _client_. The client contacts the _server_
on that host, issues a query, and displays the resulting list
of (host|dir|mode|date|seq|size|name|type) _items_.


Gopher: distributed hierarchical information browser

The _client_ connects to a _server_ (a _port_ on a _host_)
and sends it a _selector_. The server either sends back a _document_
or a _directory_, depending on the _type code_ corresponding to
the selector (the null selector is defined to be the root directory
of the server). A directory is a list of _items_. Each item consists
of a type code, a _name_, a selector, a host, and a port.

The client displays the names for the user, and the user chooses one.
The client sends the selector to the server indicated by the corresponding
host and port and, depending on the type code, will 0) display the resulting
document, 1) display the resulting directory, 4-6) decode the resulting
archive, or 9) save the resulting document to a file. Codes 2, 7, and
8 request services from CSO phone book servers, fulltext gopher servers,
and telnet hosts respectively.

The client maintains a _stack_ of the directories the user has visited,
and the user can choose to "go back" in addition to choosing a
directory item.
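
As an illustration of the directory item described above, here is a rough
Python sketch that splits one directory line into its fields. It assumes
the tab-separated line layout used by Gopher servers (type code as the
first character, then name, selector, host and port). The GopherItem
class and the parse_item function are names made up for this example.

```python
from dataclasses import dataclass

@dataclass
class GopherItem:
    type_code: str   # e.g. "0" document, "1" directory
    name: str        # what the client displays to the user
    selector: str    # opaque string sent back to the server
    host: str
    port: int

def parse_item(line):
    """Split one directory line into the five fields of an item."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4 or not fields[0]:
        raise ValueError("not a directory item: %r" % line)
    head, selector, host, port = fields[:4]
    # The first character is the type code; the rest is the display name.
    return GopherItem(head[0], head[1:], selector, host, int(port))

item = parse_item(
    "1Other Gopher and Information Servers\t"
    "1/Other Gopher and Information Servers\t"
    "gopher.micro.umn.edu\t70\r\n"
)
print(item.type_code, item.host, item.port)
```

The client only needs the type code to decide what to do with the reply;
the selector is never interpreted on the client side.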


WWW: a distributed hypertext browser

A WWW _client_ parses an _address_ into a _scheme_, a _server_,
a _path_, an _anchor id_, and some _search terms_. The client retrieves
a _document_ from the server using the path and search terms (using
one of several protocols), and displays the document, indicating
the _anchor element_ indicated by the anchor id. Documents contain
_structural elements_ such as headings, lists, etc.

The user requests the next document in one of two ways: a) by choosing
one of the anchor elements of the document, which specifies an address, or
b) if the document is an _index_, by supplying search terms, which the
client combines with the address of the document. In either case the
client begins again, using the resulting
address for the next document.
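
The address parsing described above can be approximated with a short
Python sketch. It leans on the standard urllib.parse module, which
implements present-day URL syntax rather than the UDI syntax under
discussion, so treat the mapping as an assumption. The function name
split_address, the search terms and the anchor id in the sample address
are hypothetical.

```python
from urllib.parse import urlsplit

def split_address(address):
    """Break an address into the pieces a WWW client works with."""
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,
        "server": parts.netloc,
        "path": parts.path,
        "anchor_id": parts.fragment,
        # Search terms are assumed to be joined with '+' after the '?'.
        "search_terms": parts.query.split("+") if parts.query else [],
    }

pieces = split_address("http://crnvmc.cern.ch/FIND/yellow?phone+book#top")
print(pieces)
```

A relative address such as QuickGuide.html comes back with an empty
scheme and server, which the client would fill in from the address of
the current document.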


GNU Info: online hypertext documentation for applications and products

An info _browser_ displays the root _node_. A user indicates another node
to display by choosing _up_, _next_, _previous_, choosing a
_menu item_, or indicating a _note_ to follow.


Unix manual: structured text documentation for unix commands and functions

A user may request _formatted text display_ of a _page_ by its _name_,
and _section_, or they may request display of all _permuted index entries_
(containing names of pages) that match a _string_. The database is defined
on a per-user basis as a list of _trees_, each containing a permuted
index and one or more sections containing one or more pages.


Frame: direct manipulation hypermedia editor, hypermedia browser

FrameMaker supports point-and-click editing of _documents_ composed
of _frames_ containing _objects_ (geometric graphics and raster images)
and _textflows_ of _paragraphs_ of formatted text and _markers_, including
_link sources_ and _link destinations_.

FrameViewer displays a _page_ frame of a document, and allows point-and-click
access to the link sources. Every document has an implicit firstpage and
lastpage link destination. Every page has an implicit link to the next
and previous page.


Compare and contrast objects:

Mail messages vs. WAIS documents vs WWW documents vs FTP files
	(unique id's? writable?)
WAIS source vs. news group
WAIS server vs. NNTP server vs. gopher server vs HTTP server
Message-ID vs WAIS docid vs WWW UDI
WAIS chunk vs. WWW anchor vs. FrameMaker marker
Info node vs. FrameMaker page vs. WWW document

Compare and contrast features

- distributed access to documents
	mail: no
	news: NNTP allows retrieval of headers by newsgroup or date
	wais: 
- fulltext searching: WAIS vs. WWW vs. gopher
- direct manipulation query: xwais vs. xgopher vs. NeXT WWW
- hypertext: WWW vs. GNU Info vs. FrameMaker
- direct manipulation editing: FrameMaker vs. Andrew vs. Interviews
- multimedia documents: FrameMaker vs. Gopher vs. MIME vs. WAIS vs. WWW

I'm formulating a model that will hopefully put all these features
and objects into one framework for discussion, at least.

Dan

From fkappe@fiicmds04.tu-graz.ac.at  Thu Jun 25 14:26:08 1992
Return-Path: <fkappe@fiicmds04.tu-graz.ac.at>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00925; Thu, 25 Jun 92 14:26:08 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA25968; Thu, 25 Jun 92 14:25:47 +0200
Received: by iicm.tu-graz.ac.at (5.57/Ultrix3.0-C)
	id AA25321; Thu, 25 Jun 92 14:25:18 +0200
Date: Thu, 25 Jun 92 14:25:18 +0200
From: fkappe@fiicmds04.tu-graz.ac.at (Frank Kappe)
Message-Id: <9206251225.AA25321@iicm.tu-graz.ac.at>
To: www-talk@nxoc01.cern.ch
Subject: HTML DTD and related problems (rather long)

>>>>> On Wed, 24 Jun 92 12:15:52 CDT, Dan Connolly <connolly@pixel.convex.com> said:

>>But then this raises another issue: does WWW allow anchors within
>>anchors?  I think not - in which case I could not use WWW anchors to
>>both label a paragraph (e.g. for attaching an annotation) and a word
>>within it (e.g. for definition).  This worries me quite a bit.  Nor
>>can I attach multiple links to the same point (e.g. definitions of a
>>word in multiple languages).
>
> This and other related questions (can I have lists within lists?)
> are precisely the reason for using a well-defined structural markup
> language governed by SGML processing rules.

> Right now we have no DTD for HTML, and the only answers lie in
> the browser source code. The documentation "in the web" is too
> vague. But I hardly think we want the browser source code to
> be the definition of HTML.

The situation with anchors in SGML is even worse. In addition to the cases where
you want to have anchors within lists, list items, headlines etc. and lists,
list items, headlines etc. within anchors, you also have anchors within anchors.
Actually, anchors within anchors look like this:

<BEGIN ANCHOR A>
!
!
!   <BEGIN ANCHOR B>
!   !
!   !
!   !
!   <END ANCHOR B>
!
!
<END ANCHOR A>

This case is trivial (you just allow anchors within anchors in the DTD).
However, consider the case where you want to have a destination anchor marking,
say, paragraphs 1 and 2, and another one marking 2 and 3:

<BEGIN ANCHOR A>
!
! para 1
!
!   <BEGIN ANCHOR B>
!                  !
! para 2           !
!                  !
<END ANCHOR A>     !
                   !
  para 3           !
                   !
    <END ANCHOR B>-+

This situation cannot be implemented with SGML Tags like <A ....>text</A>, as
it is proposed in HTML. Also, I doubt that it is possible to construct an anchor
spanning, e.g., a few items of list A and a few items of list B, because the
SGML parser would implicitly close opened anchor tags when reaching </list>:

<list A>
  <item>...
<A .....>
  <item>...
  <item>...
</list>

<list B>
  <item>...
</A>
  <item>...
  <item>...
</list>

The reason why it is possible to construct such things using the NeXT-based WWW
viewer/editor is simply that HTML is not SGML. Therefore it is impossible to
specify a DTD for HTML (as Dan has already pointed out).

In our Hyper-G system that uses HTF, an SGML-based format similar to HTML, we
overcome the anchor-nesting problem by specifying TWO tags for anchors: an
anchor-start (<AS>) tag and an anchor-end (<AE>) tag with an additional ID
attribute. So, the examples are coded like this:

<AS ID="A">
para 1
<AS ID="B">
para 2
<AE ID="A">
para 3
<AE ID="B">

and

<list A>
  <item>...
<AS ID="C">
  <item>...
  <item>...
</list>

<list B>
  <item>...
<AE ID="C">
  <item>...
  <item>...
</list>

which is perfectly legal in our DTD. I don't want to waste more internet
bandwidth sending the DTD, but you may get it by anonymous ftp from
iicm.tu-graz.ac.at in file pub/Hyper-G/sgml/hyper-g.dtd. There is also a
corresponding style sheet as well as styles used to convert HTF to HTML and
LaTeX with a stand-alone SGML parser.
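
To make the start-tag/end-tag scheme above concrete, here is a small
Python sketch that recovers the (possibly overlapping) anchor ranges from
text marked up that way. It is not the Hyper-G parser, only an
illustration: the tag syntax is copied from the examples above, and the
function name anchor_spans is invented for this sketch.

```python
import re

# Tag syntax modelled on the <AS ID="..."> / <AE ID="..."> examples above.
TAG = re.compile(r'<(AS|AE)\s+ID="([^"]*)"\s*>')

def anchor_spans(source):
    """Return (text, spans): the text without anchor tags, and a dict
    mapping each anchor ID to its (start, end) offsets in that text."""
    parts = []
    length = 0      # length of the tag-free text collected so far
    open_at = {}    # anchor ID -> offset where its <AS> was seen
    spans = {}
    pos = 0
    for m in TAG.finditer(source):
        chunk = source[pos:m.start()]
        parts.append(chunk)
        length += len(chunk)
        kind, anchor_id = m.group(1), m.group(2)
        if kind == "AS":
            open_at[anchor_id] = length
        elif anchor_id in open_at:
            spans[anchor_id] = (open_at.pop(anchor_id), length)
        else:
            raise ValueError("AE without matching AS: %s" % anchor_id)
        pos = m.end()
    parts.append(source[pos:])
    if open_at:
        raise ValueError("unclosed anchors: %s" % sorted(open_at))
    return "".join(parts), spans

text, spans = anchor_spans(
    '<AS ID="A">para 1\n<AS ID="B">para 2\n<AE ID="A">para 3\n<AE ID="B">'
)
print(spans)
```

Because start and end are matched by ID rather than by nesting, anchors
A and B may overlap freely, which a single <A>...</A> element cannot
express.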

Let me say one final word about anchors: In my (and others') opinion, it is
generally not a good idea to store anchors (or even links) in documents. This
requires a modification of the document whenever an anchor is
inserted/modified/deleted and is problematic in multi-user environments with
private links, etc. Rather, the links should be stored and manipulated in a
separate link database (like in Intermedia and also in Hyper-G). This also
allows for backwards-tracing of links, which is essential for maintaining the
integrity of the hypertext and providing a graphical overview of the hypertext
to the users.

However, in certain circumstances (like document modification) it is convenient
to supply the anchor information with the document. That is the reason why it's
in the DTD.

-----------------------------------------------------------------------------
Frank M. Kappe                                      fkappe@iicm.tu-graz.ac.at
Institute for Information Processing                     Fax: ++43 316 824394
Technical University Graz, Austria           "Sorry, no kangaroos in Austria"
-----------------------------------------------------------------------------

From sean@coombs.anu.edu.au  Thu Jun 25 17:27:39 1992
Return-Path: <sean@coombs.anu.edu.au>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA01526; Thu, 25 Jun 92 17:27:39 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA10252; Thu, 25 Jun 92 17:27:22 +0200
Received: by coombs.anu.edu.au (5.61/1.0)
	id AA13539; Fri, 26 Jun 92 01:27:14 +1000
Date: Fri, 26 Jun 92 01:27:14 +1000
From: sean@coombs.anu.edu.au (Sean Sebastian Batt)
Message-Id: <9206251527.AA13539@coombs.anu.edu.au>
To: www-talk@nxoc01.cern.ch
Subject: HTML -> Postscript? Text?

G'Day,

It's late at night and my eyesight is fading so I'm simply not up to
parsing the installation instructions for various parts of WWW written
as they are in HTML.

Is there an HTML -> Postscript processor anywhere? HTML -> troff? Text
even?

Thanks in advance...

Sean
--
------------- Sean Sebastian Batt - sean@coombs.anu.edu.au --------
-------- Coombs Computing Section - Telephone: +61 6 249 3296 -----
-- Australian National University - GPO Box 4 Canberra City 2601 --
-------------------------------------------------------------------

From timbl  Thu Jun 25 22:50:05 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02447; Thu, 25 Jun 92 22:50:05 MET DST
Date: Thu, 25 Jun 92 22:50:05 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206252050.AA02447@ nxoc01.cern.ch >
To: davis@willow.tc.cornell.edu
Subject: Re: Links that refer to a range of text, not just a point.
Cc: www-talk@nxoc01.cern.ch


On types of links: Link types can describe

-- hints at presentation (Footnote, in-line, embed, automatic or
   on demand, print this if you print me, don't search this if
   you search me, etc etc)

-- semantics of the documents (a is a previous version of b, etc)

-- semantics of THAT DESCRIBED by the document, eg
   "The W3 software" is a part of "the W3 project" where the
   'is part of' in fact applies to the unquoted things, not the
   documents. There is something on link types on the web in

   http://info.cern.ch/hypertext/WWW/DesignIssues/LinkTypes.html


On areas and points:  No, the WWW links are not (in general) points;
they are areas. In the broad sense they can be any object within
the document, as identified by the anchor ID. In the specific
case of HTML, they are areas which have a beginning and an end.
In the case of the actual W3 software, no one can handle
overlapping anchors because the text object underneath isn't powerful
enough.  There is also a problem showing overlapping source anchors
(buttons) to the user. But in principle, there is no reason
why one shouldn't have overlapping anchors, or at least
nested ones.

But not now.

Tim

From timbl  Fri Jun 26 15:35:05 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA04694; Fri, 26 Jun 92 15:35:05 MET DST
Date: Fri, 26 Jun 92 15:35:05 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206261335.AA04694@ nxoc01.cern.ch >
To: connolly@pixel.convex.com, timbl@nxoc01.cern.ch
Subject: Re: HTML DTD
Cc: www-talk@nxoc01.cern.ch

Dan, you say

<<
I suppose you could come up with a DTD that describes something
close to the current HTML, but I'm not sure of the value of it.
HTML allows tags to be pretty much sprinkled wherever you feel
like putting them. Any DTD that allows that much leeway just
looks like this:

        <!ENTITY % alltags "TITLE|H1|H2|H3|MENU|OL|UL">
        <!ELEMENT %alltags (%alltags)*>

i.e. every element is just a repeatable or-group of all the elements.
Then the SGML parser can't do any minimization cuz nothing's required. >>

Yes, current HTML is just a linear sequence of
elements.  There is a reason for this:  it is very
convenient for HTML to map onto a series of styles, and that
is so for two reasons.

Firstly, a lot of rich text objects can hold styles but can't hold
structure.  You can deduce structure from the styles -- like
Word deducing outlining from Heading styles, and WWW deducing
a list <UL> from a lot of <LI> paragraphs. But you can't go
very far.  If you want to make a HT editor out of such a
text object, you have to regenerate the elements from the
styles.

Secondly, it may be that the wysiwyg editors have a linear style
structure because that is intuitive to people. I don't know
a lot of people who use author/editor (which maintains
structure). Maybe real people actually think in terms of styles
and fix the document to look right, then they are happy to have the
structure deduced.

So if we went for a nestable HTML which would be cleaner for
those who appreciate recursion, we would have to have a hypertext
editor which made the structure visible.  I don't have experience
enough to know whether real information providers (group secretaries,
for example) would be into generating nested elements -- maybe
the styles are useful to keep as the current `user interface metaphor'
of word processors.

(It also makes making the editor easier!)

Or maybe we should have two levels of DTD -- one basically linear
and mandatory (and precompiled for fast access) and one more
sophisticated for larger documents.

Of course, when you are writing hypertext the large documents are
normally broken down into small bits to make traversing them quick.
So whereas each hypertext node may contain only H1 and H2 headings,
when a book is generated a la the_www_book.ps you get 5 levels
of heading from the whole tree.

So that is why the HTML structure is so simple. I am open to
a more sophisticated alternative.

Tim
____________________________
From connolly@pixel.convex.com Fri Jun 26 00:00:33 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA02722; Fri, 26 Jun 92 00:00:27 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA25540; Fri, 26 Jun 92 00:00:11 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA10700; Thu, 25 Jun 92 17:00:01 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA05209; Thu, 25 Jun 92 17:00:00 -0500
Message-Id: <9206252200.AA05209@pixel.convex.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Subject: Re: HTML DTD 
In-Reply-To: Your message of "Thu, 25 Jun 92 23:07:25 +0700."
             <9206252107.AA02534@ nxoc01.cern.ch > 
Date: Thu, 25 Jun 92 16:59:59 CDT
From: Dan Connolly <connolly@pixel.convex.com>
Status: R


>thanks for that contribution.   Not being as hot on SGML
>as I ought to be, I don't see why the HREF has to refer to
>an entity declared separately rather than directly having
>a string argument.
>
That's actually left over from when I was trying to point
HREF attributes to MIME attachments. It's not really
necessary to move the UDIs into entities as long as you're
careful that the UDI syntax is a subset of the SGML
attribute literal syntax.

Beware, for example, that an
SGML parser will expand entity references in an attribute literal
to produce the CDATA for the attribute value. So that
<A HREF="A&P"> might be OK for the linemode browser,
but an SGML parser will try to resolve &P.

Also, SGML attribute values have a maximum length specified
in the SGML declaration. The default value is 960 or something
around there.
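
A minimal Python sketch of the precaution described above: before a UDI
is written out as an attribute literal, escape the characters an SGML
parser would act on and check the length. The function name
sgml_attr_literal is hypothetical, and the limit of 240 is only an
assumed value for illustration; the real limit comes from the SGML
declaration in use.

```python
def sgml_attr_literal(value, max_len=240):
    """Quote a string for use as an SGML attribute value literal."""
    # '&' would start an entity reference, so write it as &amp;
    # (the proposed DTD declares the amp entity). A double quote would
    # end the literal, so write it as a numeric character reference.
    escaped = value.replace("&", "&amp;").replace('"', "&#34;")
    # Conservative check on the escaped text (assumed limit, see above).
    if len(escaped) > max_len:
        raise ValueError("attribute value too long: %d > %d"
                         % (len(escaped), max_len))
    return '"' + escaped + '"'

print("<A HREF=%s>" % sgml_attr_literal("A&P"))
```

With this, the example above comes out as HREF="A&amp;P", which an SGML
parser reads back as the three characters A&P.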

>The title is in fact optional currently, by the way ...
>we could keep it so though it "ought" always to have one.
>
>I'd like a DTD which as closely reflects the current HTML as
>possible.

I suppose you could come up with a DTD that describes something
close to the current HTML, but I'm not sure of the value of it.
HTML allows tags to be pretty much sprinkled wherever you feel
like putting them. Any DTD that allows that much leeway just
looks like this:

	<!ENTITY % alltags "TITLE|H1|H2|H3|MENU|OL|UL">
	<!ELEMENT %alltags (%alltags)*>

i.e. every element is just a repeatable or-group of all the elements.
Then the SGML parser can't do any minimization cuz nothing's required.

> Then, if we change HTML to HTML2, I would
>change it in a number of ways, in particular to include
>separate header and body parts.  I have come across the
>"Davenport" group of publishers who are defining DTDs for
>technical documentation.  They include Steve Newcombe who
>is the HyTime guy (or one of the two I should say).
>I would like to get some input from them.
>

Certainly we should keep tabs on things like the Davenport
group and HyTime.

But my immediate concern is these little syntactic differences
that render HTML documents worthless to an SGML parser. The
current HTML and UDI syntax make a good proof of concept, but
we need to move toward formal definitions so that we can
have confidence that correct implementations will interoperate.

More later...

Dan


From timbl  Fri Jun 26 20:35:20 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA00771; Fri, 26 Jun 92 20:35:20 MET DST
Date: Fri, 26 Jun 92 20:35:20 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206261835.AA00771@ nxoc01.cern.ch >
To: sean@coombs.anu.edu.au, www-talk@nxoc01.cern.ch
Subject: Re: HTML -> Postscript? Text?

Sean,

You say, "It's late at night and my eyesight is fading so I'm simply not up to
parsing the installation instructions for various parts of WWW written
as they are in HTML".

The basic installation instructions are also in the line mode
browser tar file in plain text.  That will allow you to get the
line mode browser (www) up.  (You can probably get a binary anyway).
Then you can use that to change HTML into text 

	www -n -p -na xxxx.html

will give you a non-interactive unpaged link-free formatted
text output. What more can you want? 

If there is something crucial missing from the .txt files
shipped, then tell www-bug@info.cern.ch.  If you want more
information about other bits without getting www up, then
just telnet to info.cern.ch and read it on line.


Actually, we plan to make a combined user manual available
in postscript and TeX in the same way as the_www_book.ps.Z
dumped from the hypertext.  Later.


Tim

From timbl@zippy.lcs.mit.edu  Fri Jun 26 22:03:28 1992
Return-Path: <timbl@zippy.lcs.mit.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA01144; Fri, 26 Jun 92 22:03:28 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AB01707; Fri, 26 Jun 92 22:02:46 +0200
Received: by zippy.lcs.mit.edu 
	id AA29919; Fri, 26 Jun 92 16:04:23 -0400
Date: Fri, 26 Jun 92 16:04:23 -0400
From: timbl@zippy.lcs.mit.edu (Tim Berners-Lee)
Message-Id: <9206262004.AA29919@zippy.lcs.mit.edu>
To: ietf-udi@merit.edu, ietf@isi.edu, nir@nxoc01.cern.ch,
        www-talk@nxoc01.cern.ch
Subject: IETF BOF on Universal Document Identifiers
Cc: erik.huizer@surfnet.nl, mdavies@nri.reston.va.us



		Universal Document Identifiers UDI BOF

        Time:                  Monday 13th July 1992 1:30 pm

	Chair:                 Tim Berners-Lee

	Background document:   file://info.cern.ch/pub/www/doc/udi2.ps

Aim:  To define some clear decisions to be made by a small WG to be
formed.    A skeleton charter for such a working group is appended,
and may be discussed at the BOF.  The overall aim is to standardize
on one unifying printable representation for names and addresses of
retrievable objects in existing and future name and address spaces.

Those who have not been following the discussion are asked to read the
background document first.  An archive of some of the discussion is
available as file://info.cern.ch/pub/www/doc/udi/discussion.mbox .
The BOF will avoid philosophy and a discussion of the differences
between names and addresses, or the relative merits of different naming
schemes, or the combination of names in different spaces recommended to
refer to an object.  If new schemes are required, a separate effort
may be forked off to create them.
_________________________________________________________________

Provisional Charter:

	To define a printable string syntax to allow:

1.	The expression of the address on the network of any
	accessible object using existing information retrieval protocols;

2.	The expression of the name of any object held in
	a directory system or unique naming space on the network;

3.	The distinction to be made easily in the syntax
	between such protocols and directories and name spaces;

4.	New protocols, directories and naming schemes to be included as
	and when they are developed.
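
A small Python sketch of points 3 and 4, offered as an illustration
only. It relies on the URL parsing in Python's standard urllib.parse,
which is not the UDI syntax of the background document, and the
function name classify, the KNOWN_SCHEMES table and its labels are
assumptions made for this example. The prefix before the colon is what
lets a reader tell one protocol or name space from another, and a new
scheme is added by adding one table entry.

```python
from urllib.parse import urlsplit

# Hypothetical registry: scheme prefix -> kind of object it addresses.
KNOWN_SCHEMES = {
    "file": "file or directory",
    "http": "HTTP object",
}


def classify(identifier):
    """Split an identifier and report which access scheme it names."""
    parts = urlsplit(identifier)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "kind": KNOWN_SCHEMES.get(parts.scheme, "unknown scheme"),
    }


if __name__ == "__main__":
    print(classify("file://info.cern.ch/pub/www/doc/udi2.ps"))
    # Point 4: a new scheme is included by extending the table.
    KNOWN_SCHEMES["gopher"] = "Gopher item"
    print(classify("gopher://gopher.example.org/1/docs"))
```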


The aim of the working group is to further the exchange of information
between people and applications using the network by allowing an
unambiguous reference to be cited to accessible units of information.

The working group will build on experience with existing information retrieval
protocols and will not invent new protocols or name spaces. The working
group will note the properties of established name and address spaces.

Deliverable:

A	Document describing:
		Overall syntax, character sets and limitations
		Formats for addresses of:-
		   FTP files and directories
		   WAIS documents and databases
		   HTTP objects
		   NNTP newsgroups and articles
		   Gopher items.

	Standards track.

B	Appendices to document A for X.500 distinguished name spaces

C	Appendices to document A for selected established
	unique identifier name spaces. (possible separate parallel
	working group).


Milestones:

	Define at BOF.


From serrano@osage.csc.ti.com  Mon Jun 29 23:19:44 1992
Return-Path: <serrano@osage.csc.ti.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08231; Mon, 29 Jun 92 23:19:44 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA09219; Mon, 29 Jun 92 23:19:09 +0200
Received: from tilde.csc.ti.com ([128.247.160.56]) by ti.com with SMTP 
	(5.59/LAI-3.2) id AA16612; Mon, 29 Jun 92 16:09:38 CDT
Received: from osage.csc.ti.com (osage) by tilde.csc.ti.com id AA24636; Mon, 29 Jun 1992 16:07:53 -0500
Received: from localhost by osage.csc.ti.com (4.1/SMI-4.1)
	id AA28545; Mon, 29 Jun 92 16:07:51 CDT
Message-Id: <9206292107.AA28545@osage.csc.ti.com>
To: wei@xcf.berkeley.edu
Cc: www-talk@nxoc01.cern.ch
Date: Mon, 29 Jun 92 16:07:51 -0500
From: serrano@osage.csc.ti.com


Hi, 

1) I was wondering where I can get an editor to create hypertext
documents for the Viola WWW browser (html source editor)?

2) Where can I find "TheProject.html"? I checked under
/info.cern.ch/hypertext/WWW but there is no such path.

ANY HELP will be greatly appreciated.

Yonson
serrano@csc.ti.com
=====================================================

           ,_             ___@            __@
      ,______\@          _`__\-'         / /\__
  ^^^^,__'    \^^^      (')\/(`)       ___/\
              '                            /_
======================================================

From timbl  Tue Jun 30 00:08:31 1992
Return-Path: <timbl>
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08352; Tue, 30 Jun 92 00:08:31 MET DST
Date: Tue, 30 Jun 92 00:08:31 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9206292208.AA08352@ nxoc01.cern.ch >
To: serrano@osage.csc.ti.com, wei@xcf.berkeley.edu
Subject: Hypertext editor
Cc: www-talk@nxoc01.cern.ch


Well, this is one of the things we're all hoping is high on Pei's
agenda -- to make Viola an editor too.

Right now, all there is is a NeXTStep editor which was frozen last
summer.  It is not up-to-date or bug-free, but it exists and we use
it.  You need a NeXT to run it.  Available by anonymous FTP from
info.cern.ch in /pub/www/src

You ask, "Where can I find "TheProject.html"? I checked under
/info.cern.ch/hypertext/WWW but there is no such path."

I guess you tried the FTP server. Bad news: the ftp server and the
http server serve different things on that machine.
If you want the source, assuming you have the www line mode
browser, just say

 www -source  http://info.cern.ch/hypertext/WWW/TheProject.html > junk.html

This will give you an idea of what HTML looks like. Read the document
about what the tags mean, and then edit it to make your own.
For a limited quantity, a text editor on hypertext isn't too bad.
I'm away from home and so using emacs to make a hypertext trip report.
I make it with emacs then check it with  viola. A temporary solution
I know, till we get editors available.

Tim

From wei@xcf.berkeley.edu  Tue Jun 30 11:25:23 1992
Return-Path: <wei@xcf.berkeley.edu>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09837; Tue, 30 Jun 92 11:25:23 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA01341; Tue, 30 Jun 92 11:24:46 +0200
Received: by xcf.Berkeley.EDU (5.65/XCF-1.34)
	id AA25967; Tue, 30 Jun 92 02:20:10 -0700
Date: Tue, 30 Jun 92 02:20:10 -0700
From: wei@xcf.berkeley.edu (Pei Y. Wei)
Message-Id: <9206300920.AA25967@xcf.Berkeley.EDU>
To: serrano@osage.csc.ti.com
Subject: Re:  Hypertext editor
Cc: timbl@nxoc01.cern.ch, wei@xcf.berkeley.edu, www-talk@nxoc01.cern.ch

Yup... an editor is in the works. The very current state of the viola-based
browser uses output from a standard SGML parser (sgmls), and the displaying
page is composed of multiple structured textfield/bitmap/whatever objects,
instead of the unstructured single textfield as in the currently released 
version. 

Basically I'm taking a crack at a rough 1-1 mapping of SGML document 
structure to viola objects. The idea is to allow editing in the objects 
(i.e. textfield, bitmap), and then convert back to SGML. It'll probably 
be highly structured, so some people might find it needs some getting 
used to. But as the document source is SGML, there should eventually be 
many alternative editors. I can't say when this version will be 
releasable, as there are about a zillion things to do to get there.
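
To make the round trip concrete, here is a minimal Python sketch of
the general idea: parse markup into a tree of objects, edit one object,
and write the tree back out as markup. It is not Viola's design and
does not use sgmls; the names Node, TreeBuilder, parse and serialize
are assumptions made for this example, and it only handles well-nested
input without entities.

```python
from html.parser import HTMLParser


class Node:
    """One element of the document; children are Nodes or text strings."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = list(attrs or [])
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def serialize(node):
    """Convert the object tree back to markup."""
    inner = "".join(
        child if isinstance(child, str) else serialize(child)
        for child in node.children
    )
    if node.tag == "#document":
        return inner
    attrs = "".join(
        " " + name if value is None else ' %s="%s"' % (name, value)
        for name, value in node.attrs
    )
    return "<%s%s>%s</%s>" % (node.tag, attrs, inner, node.tag)


if __name__ == "__main__":
    tree = parse('<h1>Trip report</h1><p>See <a href="notes.html">the notes</a>.</p>')
    tree.children[0].children[0] = "Trip report, June"  # edit one text object
    print(serialize(tree))
```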

In the meanwhile, however, I could release a version (HTML based) that 
displays bitmaps (XBM and XPM). I hesitate releasing that version because
it has some fundamental limitations (hence the remodeling), b/c XBM and 
XPM are very X-ish, and b/c it kinda affects WWW's relatively uniform
interoperability (linemode browser may then have to say "a picture is here,
but..."). I don't know. Maybe it's better to freeze HTML now?


-Pei
			"All I know-- we've got to change what's happening
			 Something good could happen"	The B-52's


From jfg@dxcern.cern.ch  Tue Jun 30 11:30:38 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09853; Tue, 30 Jun 92 11:30:38 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA01640; Tue, 30 Jun 92 11:29:53 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA11026; Tue, 30 Jun 92 11:29:57 +0200
Date: Tue, 30 Jun 92 11:29:57 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206300929.AA11026@dxcern.cern.ch>
To: serrano@osage.csc.ti.com
Cc: www-talk@nxoc01.cern.ch
Subject: your WWW questions
References: <9206292107.AA28545@osage.csc.ti.com>

	Hello !

  Viola doesn't have an integrated editor yet. All you can do for now
is edit the html files separately and then display them with Viola.
Once you understand the few tags and the Doc IDs, it isn't hard to do.
Details are available in the hypertext documentation starting from the
"TheProject" page that you were looking for...

  ..."TheProject.html" is the hypertext page that you obtain when
clicking on the "?" icon in Viola, and then on the first link named
"World-Wide Web". If it doesn't work, it probably means that your
Internet connection has a problem, perhaps a bad configuration of the
name server. Try typing this in the "Doc ID" field at the bottom of Viola:

	http://128.141.201.74/hypertext/WWW/TheProject.html

and hit Return.

  Cheers !

--
  Jean-Francois Groff (jfg@info.cern.ch)
  World-Wide Web initiative
  CERN, ECP division, CH-1211 Geneva 23, Switzerland
  Phone +41 22 767 3755 -- Fax +41 22 767 7155
--
"Life may at times be boring, but is it more fun to be dead ?"
                                                  -- Alcor


From jfg@dxcern.cern.ch  Tue Jun 30 11:58:23 1992
Return-Path: <jfg@dxcern.cern.ch>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA09909; Tue, 30 Jun 92 11:58:23 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA04144; Tue, 30 Jun 92 11:57:48 +0200
Received: by dxcern.cern.ch (5.57/Ultrix3.0-C)
	id AA19493; Tue, 30 Jun 92 11:57:55 +0200
Date: Tue, 30 Jun 92 11:57:55 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Message-Id: <9206300957.AA19493@dxcern.cern.ch>
To: davis@willow.tc.cornell.edu (Jim Davis)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: Raisch's Attention link
References: <9206241602.AA13181@willow.tc.cornell.edu>

> As for charging, fairness requires that the link not be activated
> until I have seen a warning (otherwise I might get charged a zillion
> dollars to read the document - just like 900 phone numbers in the
> USA).  So this will add complexity to the client.

  This is of course very important! But just adding an alert in the
client before spending money is not a complex change. It might just be
annoying to the user (would you rather be annoyed or spend $$$ ?). The
user could also want to authorize access without warning to trusted
data sources, e.g. a pay-by-time database that he uses frequently
(with the stress of the ticking $ clock in the corner...)
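
A minimal Python sketch of that client-side check, under the
assumption that the client somehow knows the cost of a link before
following it (how it would learn this is not settled in this thread).
The names needs_warning and follow, the confirm callback and the host
names are hypothetical.

```python
def needs_warning(host, cost, trusted_hosts):
    """Return True when the user should be asked before the link is followed."""
    if cost <= 0:
        return False  # free links never interrupt the user
    return host not in trusted_hosts  # trusted sources skip the alert


def follow(host, cost, trusted_hosts, confirm):
    """Follow a link, asking confirm(host, cost) first when a warning is due."""
    if needs_warning(host, cost, trusted_hosts) and not confirm(host, cost):
        return "cancelled"
    return "fetched"


if __name__ == "__main__":
    def refuse(host, cost):
        # Stand-in for an alert panel where the user clicks "No".
        return False

    print(follow("db.example.org", 2.5, set(), refuse))
    print(follow("db.example.org", 2.5, {"db.example.org"}, refuse))
```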

> Also, attention links are not sufficient for a charging.  They
> support a model where I am charged once per read, no matter how much
> of the document I read.  But it seems likely that there might be need
> for other charging models.

  For a discussion of the many problems of a read-based charging scheme
(which wa