
MEDIA: A Platform for the Commercialization of Electronic Documents

Dimitri Konstantas, Jean-Henry Morin and Jan Vitek

University of Geneva
Centre Universitaire d'Informatique
24 rue General-Dufour
CH-1211 Geneva 4
Switzerland

tel: +41 22 705.76.64
fax: +41 22 705.77.80
e-mail: {dimitri,morin,jvitek}@cui.unige.ch

Abstract
MEDIA is a platform that allows the commercialization and dissemination of electronic documents under conditions similar to those of printed documents, using an agent-based, distributed, and secure infrastructure. Documents in the MEDIA system are encapsulated within agents, which the reader must execute in order to access their contents. The document producer can thus include instructions that protect his authorship rights; these may take the form of an authorization password or of automated electronic payment.

In: "Object Applications", ed. Dennis Tsichritzis, University of Geneva, Centre Universitaire d'Informatique, Aug. 1996.

This project is supported by the Swiss Federal Government by the SPP-ICS grants 5003-045332 (MEDIA), 5003-04533 (HyperNews), 5003-04534 (KryPict) and 5003-04535 (ASAP).


1. Introduction

The dominant media of information distribution and dissemination are printed documents such as letters, books, newspapers, articles and magazines. However, over the last few years the volume of documents exchanged in private communications or available for public access in electronic form [1][2][3][4][5] has increased exponentially. There are two reasons for this. First, the majority of these documents, ranging from private letters to complete books, are created directly in electronic form using a computer editing system. Second, the wide availability of computer networks has made the exchange of electronic versions of documents faster and cheaper. Nevertheless, the overwhelming majority of documents exchanged over computer networks are of a non-commercial and non-confidential nature. Commercial document distributors, such as book publishers, newspaper publishers, and legal organizations, have shown a (justifiable) reluctance to distribute their documents in electronic form via the network.

Their prudence is well motivated, for reasons that become clear if we compare the traditional commercialization of printed documents with the commercialization of their electronic counterparts. There are three stages in the commercialization of documents: production, distribution and dissemination. Although the production of electronic documents offers significant advantages over traditional hard-copy printing, this hardly seems to be the case for the distribution and dissemination stages. The distribution of electronic documents is now catching up with traditional mail and courier services as far as reliability and security are concerned, while the dissemination of electronic documents still suffers from a tremendous disadvantage in the control of ownership rights [6][7][8].

The major advantage of the electronic versions of documents in the production stage is that they do not require printing machines. In addition, if the electronic document is to be distributed over the network, all reproduction is done by software at practically no cost and without any electronic reproduction equipment, such as diskette or tape drives.

The distribution of hard-copy documents was instituted long ago, and several legal and practical warranties exist concerning the distribution, reliability, confidentiality and integrity of the documents. It is only recently, with the development of reliable networks, secure encryption [9] and data exchange algorithms [10], that public networks have been able to provide the same warranties for the transfer of data. Electronic documents can thus be distributed safely and securely over a public network without any major concerns regarding unauthorized access to, or tampering with, the electronic document by the average network user. Clearly, someone with enough processing power can break the network's security; however, this is comparable to traditional hard-copy distribution, where someone with enough money and/or authority can bypass the legal and practical security warranties.

Nevertheless, the real commercialization problems begin when the electronic document has reached its destination. With printed documents there are practical limitations to copying, modifying or re-distributing a document. For example, it is very expensive, if not impossible, to modify the contents of a printed document without leaving traces, whereas copying a book, for example, is in general more expensive than buying a new copy. In addition, even if the printed document is passed from one person to another, there is still only a single copy of it. Electronic documents, on the other hand, can easily be modified, copied, and distributed to many individuals without the original owner losing his own copy and without any need to inform, or obtain authorization from, the publisher of the document. Although techniques exist that allow one to check the integrity of a document and authenticate its publisher unambiguously, the major problem lies in the ability to create and distribute unauthorized copies of the document at virtually no cost. This is ultimately the major reason why commercial intellectual works, like books, rarely appear in electronic form¹.

The aim of the MEDIA (Mobile Electronic Documents with Interacting Agents) project is to develop the means that will allow the protection, commercialization and dissemination of electronic documents under conditions similar to those of printed documents and, in addition, offer the reader of electronic documents all the advantages of electronic information processing technology, such as searching for documents according to the reader's interests, establishing links between related documents and interacting with the document [15].

2. Overview of the MEDIA Project

Traditionally, electronic documents are seen as a collection of data that includes the document contents (text, images etc.) and, possibly, simple or complex formatting instructions. The reader of the electronic document has all the software needed to display the data. The MEDIA project, however, takes a different direction and views electronic documents as programs that need to be interpreted in order to allow the visualization of their contents. The reader does not have simple data-displaying software but an interpreter for the language in which the electronic document is written. Since the electronic document is a program that the reader must execute in order to read it, the document producer can include not only instructions defining the structure of the information and how it should be displayed, but also code that interacts with the reader: asking, for example, for authorization passwords, verifying the integrity of its contents, decrypting sensitive parts of the document, allowing the reader to interrogate it and obtain basic information about its contents, and even sending messages through the network to the document publisher.

¹ With the exception of CD-ROMs, where the sheer mass of data has proved an effective disincentive to illicit copying.

Programs of this type are referred to in the literature as agents [21].
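The idea of a document that is a program rather than passive data can be illustrated with a minimal Java sketch. It is not the MEDIA implementation: the class name DocumentAgent, its methods and the plain password comparison are assumptions made for illustration only, and a real agent would keep its contents encrypted rather than in a plain field.

```java
import java.util.Optional;

// Hypothetical sketch: a document packaged as code the reader must run.
// All names here are illustrative assumptions, not part of MEDIA.
final class DocumentAgent {
    private final String title;
    private final String author;
    private final String contents;  // a real agent would hold this encrypted
    private final String password;  // simplification: stored in the clear

    DocumentAgent(String title, String author, String contents, String password) {
        this.title = title;
        this.author = author;
        this.contents = contents;
        this.password = password;
    }

    // Basic information the reader may obtain without any authorization.
    String describe() {
        return title + " by " + author;
    }

    // The contents are released only if the supplied password matches.
    Optional<String> open(String suppliedPassword) {
        if (!password.equals(suppliedPassword)) {
            return Optional.empty();
        }
        return Optional.of(contents);
    }

    public static void main(String[] args) {
        DocumentAgent doc =
            new DocumentAgent("Object Applications", "CUI", "Full text ...", "s3cret");
        System.out.println(doc.describe());
        System.out.println(doc.open("guess").orElse("(access denied)"));
        System.out.println(doc.open("s3cret").orElse("(access denied)"));
    }
}
```

The point of the sketch is only that access control lives inside the document object itself: the reader's software never sees the contents unless the document's own code decides to release them.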

The WWW [11] is without doubt the fastest-growing electronic document hyper-network. Recently, agent-based WWW browsers and languages have started to appear as extensions to HTML, for example the Sun HotJava system [13] and the Java language [12][14]. In the MEDIA project we use Java and HotJava as a basis for further developing and studying an agent system that will support the basic requirements for the commercialization of electronic documents, namely:

• secure interpretation of agents,

• integrity of the data (text and image) and of the agents, as well as transmission security,

• authentication and authorization of both sender and receiver,

• authorization control for manipulating (viewing, printing, copying, editing) documents,

• interactive manipulation of agents,

• tamper-proof encapsulation of the data in agents,

• software agents for searching, gathering and filtering information over the network.

The work in the MEDIA project is organized into three sub-projects, each with a well-defined goal. The ASAP project addresses the basic issues of an appropriate agent platform for the safe distribution of electronic data across the network, the HyperNews project focuses on a flexible and intuitive user environment for accessing and viewing the data, and the KryPict project addresses issues related to the electronic watermarking and management of visual data.

2.1 ASAP : Agent System Architecture and Platform

The ASAP project addresses key issues in the construction of distributed applications over vast and dynamic networks of computers using mobile software agents [16]. Mobile software agents embody a new paradigm for the design and implementation of distributed applications. Each agent is a self-contained, autonomous entity, able to move around a network and interact with other agents as well as local services. The goal of ASAP is to deliver an execution platform that supports the development of applications based on mobile software agents. This technology will be an infrastructure for developing commercial information systems in large-scale, dynamic and heterogeneous networks of computers [20].

The overall architecture of the agent system is depicted in Figure 1. A number of agents execute on the Agent Execution Platform (AEP). Each agent is composed of a set of objects. Agents may communicate with their environment, i.e. with other agents or with platform services. The platform insulates the agents from the peculiarities of the operating system, hardware, windowing systems, local databases, and so forth. Other agent platforms may be accessible via a transport service.

An agent execution platform (AEP) resembles an operating system in many respects. Like an operating system, which is responsible for managing the computer resources used by different threads or processes, an AEP has to manage the different resources necessary for the execution of agents. Also like an OS, an AEP must provide mechanisms for the interaction between different agents and for their protection. Other functionalities normally provided by an OS that will also be provided by the AEP are mobility support, networking of heterogeneous components, fault tolerance, and so forth [17].
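The role of the platform as the mediator of all interaction between agents can be sketched in a few lines of Java. This is an illustrative assumption about what such an interface might look like, not the ASAP design: the names AgentPlatform, MobileAgent and RecordingAgent are hypothetical, and resource management, protection and the transport service are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical platform: agents reach each other only through it.
final class AgentPlatform {
    private final Map<String, MobileAgent> residents = new HashMap<>();

    // Admit an agent, for example one that has just arrived over the network.
    void admit(MobileAgent agent) {
        residents.put(agent.name(), agent);
    }

    // Remove an agent from this platform, for example before it migrates.
    MobileAgent release(String name) {
        return residents.remove(name);
    }

    // Deliver a message; returns false if the recipient is not hosted here.
    boolean send(String from, String to, String message) {
        MobileAgent target = residents.get(to);
        if (target == null) {
            return false;
        }
        target.receive(from, message);
        return true;
    }

    public static void main(String[] args) {
        AgentPlatform platform = new AgentPlatform();
        RecordingAgent article = new RecordingAgent("article-42");
        platform.admit(article);
        System.out.println(platform.send("query-7", "article-42", "describe yourself")); // true
        System.out.println(platform.send("query-7", "nobody", "hello"));                 // false
        System.out.println(article.inbox()); // [query-7: describe yourself]
    }
}

// Hypothetical minimal agent contract.
interface MobileAgent {
    String name();
    void receive(String from, String message);
}

// Toy agent that simply records what it is told.
final class RecordingAgent implements MobileAgent {
    private final String name;
    private final List<String> inbox = new ArrayList<>();

    RecordingAgent(String name) { this.name = name; }

    @Override public String name() { return name; }

    @Override public void receive(String from, String message) {
        inbox.add(from + ": " + message);
    }

    List<String> inbox() { return inbox; }
}
```

Because agents never hold direct references to one another, the platform is the natural place to add the protection and accounting checks that the operating-system analogy suggests.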

In the domain of electronic documents, both documents and user requests will be represented by mobile agents. Thus an electronic document, be it a user's request for information or a newspaper article, will be able to move around the network, knowing who its rightful owner is and what rights that owner has with respect to copying, displaying, and printing the document. Electronic documents will be active (their behavior being provided by the agent substrate) and will thus know, for instance, how to display themselves on different platforms [25].

2.2 HyperNews : Hypermedia Newspaper

The HyperNews project aims at the design and development of an application for the commercial dissemination of information. The target application is an electronic newspaper in which each news article is "sold" independently, allowing the user to choose and pay only for the articles in which he is interested. In order to allow the commercialization of news distribution, news articles will not be simple data; rather, they will be encapsulated within agents. Every time the reader wishes to read an article, he will have to engage the corresponding agent, which will take the necessary actions for honouring the commercial requirements. For example, after checking the reader's authorization or triggering the transfer of some electronic currency from the reader's account to the news publisher, the agent will decode the encrypted data (the news article) and present it to the user.
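The pay-then-decode behaviour of an article agent can be sketched with the standard Java cryptography API. Everything below is an assumption made for illustration: the classes ArticleAgent and ReaderAccount are hypothetical, AES-GCM is used simply because it is readily available in current Java (the source does not name a cipher), and the decryption key travels inside the agent, whereas a real system would presumably release it only after payment has been confirmed.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical article agent: holds an encrypted article and its price.
final class ArticleAgent {
    private final long priceCents;
    private final byte[] nonce;
    private final byte[] encryptedArticle;
    private final SecretKey key; // simplification: the key travels with the agent

    private ArticleAgent(long priceCents, byte[] nonce, byte[] encryptedArticle, SecretKey key) {
        this.priceCents = priceCents;
        this.nonce = nonce;
        this.encryptedArticle = encryptedArticle;
        this.key = key;
    }

    // Publisher side: encrypt the article and wrap it in an agent.
    static ArticleAgent seal(String article, long priceCents) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey key = generator.generateKey();
            byte[] nonce = new byte[12];
            new SecureRandom().nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            byte[] encrypted = cipher.doFinal(article.getBytes(StandardCharsets.UTF_8));
            return new ArticleAgent(priceCents, nonce, encrypted, key);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption unavailable", e);
        }
    }

    // Reader side: the text is decoded only after the payment succeeds.
    // Each call charges the account again.
    Optional<String> read(ReaderAccount account) {
        if (!account.pay(priceCents)) {
            return Optional.empty();
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            return Optional.of(new String(cipher.doFinal(encryptedArticle), StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        ArticleAgent agent = ArticleAgent.seal("Today's headline ...", 50);
        ReaderAccount account = new ReaderAccount(80);
        System.out.println(agent.read(account).orElse("(payment refused)"));
        System.out.println(agent.read(account).orElse("(payment refused)"));
    }
}

// Hypothetical stand-in for an electronic payment system.
final class ReaderAccount {
    private long balanceCents;

    ReaderAccount(long balanceCents) { this.balanceCents = balanceCents; }

    boolean pay(long priceCents) {
        if (priceCents < 0 || priceCents > balanceCents) {
            return false;
        }
        balanceCents -= priceCents;
        return true;
    }

    long balanceCents() { return balanceCents; }
}
```

In the demo, the first read succeeds and the second is refused because the remaining balance no longer covers the price.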

The Hypermedia Newspaper has three major components (Figure 2):

1. the information consumer side (Hypermedia Electronic Newspaper - HEN)

2. the information provider side (Electronic News Server - ENS)

3. the network linking the two sides

Figure 1 Agent System Architecture. (Diagram labels: agent1 ... agenti; Agent Execution Platform; Local Services; Transport Service; Host (Hardware / OS).)


On the client side, the HEN presents a personalized virtual newspaper with the relevant information topics retrieved according to the reader's information profile, in a layout specified by the reader's presentation profile. Together, these two profiles represent a given context for the reader. Each reader can have more than one context, according to his interests. For example, while he is reading the newspaper at the office, the context will be specified according to professional interests. At home, however, the context will be defined to reflect private interests (e.g. sports, cinema etc.), which can be quite different from professional interests. On the server side, the different information providers are independent of each other. The hyperlinks represent the accurate and up-to-date information generated by the client request. The hyperlinks to the information archives represent the historical evolution of each piece of information, if any is available.

The HEN will allow the reader to retrieve, read and pay for only those news articles that interest him from the different ENSs available on the network. Related articles from different ENSs can be connected with hyperlinks, allowing the reader to easily retrieve and read all the information about a specific subject or event. In addition, hyperlinks to older articles on the same subject can be made available so that the evolution of the event can be easily traced. The ENS will classify the news articles according to subject, allowing the HEN to retrieve the news articles, set up the hyperlinks according to the reader's interests and present the personalized version of the newspaper to the reader.
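How a reader context might drive the composition of the personalized newspaper can be sketched as follows. The data model is an assumption for illustration only: the classes VirtualNewspaper, NewsItem and ReaderContext are hypothetical, the information profile is reduced to a set of topics, and the presentation profile is reduced to an ordering of sections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical composition step performed on the reader's side.
final class VirtualNewspaper {
    // Keep only items matching the information profile, laid out in the
    // section order given by the presentation profile.
    static List<String> compose(ReaderContext context, List<NewsItem> available) {
        List<String> page = new ArrayList<>();
        for (String topic : context.sectionOrder) {
            if (!context.topics.contains(topic)) {
                continue;
            }
            for (NewsItem item : available) {
                if (item.topic.equals(topic)) {
                    page.add(item.title);
                }
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<NewsItem> available = Arrays.asList(
            new NewsItem("sports", "Ski season opens"),
            new NewsItem("finance", "Markets close higher"),
            new NewsItem("technology", "New browser announced"));

        ReaderContext office = new ReaderContext(
            "office",
            new HashSet<>(Arrays.asList("finance", "technology")),
            Arrays.asList("technology", "finance", "sports"));
        ReaderContext home = new ReaderContext(
            "home",
            new HashSet<>(Arrays.asList("sports", "cinema")),
            Arrays.asList("cinema", "sports"));

        System.out.println(compose(office, available)); // [New browser announced, Markets close higher]
        System.out.println(compose(home, available));   // [Ski season opens]
    }
}

// Hypothetical news item as classified by a news server.
final class NewsItem {
    final String topic;
    final String title;

    NewsItem(String topic, String title) {
        this.topic = topic;
        this.title = title;
    }
}

// Hypothetical reader context: information profile plus presentation profile.
final class ReaderContext {
    final String name;
    final Set<String> topics;        // information profile
    final List<String> sectionOrder; // presentation profile

    ReaderContext(String name, Set<String> topics, List<String> sectionOrder) {
        this.name = name;
        this.topics = topics;
        this.sectionOrder = sectionOrder;
    }
}
```

The same pool of articles yields a different page for the office and home contexts, which is the behaviour the two profiles are meant to provide.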

The HyperNews system will, on the one hand, allow the reader to reduce the time spent searching for interesting information and, on the other hand, enable the information provider to commercialize electronic information on terms similar to those of a printed newspaper. The HyperNews newspaper will provide text and photographs, as printed editions do, and can be extended with other types of multimedia information such as video and audio clips, three-dimensional graphics etc.

Figure 2 Global view of the Hypermedia Newspaper architecture. (Diagram labels: Hypermedia Electronic Newspaper, with the Virtual Newspaper and a Context made up of an Information Profile and a Presentation Profile; Network; Electronic News Servers, with Information Archives.)

One of the most important aspects of a printed newspaper is advertising. Advertising provides important revenue to the publishers of a newspaper and helps keep its price low. In the HyperNews newspaper, advertising can easily be incorporated into the news presentation, for example by displaying an advertisement or by multiplexing publicity spots within the articles. One can even anticipate active advertising, where the user can interact with the advertisement agent and get more information about the product or even order it. Furthermore, advertisements can be associated with articles, so that anyone interested in a specific article will get one or more advertisements related to it. However, the most important aspect of advertising in the HyperNews newspaper is that it can be targeted on the basis of the user's information profile(s). That is, a user will receive advertising that relates directly to his interests; someone interested in sports, and specifically in skiing, will for example receive advertisements for ski resorts, ski equipment etc. This way advertising will have a greater impact, since it eliminates the mass of "junk" advertising in which, in many cases, the really interesting items are lost.

2.3 KryPict: A software environment for copyrighting, authenticating, archiving and retrieving pictorial documents in multimedia databases

The main goal of the KryPict project is to develop copyright enforcement and document authentication methods that allow images to be digitally watermarked. As a result, information providers will be able to make their images accessible over the network without having to fear that these images will be stolen and their copyrights infringed.

In a manner similar to what paper mills have been doing for centuries, the idea is to encode a digital signature (digital watermark) within one's own digitized pictorial document; the primary function of this watermark is to identify the owner of the picture unequivocally. The difficulty here is to find a watermarking procedure that is resistant to a wide variety of treatments. An added requirement could be that the watermark keeps track of some of the processing performed on the document. The two most typical applications of such digital watermarking could be establishing the original source of a picture and certifying that a given picture has not been tampered with or doctored.

The KryPict environment develops mechanisms for the following functionalities:

1. Enciphering ownership information into pictorial documents. From a user viewpoint, the signature should be either an alphanumerical codeword or a graphic (drawing, signature, trademark logo, etc.);

2. Providing means of knowing whether an image has been modified;

3. Same as 1, with the additional constraint that the encrypted information remains decipherable even if the document is modified. Typically, the image should retain its secret code/watermark even when copied by means of a screen capture, when compressed with a lossy algorithm, or if the watermarking algorithm is applied another time by another user on top of the original watermark.
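To make the first functionality concrete, the following Java sketch hides an alphanumerical codeword in the least significant bit of the blue channel of successive pixels. This is a deliberately naive, textbook technique chosen for illustration; it is not the KryPict method, the class name LsbWatermark is hypothetical, and a mark of this kind would not survive the lossy compression or screen capture mentioned in item 3. Pixels are assumed to be packed ARGB integers, one bit of the codeword per pixel.

```java
import java.nio.charset.StandardCharsets;

// Naive least-significant-bit watermark (illustration only, not robust).
final class LsbWatermark {
    // Returns a copy of the pixels with the codeword written into the
    // lowest bit of the blue channel, one bit per pixel.
    static int[] embed(int[] argbPixels, String codeword) {
        byte[] data = codeword.getBytes(StandardCharsets.US_ASCII);
        if (data.length * 8 > argbPixels.length) {
            throw new IllegalArgumentException("image too small for codeword");
        }
        int[] marked = argbPixels.clone();
        for (int i = 0; i < data.length * 8; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            marked[i] = (marked[i] & ~1) | bit;
        }
        return marked;
    }

    // Reads back a codeword of the given length (in characters).
    static String extract(int[] argbPixels, int length) {
        if (length * 8 > argbPixels.length) {
            throw new IllegalArgumentException("image too small for codeword");
        }
        byte[] data = new byte[length];
        for (int i = 0; i < length * 8; i++) {
            data[i / 8] |= (argbPixels[i] & 1) << (7 - i % 8);
        }
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int[] image = new int[64];
        java.util.Arrays.fill(image, 0xFF808080); // uniform grey test image
        int[] marked = embed(image, "CUI");
        System.out.println(extract(marked, 3)); // CUI
    }
}
```

Each marked pixel differs from the original by at most one step in the blue channel, which is invisible to the eye; the fragility of the mark is exactly the difficulty the text points to when it asks for resistance to a wide variety of treatments.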


3. Overall architecture of the MEDIA system

The MEDIA system is a flexible client-server architecture; clients are assumed to be personal workstations running MEDIA client software to view and query for documents and images. Servers are more powerful computers which host databases of documents and images, running the MEDIA server software which allows them to process queries and package information as encrypted agents. The overall architecture of the system is shown in Figure 3.

Client-side:

User interface: The collective user interface of the MEDIA architecture comprises a collection of specialized interfaces integrated with an extended HotJava browser. The user interface provides functionalities for accessing the information, displaying the results of the search, and viewing the progress and status of query agents executing on remote machines.

• Image and watermark interface: allows the user to specify iconic criteria corresponding to the desired documents, to visualize and browse through the results of the search, and to visualize the digital watermarks embedded in the images.

• Text Viewer: the Hypermedia Electronic Newspaper (HEN) reader allows the retrieval of relevant articles selected directly from different servers and automatically linked into dynamic hypertext documents.

• Other Viewers: other available viewers (for example, video, image and sound viewers) may be integrated in the MEDIA user interface.

Agent Execution Platform: The agent execution platform supports the secure execution and transport of agents. The client receives agents encapsulating digital documents, that is, text and images, complete with ownership information (digital watermarks), access control and payment policies, as well as behaviour (e.g. specialized decompression algorithms). The client can create and send agents encapsulating information queries.

• Electronic payment: Commercial electronic payment technology will be integrated in the MEDIA system to establish a payment system for digital documents.

• Decryption: Existing decryption algorithms are used to decrypt agents and their contents. When agents contain images, these may have embedded digital watermarks.

Server-side:

Agent Execution Platform: The same platform runs on the server as well as on the client, except that it is used to send agents encapsulating documents and to execute incoming query agents.

Request processing: This module acts as the interface between query agents and the information retrieval module.

Agent encapsulation: Documents and images are encapsulated as agents in a format agreed upon by all the MEDIA partners. This module contains code to perform this encapsulation.

Figure 3 MEDIA architecture. (Diagram labels. Client: User Interface (WWW Browser) with Image and Watermark, Hypertext and Other Media viewers; Agent Execution Platform (Java based); Decryption; Electronic Payment; Electronic Bank. Network. Server: Agent Execution Platform (Java based); Agent encapsulation; Request processing; Encryption; Information retrieval module with Index, Image, Hypertext and Article retrieval; Storage subsystems (with hyperlinks) for Images, Hypertext and Indexing; Storage population for the image and hypertext storage subsystems; Watermarks (Copyright); HyperLinks to Digital Libraries.)

Encryption: The encryption module uses proven algorithms to encrypt agents and their contents before transport. In addition, when agents contain images, these may possibly have embedded digital watermarks.

Information retrieval module: The document retrieval requests will need to be analysed according to:

• Index retrieval: Sets of documents referring to the same subject, or belonging to the same domain, can be retrieved via the database indexes.

• Image retrieval: Methods allowing access to images by means of descriptive iconic characteristics or using simple string searches operating on the image captions.

• Hypertext retrieval: Allows documents to be retrieved by combining textual criteria.

Storage subsystem: It contains the databases for the images and the hypertext documents, as well as indexing structures allowing efficient access to the data. The documents are either images (binary data), image captions (short textual descriptions), or texts. Furthermore an index for the classification of the documents (text and images) according to subjects, domains etc. is included.

• Classified index: The index has pointers to documents and sets of documents in the image and hypertext storage subsystems. The choice of the indexed domains and the method for the document selection depend on the specific applications and user requirements.

• Images: Images are stored as binary objects, accompanied by the associated indexing structure reflecting iconic characteristics. If present, the image captions will also be stored in this sub-module.

• HyperText: The hypertext database contains the textual information. The format of the information will be defined according to the needs of the applications. The hyperlinks can point both to other documents in the hypertext and image storage subsystems and to external information sources (like digital libraries).
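The classified index described in the first item above can be pictured as a mapping from subjects to document identifiers in either storage subsystem. The Java sketch below is only one possible reading of that description: the class name ClassifiedIndex, its methods and the identifier scheme ("img:..." and "txt:...") are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical classified index: subject -> identifiers of stored documents.
final class ClassifiedIndex {
    private final Map<String, Set<String>> bySubject = new TreeMap<>();

    // File a document (image or hypertext) under a subject.
    void classify(String documentId, String subject) {
        bySubject.computeIfAbsent(subject, s -> new TreeSet<>()).add(documentId);
    }

    // All documents filed under a subject, in sorted order; empty if none.
    Set<String> lookup(String subject) {
        Set<String> ids = bySubject.getOrDefault(subject, Collections.emptySet());
        return Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        ClassifiedIndex index = new ClassifiedIndex();
        index.classify("txt:17", "skiing");
        index.classify("img:4", "skiing");
        index.classify("txt:23", "finance");
        System.out.println(index.lookup("skiing"));  // [img:4, txt:17]
        System.out.println(index.lookup("cinema"));  // []
    }
}
```

A single subject can point into both storage subsystems at once, which is what lets an index retrieval return the text of an article together with its photographs.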

Storage Population: A set of tools that allow the population of the server's storage subsystem.

• Image storage subsystem population: This module allows the provided images to be input into the system.

• Text storage subsystem population: Tools for populating the hypertext database with news articles and establishing the basic hyperlinks between them.

Watermarks: This module allows the copyrighting of digital images by means of digital watermarks.

4. Conclusions

The major problem in the commercialization of electronic documents is the enforcement of copyright and the collection of the corresponding fees. Traditionally this is done at the point of distribution of the document. That is, the user provides a password or makes an electronic payment by credit card or other electronic means, and receives a copy of the document. The author relies on the goodwill of the user, and on the penalties of the copyright laws, to ensure that the electronic document is not copied countless times and distributed all over the world. Nevertheless, the illegal copies in circulation sometimes outnumber the legal ones. It is ultimately for this reason that commercial electronic documents do not appear very often on the network.

The MEDIA system takes a different approach and enforces the copyright and the collection of fees not at the point of distribution of the electronic document but at the point of consumption. That is, the reader receives a document, and the copyright mechanisms are activated only at the moment he attempts to read it. If the user does not have, or cannot obtain via some kind of payment mechanism, the required authorization, he cannot view the document contents, which are encrypted. The reader can nevertheless interact with the document and obtain some information about it before deciding to pay the price, such as the title of the document, its author and perhaps a short abstract of its contents.

The MEDIA system is based on an agent platform, namely Java, which is extended and enhanced with agent programming features and libraries. The functionality and features of the system are demonstrated via an electronic newspaper application and a digital image watermarking subsystem.

The work on the MEDIA system is pursued in the context of three sub-projects: ASAP, addressing the basic issues of an appropriate agent platform; HyperNews, focusing on the development of the target electronic newspaper application; and KryPict, addressing issues related to electronic watermarking and management of visual data. The partners in the project are the University of Geneva (Object Systems Group and Multimedia Communications Group (D. Tsichritzis, D. Konstantas); Computer Vision and Theoretical Computer Science groups (T. Pun, J. Rolim)); r3 Security Engineering A.G., Aathal (A. Herrigel); the Basel Swiss Paper Museum (P. Tschudin); the Scientific Photography Laboratory, University of Basel (R. Gschwind); and the Suisse Romande magazine L'Hebdo, Lausanne (B. Giussani). Once the system is completed it will be used by the commercial partners for the commercial distribution and dissemination of their documents in electronic form. We expect to have a first working prototype by September 1996.

It should be noted that the area of electronic publishing is a relatively new one, and not all aspects and dimensions of commercial electronic publishing and electronic documents are yet defined and understood, if they are at all. It is for this reason that in the MEDIA project we aimed at bringing together the expertise of diverse research and commercial partners. This way we will be able to anticipate both the market trends and the technical and scientific advancements, and achieve technically sound and commercially viable results. Nevertheless, commercial electronic publishing is a subject that needs to be pursued on a long-term basis, since many crucial aspects are not yet known, including what the reaction of the public, the authors and the publishers will be, how the legal issues will be handled and, of course, what technology will be available three years from now.


References

[1] Tages Anzeiger, http://www.tages-anzeiger.ch/

[2] The Electronic Telegraph, http://www.telegraph.co.uk/

[3] The New-York Times, http://nytimesfax.com/

[4] Time magazine, http://www.pathfinder.com/

[5] L'Hebdo, Electronic journal, http://www.hebdo.ch/

[6] Pamela Samuelson, "Legally Speaking: Copyright and Digital Libraries", Communications of the ACM, Vol. 38, No. 4, April 1995.

[7] J. Ebersole, "Protecting intellectual property rights on the information superhighways", International Publishers Association Bulletin, Vol. X, No. 3, 1994, pp. 3-43.

[8] John S. Erickson, "A Copyright Management System for Networked Interactive Multimedia", Proceedings of the DAGS'95 Conference on Electronic Publishing and the Information Super Highway, May 30-June 2, 1995, Boston.

[9] R. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", Communications of the ACM, Vol. 21, No. 2, February 1978, pp. 120-126.

[10] Kipp E.B. Hickman, "The SSL Protocol", http://home.netscape.com/newsref/std/SSL.html, and The SSL 3.0 Specification, http://home.netscape.com/eng/ssl3/index.html

[11] T. Berners-Lee, R. Cailliau, A. Luotonen, H. Frystyk Nielsen and A. Secret, "The World Wide Web", Communications of the ACM, Vol. 37, No. 8, August 1994, pp. 76-82.

[12] Sun Microsystems, Inc., "The Java Language Specification", Version 1.0 Beta.

[13] Sun Microsystems, Inc., "The HotJava Browser: A White Paper", Version 1.0 Alpha 3.

[14] Netscape Communication Corp, "The JavaScript Language", http://home.netscape.com/eng/mozilla/2.0/handbook/javascript/index.html

[15] Jean-Henry Morin and Dimitri Konstantas, "Towards Hypermedia Electronic Publishing", Proceedings of the second IASTED/ISMM International Conference on Distributed Multimedia Systems and Applications, Stanford, California, August 7-9, 1995.

[16] Fritz Hohl, "Konzeption eines einfachen Agentsystems und Implementation eines Prototyps", Diplomarbeit Nr. 1267, University of Stuttgart, 1995.

[17] Anselm Lingnau, Oswald Drobnik, "An Infrastructure for Mobile Agents: Requirements and Architecture", Fachbereich Informatik (Telematik), Johann Wolfgang Goethe-Universität, Frankfurt am Main, Germany.

[18] Christian F. Tschudin, "OO-Agents and Messengers", Workshop on Object and Agents, ECOOP95, Aarhus, Denmark, 1995.

[19] N. Borenstein, N.T. Rose, "MIME Extensions for Mail-Enabled Applications: Safe-TCL", Draft, Bellcore, Dover Beach Consulting, September 1993.

[20] John K. Ousterhout, "Scripts and Agents: The New Software High Ground", Invited Talk at the 1995 Winter USENIX Conference, New Orleans, LA, January 1995.

[21] J.E. White, "Telescript technology: the foundation for the electronic marketplace", White Paper, General Magic Inc., 1994.

[22] Colin G. Harrison, David M. Chess, Aaron Kershenbaum, "Mobile Agents: Are they a good idea?", IBM Research Report, T.J. Watson Research Center, Yorktown Heights, NY 10598, March 1995.

[23] Krishna A. Bharat, Luca Cardelli, "Migratory Applications", PLDI Conference, 1995.

[24] Pattie Maes, "Agents that Reduce Work and Information Overload", Communications of the ACM, July 1994.

[25] Juerg Gutknecht, "The Smart Document or How to Integrate Global Services", in F. Huber-Waeschle, H. Schauer and P. Widmayer (Eds.), GISI'95 Conference proceedings, pp. 758-762, Springer, 1995.

[26] The TACOMA (Tromso And COrnell Moving Agents) project, http://www.cs.uit.no/DOS/Tacoma/
