\documentclass{mlcennote}
\usepackage{longtable}
\title{CEN MetaLex Workshop Proposal}
\author{Alexander Boer}
\author{Erik Hupkes}
\institute{University of Amsterdam\\The Netherlands}
\author{Fabio Vitali}
\author{Monica Palmirani}
\institute{University of Bologna\\Italy}
\author{Bal{\'a}zs R{\'a}tai}
\institute{Carneades\\Hungary}
\runningauthor{Alexander Boer}
\correspondingauthor{Alexander Boer}
\email{aboer@leibnizcenter.org}
\Leibnizreportdate{Aug 2008}
\newcommand{\axiom}[3]{$#1$ & $#2$ & $#3$ \\}
\newcommand{\fact}[1]{\multicolumn{3}{l}{$#1$} \\}
\newcommand{\concept}[1]{\textsf{\textsc{#1}}}
\newcommand{\property}[1]{\textsf{#1}}
\newcommand{\individual}[1]{\textit{#1}}
\newcommand{\variable}[1]{\textit{#1}}
\newtheorem{remark}{Remark}
\abstract{
The partial agreement of December 2006 was adopted by the workshop in the understanding that it will be augmented with additional agreements on ontological formalization, citation and reference, time and versioning, and components and component inclusion. This document proposes additional agreements (taking into account proposals from \cite{estrella3.2:07}), and clarifies terminology.
\textbf{Status:} This document is a \textbf{proposal} for the September 2008 meeting. It is the \emph{no remarks} version.
}
\begin{document}
\maketitle
\tableofcontents
\section{Introduction}
The partial agreement of December 2006 was adopted by the workshop in the understanding that it will be augmented with additional agreements on ontological formalization, citation and reference, time and versioning, and components and component inclusion.
The CEN MetaLex workshop requested more rigorous formalization of the four ontological levels (work, expression, manifestation, item) at which a bibliographic entity exists, and what properties belong to which level. The technical committee committed itself to providing a definitive list of properties of expressions that distinguishes version, variant, consolidation, original, translation, etc.
This document proposes additional agreements (taking into account proposals from \cite{estrella3.2:07,estrella3.3:08}), and clarifies terminology. Besides submitting this consolidated agreement, the technical committee has described most additions and clarifications separately in:
\small\begin{verbatim}
http://svn.metalex.eu/svn/MetaLexWS/documentation/2007proposal/CWA2007.pdf
\end{verbatim}
\normalsize Intended appendices to this document are:
\begin{enumerate}
\item An XML Schema;
\item A DTD++ schema; and
\item An OWL schema.
\end{enumerate}
Check for availability of the appendices. The XML Schema and DTD++ schema are equivalent. Where they lead to ambiguity, it may be assumed that the DTD++ version best reflects the intentions of the technical committee of the workshop. The latest version of this document can be downloaded from:
\small\begin{verbatim}
http://svn.metalex.eu/svn/MetaLexWS/documentation/2008proposal/
\end{verbatim}
\normalsize Please regularly check for updates.
The latest version of schema files, existing translators and schema extensions, and examples are always found at:
\small\begin{verbatim}
http://svn.metalex.eu/svn/MetaLexWS/branches/latest/
\end{verbatim}
\subsection{Namespace and Namespace Qualification}
CEN MetaLex elements and attributes \textbf{must} be namespace qualified, even though they may be associated with a default namespace without prefix. The namespace of the current version of CEN MetaLex (1.0), regardless of the date of adoption, is:
\small{\begin{verbatim}
http://www.metalex.eu/metalex/1.0
\end{verbatim}}
XML elements, XML attributes, XML Schema named complex types, etc., as well as OWL classes and OWL properties are declared in this namespace. Since ambiguity could arise in CEN MetaLex specification documents as to whether a name in this namespace refers to an XML Schema or OWL entity, we use the \emph{NCName} \texttt{metalex} for XML, and \texttt{metalex-owl} for OWL\footnote{XML elements and types, and OWL classes and properties are already distinguished by typographic conventions in CEN MetaLex. Note that OWL names only occur in MetaLex XML documents as the \emph{value} of XML attributes.}, i.e. this document assumes the existence of two namespace declarations providing two different names for the \emph{same namespace}:
\small{\begin{verbatim}
xmlns:metalex="http://www.metalex.eu/metalex/1.0"
xmlns:metalex-owl="http://www.metalex.eu/metalex/1.0"
\end{verbatim}}
The OWL schema is published in conformance to the \emph{Best Practice Recipes for Publishing RDF Vocabularies}\footnote{http://www.w3.org/TR/swbp-vocab-pub/} guidelines as a hash namespace, recipe 3. This means that it can be imported with an \texttt{HTTP GET} request on the namespace under certain conditions.
\section{Design Requirements and Principles}
MetaLex XML positions itself as an interchange format, a lowest common denominator
for other standards, intended not to replace jurisdiction-specific standards and
vendor-specific formats in the publication process but to impose a standardized
view on legal documents for the purposes of software development at the consumer side.
The intention is to make it possible for existing XML documents, conforming to other XML
schemas, to conform to the MetaLex basic content models by defining the used elements as
\emph{implementations} of a MetaLex content model in a schema that extends the MetaLex
schema (see sections \ref{cm} and \ref{attributes}). The existing document does not have to be modified to achieve this.
This schema extension specifies the names of elements used and allows for additional attributes.
It is also used to further constrain the allowed content models if the schema extension is
intended to be \emph{normative}\footnote{For instance, if the schema is used in an editor to
validate the structure of the document \emph{before} it is published. The MetaLex schema is, by
contrast, intended to achieve a high degree of coverage of existing documents with all their
imperfections.}.
\emph{Metadata} statements can be declared inside the XML document as RDF/A, or outside it as
RDF if it is not desirable to modify the XML document (see section \ref{meta}).
This however entails that every XML element in a MetaLex conformant document should declare
an \textbf{id} attribute to identify itself, so that it is possible to make metadata statements
about the \emph{expression} fragment embodied by the element.
Of central importance is the standardization of identification of documents (cf. section \ref{naming}).
The MetaLex standard distinguishes identity of documents on the item, manifestation, expression,
and work level (section \ref{biblio}), in accordance with \cite{Saur1998}. Every MetaLex document should declare at least
the URI of the document \emph{manifestation} it exemplifies using a metadata statement (i.e. a
triple with the format \texttt{\{this, exemplifies, URIRef\}}).
Self-identifying metadata of the manifestation, expression, and work is supplied either by conformance to a
\emph{naming convention} (cf. section \ref{scheme}), or by an explicit set of identifying metadata (section \ref{idmetadata}), stored inside or outside of
the document.
The target of citations (cf. section \ref{citation}) is also identified using either the naming convention or a set of identifying
metadata. The target of citations is supplied either by using the \textbf{citations}
attribute group, or by explicit metadata.
The MetaLex schema also supports compounding documents by reference (cf. section \ref{inclusion}). The value of the \textbf{src}
attribute is interpreted as a reference that replaces a part of the MetaLex manifestation stored
in an external object identified by a URI. The target of inclusion by reference, a manifestation
component, is also identified using either the naming convention or a set of identifying metadata.
\subsection{Scope of the Agreement}
The \emph{CEN Workshop on an Open XML Interchange Format for Legal and Legislative Resources (MetaLex)} declares, by way of its title, an interest in legal and legislative resources. This workshop agreement limits the applicability of the proposed XML standard to sources of law and references to sources of law.
As understood by the workshop, the source of law is a writing that can be, is, was, or presumably will be used to back an argument concerning the existence of a constitutive or institutional rule in a certain legal system, or, alternatively, a writing used by a competent legislator to communicate the existence of a constitutive or institutional rule to a certain group of addressees. Because the CEN Workshop is concerned only with an XML standard, it chooses not to appeal to other common ingredients of definitions of law that have no relevant counterpart in the information dimension.
Source of law is a familiar concept in law schools, and may be used to refer to legislators (fonti delle leggi, sources des lois), to legislation and case law (fonti del diritto, sources du droit), to custom, etc. It should be noted that many Romance languages make a distinction between the legislator as source of law, by way of speaking or writing, and the law as source of right(s), which is presumably what the existence of the law brings about.
In its broadest sense, the source of law is anything that can be conceived of as the originator of legal rules. In the context of MetaLex it strictly refers to communication in writing, and in a sense covers the \emph{fonti del diritto} in Italian and \emph{sources du droit} in French. There are two main categories of source of law in writing: legislation and case law.
The notion of a legislative resource includes legislation, and all writings produced by the legislator explaining and justifying legislation. The legislator is a legal person: it exists separately from any natural persons and organizations involved in the process of drafting and evaluating legislation. It is the formally correct completion of certain processes, usually dictated by law, that makes the legislator the formal author of a writing, and at the same time identifies the addressees to whom it applies. Obviously, the persons and organizations involved in the process of legislating may produce writings that are clearly precursors or legally required ingredients to the end product. These writings are also included in the notion of a legislative resource, but in this case it is not easy to give straightforward rules for deciding whether or not they are to be considered legislative resources. Different jurisdictions will have different theories on this subject.
\subsection{Separating Content and Metadata}
A guiding principle of the workshop is that identifiable structure of the document content is described by an XML Schema\footnote{\texttt{http://www.w3.org/XML/Schema}} and DTD++ schema, while metadata conforms to the Resource Description Framework (RDF) \footnote{\texttt{http://www.w3.org/RDF/}} and the associated schema language OWL\footnote{\texttt{http://www.w3.org/2007/OWL/wiki/OWL\_Working\_Group}}. The schema defines a number of meaning-neutral \emph{content models}, i.e. algebraic expressions of the elements and attributes that may (or must) be found in the content of the element. Metadata is used to describe the meaning and context of use of the text.
XML Schema uses the same symbols as RDF: uniform resource identifiers (URIs). The fundamental difference is
that in XML Schema the URI is used to attach identifiers to XML data structures in order for
software to refer to those data structures, whereas in RDF it is used to \emph{describe} the identified entities in a standardized format. Note that one \emph{could} conceive of XML documents as descriptions of the document, but conventionally the XML data structure referred to by a document URI is a manifestation of the document, and not a description of it.
The interface between metadata and the standardized XML manifestation consists of these shared URIs. The XML manifestation is what the URI refers to, and RDF descriptions describe the thing referred to by the URI.
\subsection{Bibliographic Entities}
\label{biblio}
The use of bibliographic terminology in the CEN MetaLex standard is inspired by the \emph{IFLA Functional Requirements for Bibliographic Records} (cf. \cite{Saur1998}):
\begin{itemize}
\item A \textbf{bibliographic object} is a bounded representation of a body of information,
designed with the intent to communicate, preserved in a form independent of a sender or receiver. A bibliographic work, expression, manifestation, and item are bibliographic objects.
\item A \textbf{bibliographic citation} is a representation of a bibliographic identifier of a bibliographic object, with the intent of referring to that bibliographic object. \emph{Article 1}, \emph{the first article} and \emph{the previous article} are examples of citation, and \emph{the Minister}, the \emph{President of the Republic}, \emph{the accused}, and \emph{We, Beatrix} are examples of references to other, interesting but non-bibliographic, things.
\item A \textbf{unique bibliographic identifier} identifies a bibliographic object uniquely. The \emph{uniform resource identifier} is used as a unique bibliographic identifier in MetaLex.
\item A \textbf{bibliographic work} is a bibliographic object, realized by one or more expressions, and created by one or more persons in a single creative process ending in a publication event. A work has an author or authors, and is the result of a publication event. We recognize the work through individual expressions of the work, but the work itself exists only in the commonality of \emph{content} between and among the various expressions of the work: it is an intentional object\footnote{I.e. it exists only as the object of one's thoughts and communication acts, and not as a physical object.}.
\item A \textbf{bibliographic expression} is a realization of one bibliographic work in the form of signs, words, sentences, paragraphs, etc. by the author of that work. Physical form aspects, such as typeface or page layout, are generally speaking excluded from the expression level. Any change in \emph{content} gives rise to a new expression. If an expression is revised or modified, the resulting expression is considered to be a new expression, no matter how minor the modification may be. An expression is an intentional object.
\item A \textbf{bibliographic manifestation} embodies one expression of one bibliographic work. The boundaries between one manifestation and another are drawn on the basis of both content and physical form. When the production process involves changes in physical form the resulting product is considered a new manifestation. Thus, a specific XML representation, a PDF file (as generated by printing into PDF a specific Word file with a specific PDF distiller), and a printed booklet all represent different manifestations of the same expression of a work. A manifestation is an intentional object. A MetaLex XML document is a bibliographic manifestation.
\item A \textbf{bibliographic item} exemplifies one manifestation of one expression of one work: a specific copy of a book on a specific shelf in a library, a file stored on a computer in a specific location, etc. Items stored on a computer can be easily copied to another location, resulting in another item, but the same manifestation. This makes adding metadata about the item to the item in principle impossible. On the Internet generally speaking only the \emph{uniform resource locator} (URL) is an item-specific datum. An item is a physical object.
\end{itemize}
A MetaLex XML document is a standard manifestation of a bibliographic expression of a source of law. Editing the MetaLex XML markup and metadata of the XML document changes the manifestation of an expression. Changing the marked up text changes the expression embodied by the manifestation. Copying an example of the MetaLex XML document creates a new item.
Embedded metadata explicitly indicates its subject: if it is the document self-identifier (the empty string) \texttt{""} or a fragment identifier \texttt{"\#f"} composed of the document self-identifier and the value \texttt{f} of the \texttt{id} attribute of the intended target element, then the metadatum refers to the item (cf. section \ref{attributes}). The only item-level metadata property is however the \texttt{metalex-owl:exemplifies} property, which takes the manifestation-level identifier as value (cf. section \ref{naming}). The required type of subject of a metadata property is explicitly constrained in the OWL schema by a domain restriction.
XML attribute values by default pertain to the expression embodied by the manifestation, unless explicitly specified otherwise in this document.
Each bibliographic item exemplifies exactly one manifestation that embodies exactly one expression that realizes exactly one work. Because all these mappings are \emph{functional}, i.e. each maps unambiguously to one entity, item identity can be, and often is, used as an indirect identifier of the other objects, similar to how, for instance, email addresses usually have a functional mapping to persons and can be used as an indirect identifier of persons. One can for instance refer to a work by referring to its initial expression in a context where a reference to a work is expected.
The inverse of these relations is however often not a function. One can think of the work as an abstraction of one or many expressions, the expression as an abstraction of one or many manifestations, the manifestation as an abstraction of one or many items. The manifestation, expression, and work are intentional objects whose existence is conditional on the existence of at least one item, manifestation, expression, respectively. There is normally speaking no such thing as an expression that is not embodied, a work that is not realized, etc.
Besides the hierarchical constitutive relationships between the four levels, there are also horizontal relations between the objects within a level. The expressions of a work in the legal field are usually either the initially published expression, or expressions derived by content modification activity or translation activity. Manifestations of an expression are either the initially created one(s), or manifestations derived by editing activity. Items of a manifestation are either the initially created one(s), or copies of them.
The MetaLex OWL schema includes a number of event type definitions (cf. \cite{BoerEtAl2004}, and generally \cite{LagozeEtAl2000} on linking metadata to events) to make these horizontal relations explicit for the expression level. At the item level they cannot be embedded in a MetaLex item for obvious reasons.
Appropriate manifestation level version management methods and tools already exist (CVS, SVN, etc.). These are of no concern to the CEN MetaLex working group.
Most legislative events happen at the work and expression levels. Content-related events like markup, metadating, and digital signature happen at the manifestation level.
The CEN MetaLex standard aims to provide metadata for describing both the hierarchical and relational way of positioning and identifying bibliographic objects, at least at the work and expression level, at the relevant levels of granularity.
Some additional terminology is introduced to capture some bibliographic phenomena relevant to law, as well as operational criteria for recognizing these in a MetaLex XML document. These classifications should be considered metadata, and their specifications are found in the MetaLex OWL schema:
\begin{itemize}
\item A \textbf{bibliographic source of law} is a bibliographic object that can be, is, was, or presumably will be referred to, by way of bibliographic citation, to back an argument claiming the existence of a legal rule in a certain legal system, or, alternatively, a bibliographic object published or realized by a competent legislator to communicate a legal rule to a certain group of addressees. Both the legislator and the user of the bibliographic source of law understand it as a medium used for communicating the existence of legal rules, including auxiliary declarations required for the proper understanding of legal rules, between legislator and user.
\item An \textbf{initial version} of a bibliographic work is the expression that realizes the work at the time of its official release in the public domain as a bibliographic object. It is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:InitialBibliographicCreation}.
\item A \textbf{version} of a bibliographic work is either the initial version of the work, or an expression realized by modification of a version. It is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:BibliographicModification} of the (\textbf{metalex-owl:matter}) previous version.
\item A \textbf{version in force} of a bibliographic work is a version that is, was, or will be \emph{in force} during a specific time interval. The in force time intervals of versions in force of the same work do not appear to overlap in time \emph{when viewed from any specific vantage point in time}. Note however the possibility of \emph{virtual} (section \ref{virtual}) and \emph{ex tunc} (sections \ref{idmetadata} and \ref{extunc}) expressions, which may create a substantial difference between vantage points in time.
\item A \textbf{version ex tunc} of a bibliographic work is an expression that reverberates back in time, up to a specific date, the effects in terms of content, force and efficacy of a constitutive event on a specific version of the work. The version ex tunc is thus, \emph{as it were}, generated by a fictional event in the past that allows rewriting institutional history, for instance through errata corrige, or at the occasion of an annulment of a modification made in the past by a constitutional court. The fictional event of such an expression is treated \emph{as if} it were a true event \emph{after} the constitutive event happened. Institutional history therefore appears different depending on which vantage point in time one takes. A version ex tunc is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:FictionalExpressionCreation}, which is a
temporal fiction of (\textbf{metalex-owl:\\temporalFictionOf}) some other event -- the constitutive event -- that happened \emph{after} the fictional event.
\item A \textbf{consolidation} of a bibliographic source of law is a version realized by applying legal rules found in another bibliographic source of law to the previous version. It is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:LegislativeModification} of the (\textbf{metalex-owl:matter}), which is the previous version, by the (\textbf{metalex-owl:instrument}), which is the other bibliographic source of law stipulating the modification. The modification often takes place when the instrument enters into force, although it may of course be conditional upon some other event.
\item A \textbf{variant} of a version in force of a bibliographic work is an expression that shares its \emph{in force} time interval. Although the concept variant is often nominalized, it is a symmetric relationship (\textbf{metalex-owl:variant}) between two expressions.
\item A \textbf{language variant} of an expression of a bibliographic work is an expression that shares its in force time interval, and differs in nothing but language. For instance, the English, Dutch, Italian, and German versions of a European directive are different language variants. It is also a symmetric relationship (derived from \textbf{metalex-owl:variant}). In a MetaLex manifestation of the expression this is expressed in different values of the \textbf{xml:lang} attribute.
\item A \textbf{translation} of an expression of a bibliographic work is an expression that shares its in force time interval, differs in nothing but language, and has been realized by way of translation of one expression into another expression. Translation is an asymmetric relation between bibliographic expressions, expressed by a \textbf{metalex-owl:Translation} event, which has a \textbf{metalex-owl:translator}, taking the initial expression as a \textbf{metalex-owl:matter}, and the translated document as \textbf{metalex-owl:result}. The translation should not be confused with language variant: while language variants can be realized concurrently by the legislator, and are equally authoritative if they are, the translation of an expression is generally speaking less authoritative than the expression it is a translation of, even if \emph{officially} translated.
\end{itemize}
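The event vocabulary above can be read as a small set of RDF statements. As a minimal sketch, written in the \texttt{\{subject, property, object\}} notation, a consolidation might be described as shown below. The identifiers \texttt{ev1}, \texttt{act-v1}, \texttt{act-v2}, and \texttt{amending-act} are hypothetical placeholders, and the direction of the properties (from the event to the bibliographic object) is an assumption based on the wording of the definitions, not a quotation from the OWL schema:
\small\begin{verbatim}
{ev1, rdf:type,               metalex-owl:LegislativeModification}
{ev1, metalex-owl:matter,     act-v1}
{ev1, metalex-owl:instrument, amending-act}
{ev1, metalex-owl:result,     act-v2}
\end{verbatim}
\normalsize In this reading \texttt{act-v2} is the consolidation, \texttt{act-v1} is the previous version, and \texttt{amending-act} is the source of law that stipulates the modification.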
\subsection{Content Models instead of Elements}
\label{cm}
A MetaLex XML element is characterized by a name, a content model, and zero or more attributes.
According to the philosophy of descriptive markup (cf. the Text Encoding Initiative\footnote{http://www.tei-c.org/P4X/SG.html}), the name of an XML element is usually semantically-charged (i.e. it provides a hint as to the meaning of the text fragment, or its role within the whole of the document). Additional information about the content of the element goes into attributes. The \emph{content model} (cf. \cite{VitaliEtAl2005}) is an algebraic expression of the elements that may (or must) be found in the content of the element. Generic elements, on the other hand, are named after the content model: they are merely a label identifying the kind of content model.
All XML vocabularies contain a mix of descriptive and generic elements, and, depending on the foreseen uses of the documents, emphasize one of the approaches. For instance, vocabularies with precise procedural semantics (e.g. XSLT, SVG) do not depend on generic elements, while vocabularies intended for diverse content (for instance XHTML) employ generic elements. Consider for instance that in XHTML 2.0 both \texttt{a} and \texttt{img} elements are being replaced or phased out in favour of generic substitutes using attributes.
The most important validation languages (e.g. XML Schema) do not allow validation rules to be associated with attribute values, so element names are currently the only way to associate validation rules with documents. This is a cause of pollution of principles, forcing semantically-charged elements to assume a rigid content model, while generic elements take care of odd situations that were not foreseen when the content models were designed.
Legislative drafting technique has a long tradition, and often its own standards of what legislative documents should look like. This makes descriptive markup combined with strict content models very tempting. On the other hand, there are so many exceptions that can be found in concrete examples that we sometimes just want to give up on precise description altogether and resort to generic elements, in particular because there should be not one \emph{iota} of difference between the original expression of the legislator and the XML manifestation of that expression.
The approach of the workshop is to provide for a complete and automatic interchangeability of approaches, from generic to descriptive and vice versa.
Sharing content models is achieved by using two special attributes, \texttt{name} and \texttt{type}, that provide information about the meaning and the content model of the element. If one of these values is missing, the name of the element supplies the missing value. The following elements are for instance equivalent from the point of view of the standard:
\footnotesize
\begin{verbatim}
\end{verbatim}
\normalsize
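As a hedged illustration of this rule, and not a reproduction of the workshop's own example, the two alternative elements below would resolve to the same name (\texttt{chapter}) and the same content model (\texttt{container}). The element names \texttt{container} and \texttt{chapter} and the \texttt{id} value are assumptions chosen for this sketch; \texttt{chapter} is assumed to be declared in a schema extension:
\footnotesize
\begin{verbatim}
<!-- generic form: the element name supplies the missing type -->
<container name="chapter" id="ch1"> ... </container>

<!-- descriptive form: the element name supplies the missing name -->
<chapter type="container" id="ch1"> ... </chapter>
\end{verbatim}
\normalsize
In the first form the \texttt{type} value is missing and is taken from the element name; in the second form the \texttt{name} value is missing and is taken from the element name.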
\section{Attributes and Content Models}
\label{attributes}
The \textbf{metalex:rootType} content model, and its associated types and elements, disappear from the schema: instead the \textbf{metalex:containerType} content model is used. The \textbf{metalex:containerType} content model permits the use of an \textbf{metalex:mcontainerType} element as initial element.
The optional metadata attribute \textbf{metalex:showAs} \textbf{must} contain a human readable label of the \emph{whole} metadata statement, i.e. of its subject, property and object.
The \textbf{metalex:src} (URI value) attribute is used to include a manifestation component by \emph{reference}.
Attributes from any other (\textbf{\#other}) namespace \textbf{may} be used on any MetaLex element.
The \textbf{xml:lang}\footnote{Used in conformance with \texttt{http://www.w3.org/TR/REC-xml/} and IETF RFC 3066.} attribute \textbf{must} be recoverable from the context of the element.
Every MetaLex element \textbf{must} have an \textbf{id} attribute. Use \textbf{metalex:id} if no \textbf{id} already exists.
The \textbf{metalex:naming} attribute (value: a URI) is used to signal conformance to the section \ref{scheme} naming convention for URIs. It \textbf{must} be recoverable from the context of the element.
The \textbf{metalex:LocalNamingConventionMethod} (values: individual, ordinal, positional) and \textbf{LocalNamingConventionScope} (values: parent, root, or name of ancestor element relative to which identifiers like \emph{article 1} are unique) attributes are used to signal conformance to a local naming convention for structural parts of a document. They \textbf{must} be recoverable from the context of the element.
The \textbf{Quoted} content models have been added: in some cases, in particular in modifying acts, a source of law conforming to a certain content model \emph{quotes} other structural blocks that violate the content model.
\section{Metadata}
\label{meta}
MetaLex uses the conventions of RDF/A processing for embedding RDF metadata statements inside MetaLex XML. Consult \texttt{http://www.w3.org/TR/rdf-primer/} for more information about RDF and \texttt{http://www.w3.org/TR/xhtml-rdfa-primer/} for RDF/A. In any case of ambiguity this specification takes precedence over RDF/A specifications.
The purpose of MetaLex embedded metadata is nothing more than storage of RDF formatted metadata in MetaLex XML. An RDF description of a resource consists of a set of statements. The MetaLex standard includes an OWL schema that specifies commonly required properties and classes in RDF statements about legal and legislative resources. This schema may be used with RDF stored outside the document in question, and the embedded metadata processing mechanism may be used with other metadata schemas like Dublin Core or PRISM.
The main difference between storage inside and outside the standard XML manifestation is the identification of the metadata author: the metadata inside the document is associated with the editor of the manifestation, who must be presumed to be the author of the metadata. An RDF statement has the following components:
\begin{description}
\item[subject:] the thing the statement describes;
\item[predicate:] a specific property;
\item[object:] the thing the statement says is the value of the property, for the thing the statement describes.
\end{description}
The subject and the predicate are always URIs. The object is either a URI or an (optionally datatyped) literal.
See the RDF specifications at \texttt{http://www.w3.org/2001/sw/RDFCore/} for details on RDF literals and datatyping of literals.
A MetaLex document \textbf{must} declare what it is a manifestation of, as follows:
\small\begin{verbatim}
\end{verbatim}
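\normalsize A minimal sketch of such a declaration is given here. It assumes a hypothetical element named \texttt{meta}, derived from \textbf{metalex:urMetaType}, and a hypothetical manifestation URI under \texttt{example.org}; only the \textbf{about}, \textbf{rel}, and \textbf{resource} attributes and the \texttt{metalex-owl:exemplifies} property are taken from this document:
\small\begin{verbatim}
<!-- about="" is the document self-identifier (the item) -->
<meta id="m1" about=""
      rel="metalex-owl:exemplifies"
      resource="http://example.org/act/2008/1/en.xml"/>
\end{verbatim}
\normalsize Read as a triple, this corresponds to \texttt{\{this, exemplifies, URIRef\}}, with the value of \textbf{resource} in the role of \texttt{URIRef}.
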
Other metadata \emph{may} be embedded.
\subsection{Embedded Metadata Processing}
RDF/A statements \emph{may} be added to any MetaLex element if the content model allows it. Elements derived from the \textbf{metalex:urMetaType} type \textbf{must} contain RDF/A attributes expressing an RDF statement. Relative URI references in RDF/A attributes are relative to the \emph{xml base} of the containing element.
An RDF/A element is any XML element that contains at least one of the attributes \textbf{property}, \textbf{rel}, or \textbf{rev}. Exactly one RDF statement is generated per \textbf{rel} (relation), \textbf{property}, or \textbf{rev} (reverse) attribute by an RDF/A processor: the attribute indicates a new statement whose \emph{predicate} is the URI value of that attribute. In the case of \textbf{rel} and \textbf{property}, the subject of the statement is decided by \emph{subject resolution}.
In the case of \textbf{rel}, the object is decided by \emph{URI reference object resolution}. In the case of \textbf{property}, the object is decided by \emph{literal object resolution}. In the case of \textbf{rev}, the subject of the triple is decided by \emph{URI reference object resolution} and the object of the triple is decided by \emph{subject resolution}. If both \textbf{rel} and \textbf{rev} attributes are used within the same element, two RDF statements are generated.
Literal object resolution yields either the value of the \textbf{content} attribute or, if it is absent, the element content. The value of the content attribute is by default interpreted as a plain literal. The element content is by default interpreted as an XML literal.
The \textbf{datatype} attribute is used to specify an XML Schema datatype \\(cf. \texttt{http://www.w3.org/TR/xmlschema11-2/}). If it is present, the value yielded by literal object resolution is interpreted as an instance of that XML Schema datatype.
URI reference object resolution yields either the URI value of the \textbf{resource} attribute or, if absent, the \textbf{href} or \textbf{src} attribute. The \textbf{resource} attribute is only used to specifically communicate that the URI is not intended to be ``clickable'', or if a \textbf{href} or \textbf{src} attribute is already present on the element for other reasons and does not refer to the intended object. It is strongly advised to use the \textbf{href} attribute whenever reasonable.
Subject resolution usually yields the URI value of the \textbf{about} attribute, or, if the RDF/A element that includes the predicate attribute does not have an \textbf{about} attribute, the \textbf{about} attribute of the first ancestor element that has an \textbf{about} attribute. In the absence of an \textbf{about} attribute within scope, it yields the \emph{xml base} of the element.
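The resolution rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a conforming RDF/A processor: the function name \texttt{extract\_statements}, the sample element names, and the \texttt{ex:} predicates are hypothetical; predicates are kept as written rather than expanded to URIs, relative URI references are not resolved, element content is reduced to its text, and the \textbf{datatype} attribute is ignored.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def extract_statements(root, document_base=""):
    """Return (subject, predicate, object) tuples for rel, rev and property."""
    parent = {child: elem for elem in root.iter() for child in elem}

    def ancestors_or_self(elem):
        while elem is not None:
            yield elem
            elem = parent.get(elem)

    def resolve_subject(elem):
        # Nearest 'about' on the element or an ancestor, else the xml base.
        for node in ancestors_or_self(elem):
            if "about" in node.attrib:
                return node.attrib["about"]
        for node in ancestors_or_self(elem):
            if XML_BASE in node.attrib:
                return node.attrib[XML_BASE]
        return document_base

    def resolve_uri_object(elem):
        # 'resource' takes precedence over 'href', then 'src'.
        for name in ("resource", "href", "src"):
            if name in elem.attrib:
                return elem.attrib[name]
        return None

    def resolve_literal_object(elem):
        # 'content' attribute if present, otherwise the element content
        # (simplified here to its text).
        if "content" in elem.attrib:
            return elem.attrib["content"]
        return "".join(elem.itertext())

    statements = []
    for elem in root.iter():
        attrs = elem.attrib
        if "rel" in attrs:
            statements.append(
                (resolve_subject(elem), attrs["rel"], resolve_uri_object(elem)))
        if "rev" in attrs:
            # Reverse relation: subject and object swap roles.
            statements.append(
                (resolve_uri_object(elem), attrs["rev"], resolve_subject(elem)))
        if "property" in attrs:
            statements.append(
                (resolve_subject(elem), attrs["property"],
                 resolve_literal_object(elem)))
    return statements


# Hypothetical sample document; element and predicate names are made up.
SAMPLE = """
<doc about="">
  <meta rel="ex:relatedTo" href="/tv/act/2004-02-13/2/tv@"/>
  <meta property="ex:title" content="Example Act"/>
  <block about="#art1"><meta rev="ex:cites" resource="/tv/act/2001-05-01/7"/></block>
</doc>
"""
statements = extract_statements(ET.fromstring(SAMPLE))
```

Note how the \textbf{rev} statement takes its subject from the \textbf{resource} attribute and its object from the enclosing \textbf{about} attribute, the reverse of the \textbf{rel} case.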
\subsection{OWL Schema}
The OWL schema can be found at \texttt{http://www.metalex.eu/metalex/YYYY-MM-DD}, where \texttt{YYYY-MM-DD} is the date of the agreement, provided that one correctly sets the MIME type, i.e.:
\begin{verbatim}
GET http://www.metalex.eu/metalex/YYYY-MM-DD
Accept: application/rdf+xml
\end{verbatim}
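The same request can be prepared programmatically. The following Python sketch only builds the request object and does not send it; the function name \texttt{schema\_request} is hypothetical, and \texttt{YYYY-MM-DD} remains a placeholder for the agreement date.

```python
from urllib.request import Request


def schema_request(agreement_date):
    """Build (without sending) a GET request for the OWL schema as RDF/XML."""
    url = "http://www.metalex.eu/metalex/" + agreement_date
    return Request(url, headers={"Accept": "application/rdf+xml"}, method="GET")


# Placeholder date kept from the text; substitute the actual agreement date.
request = schema_request("YYYY-MM-DD")
```

Passing the object to \texttt{urllib.request.urlopen} would perform the actual retrieval.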
A description logic syntax (cf. \cite{horrocks03reducing,baader01overview}) rendering is found in appendix \ref{owl}. The OWL schema specifies the concepts mentioned in section \ref{biblio}, the identifying metadata specified in sections \ref{self} and \ref{idmetadata}, and the citation metadata specified in section \ref{citation}, as well as some related auxiliary concepts.
\section{Naming}
\label{naming}
In MetaLex, bibliographic entities are identified with URIs or URI references. Identification of bibliographic entities plays a role in:
\begin{enumerate}
\item Self-identification of documents;
\item Citation of other documents;
\item Inclusion of document components.
\end{enumerate}
Note that URI references that are \emph{relative} can resolve to different URIs depending on the XML base processing context.
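A small Python illustration of this dependence on context, using the standard \texttt{urllib.parse.urljoin} function. Both base URIs are hypothetical processing contexts chosen for the example.

```python
from urllib.parse import urljoin

reference = "/tv/act/2004-02-13/2/tv@"

# Two hypothetical processing contexts with different base URIs.
base_a = "http://gov.tv/docs/act.xml"
base_b = "http://mirror.example.org/copy/act.xml"

uri_a = urljoin(base_a, reference)
uri_b = urljoin(base_b, reference)

# The empty URI reference resolves to the base itself.
self_a = urljoin(base_a, "")
```

The same relative reference thus yields \texttt{http://gov.tv/tv/act/2004-02-13/2/tv@} in the first context and \texttt{http://mirror.example.org/tv/act/2004-02-13/2/tv@} in the second.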
\subsection{Uniform Resource Identifiers}
Each bibliographic \emph{item} encoded in the CEN MetaLex standard \textbf{must} have at least one URI. Manifestations \textbf{must} refer to the item URI by the URI reference \texttt{""} (i.e. the empty string URI reference\footnote{Note that a URI, which is absolute, and a URI reference (cf. IETF RFC 3986), which is absolute or relative and can therefore be empty, are different. URIs are globally unique, but URI references are not: they are globally unique only after resolution to a URI.}). It is in principle not possible to encode item level information in the manifestation. It \textbf{must} be possible to establish the \emph{xml base} of an item, in conformance with the XML Base specification\footnote{http://www.w3.org/TR/REC-xml/} and IETF RFC 3986 (or 2396). The concatenation of the established \emph{xml base} and the \textbf{id} attribute of an element \textbf{must} result in a valid URI reference for the element, conformant to the addressing recommendations of W3C, which counts as a bibliographic identifier of the element as a bibliographic \emph{item}. There are no further restrictions on item identification.
Note that if one uses an explicit \textbf{xml:base} attribute it remains the same after copying the document, which means that it \emph{also} behaves as a manifestation level identifier. There are legitimate use cases of the \textbf{xml:base}, where it is inserted as a temporary identifier to an XML subtree in an XML processing pipeline.
The manifestation, expression, and work \textbf{must} also have at least one URI, which counts as their manifestation, expression, work level base, respectively.
\subsection{Fragment Identifiers}
Every MetaLex element \textbf{must} have an \textbf{id} attribute, not necessarily the \textbf{metalex:id} attribute. The \textbf{id} value of an element is a manifestation fragment identifier. Concatenated to the manifestation level document identifier, it globally identifies the element at the manifestation level; concatenated to the expression level document identifier, it globally identifies the expression embodied by the content of the element; concatenated to the work level document identifier, it presumably globally identifies a structural element common to various expressions of the work.
\subsection{Signalling Identification Method Conformance}
There are two different methods for uncovering the relationship between manifestation, expression, and work, which have a direct impact on naming and addressing:
\begin{enumerate}
\item An \textbf{explicit encoding} of the relation between these URIs, and/or the class of the object designated by the URI, as metadata.
\item A \textbf{naming convention} for manifestations, expressions, and works that establishes a systematic relationship between transparent URIs, as proposed in the naming convention appendix, which allows one to derive the URI of one from the other.
\end{enumerate}
The attribute \textbf{metalex:naming} declares whether the manifestation respects the naming convention or uses some other user community naming schema. Similar to \textbf{xml:base} and \textbf{xml:lang}, the value of \textbf{metalex:naming} applies to the content of the element (usually the root) that carries the attribute, except when overridden by other \textbf{metalex:naming} attributes. The value of \textbf{metalex:naming} is a URI. The standard defines one naming scheme: \textbf{metalex-owl:MetaLexNamingScheme}, which is the proposed naming scheme of section \ref{scheme}.
The XML document metadata \textbf{must} declare the manifestation URI to identify itself. If the manifestation URI does not conform to the MetaLex naming convention it must declare the manifestation, expression, and work URI. In addition it must declare \emph{identifying} metadata from which a conforming \textbf{metalex-owl:MetaLexNamingScheme} name can be constructed, being for instance the country, document type, date of creation of the work, an optional sequence number on that date, language of the expression, date of creation of the expression, and date of creation of the manifestation. Identifying metadata is specified in section \ref{idmetadata}.
Identifying metadata for conforming documents can be automatically transformed into naming scheme conformant identifiers and vice versa.
\subsection{Manifestation Self-identification with Metadata}
\label{self}
Only a manifestation identifier \textbf{must} be declared with \textbf{metalex-owl:exemplifies}. The expression and the work must be declared in case of non-conformance to the naming convention. Noting that the URI reference \texttt{about=""} refers to the document itself, the following declares a standard manifestation, expression, and work base (using the naming convention):
\small\begin{verbatim}
\end{verbatim}
The RDF reading of \textbf{m1} is as follows: \textbf{m1} is a statement that states that the (referent of) \texttt{metalex:exemplifies} of (the referent of) \texttt{(empty string)} is (the referent of) \texttt{/tv/act/2004-02-13/2/tv}. An alternative, simpler form that is also permitted is the following:
\small\begin{verbatim}
\end{verbatim}
The semantics of \textbf{metalex-owl:thisEmbodies} and \textbf{metalex-owl:thisRealizes} are as follows, assuming \texttt{\{p, q, r, s\}} is a set of URI references\footnote{This is an instance of property chaining (cf. OWL 2 specifications: \texttt{http://www.w3.org/TR/2008/WD-owl2-primer-20080411/}).}:
\begin{enumerate}
\item If \texttt{p metalex-owl:exemplifies q} and \texttt{q metalex-owl:embodies r} \\then \texttt{p metalex-owl:thisEmbodies r}
\item If \texttt{p metalex-owl:exemplifies q} and \texttt{q metalex-owl:embodies r} and \\\texttt{r metalex-owl:realizes s} then \texttt{p metalex-owl:thisRealizes s}.
\end{enumerate}
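The two rules can be illustrated with a small Python sketch over a set of (subject, predicate, object) triples. The function name \texttt{infer\_chains} and the sample URI references are hypothetical; an OWL 2 reasoner would derive the same statements from the property chain axioms.

```python
EXEMPLIFIES = "metalex-owl:exemplifies"
EMBODIES = "metalex-owl:embodies"
REALIZES = "metalex-owl:realizes"
THIS_EMBODIES = "metalex-owl:thisEmbodies"
THIS_REALIZES = "metalex-owl:thisRealizes"


def infer_chains(triples):
    """Return the thisEmbodies and thisRealizes statements implied by the rules."""
    def objects(subject, predicate):
        return {o for (s, p, o) in triples if s == subject and p == predicate}

    inferred = set()
    for (p, predicate, q) in triples:
        if predicate != EXEMPLIFIES:
            continue
        for r in objects(q, EMBODIES):
            inferred.add((p, THIS_EMBODIES, r))        # rule 1
            for s in objects(r, REALIZES):
                inferred.add((p, THIS_REALIZES, s))    # rule 2
    return inferred


# Illustrative URI references only; "" stands for the item (the document itself).
manifestation = "/tv/act/2004-02-13/2/tv@.xml"
expression = "/tv/act/2004-02-13/2/tv@"
work = "/tv/act/2004-02-13/2"
triples = {
    ("", EXEMPLIFIES, manifestation),
    (manifestation, EMBODIES, expression),
    (expression, REALIZES, work),
}
inferred = infer_chains(triples)
```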
Read for \texttt{meta} in the examples above any appropriate element that permits metadata attributes. The URIs are relative, in this case conforming to the naming convention: the base is set by the processing environment. This means that the \emph{mURI} of the naming convention describes a URI \emph{reference} that potentially resolves to a large set of URIs for each bibliographic object: one for each processing environment that sets its own base.
Note that if one wants to conform to the naming convention but also wants to embed another identifier, this is trivially achieved with a metadata statement to that effect, for instance:
\small\begin{verbatim}
\end{verbatim}
\subsection{Identifying Metadata}
\label{idmetadata}
If the naming convention is not used, a set of metadata \textbf{must} be available, either in the form of RDF/A statements, or in the form of RDF. This set is based on section \ref{scheme}. Let \texttt{[ManifURI]} be the identifier of the manifestation, \texttt{[ExpURI]} be the identifier of the Expression, \texttt{[WorkURI]} be the identifier of the work, all declared as in the previous section. Datatyped values \texttt{}, etc. are values conforming to the XML Schema datatype \texttt{xsd:date}, etc. The relevant set of metadata for the work is the following:
\small\begin{verbatim}
\end{verbatim}
The main purpose of the identifying set is to create a non-ambiguous sequence of work creation events for each work class. Work class is identified by a descriptive string as specified in section \ref{scheme} (e.g. ``act'' for act of parliament, or ``municipal;foobartown;decree''). It is recommended to identify work classes with specific publication channels that can be monitored. Work classes are specific to countries, identified by the two or three characters from the ISO 3166-1 standard. The combination of the date of the event and the issue number (some string) uniquely positions the work in the temporal sequence of works belonging to that work class.
The relevant set of metadata for the expression is the following:
\small\begin{verbatim}
\end{verbatim}
The \texttt{[CreationEventDateURI]} \textbf{may} and will often coincide with the creation of the work in unversioned documents. In case of an ex tunc modification (see section \ref{extunc}) the creation event is a \texttt{FictionalExpressionCreation}, and it is a temporal fiction of some other event, to be filled in, which happened at another relevant date:
\small\begin{verbatim}
\end{verbatim}
The relevant metadatum for the manifestation is the following:
\small\begin{verbatim}
\end{verbatim}
This metadatum \textbf{may} be encoded in the XML manifestation as metadata, but it is assumed that extraction of RDF metadata from a standard MetaLex manifestation already provides the information that the file type is \texttt{xml}.
The OWL schema provides more identifying metadata that \textbf{may} be added.
\section{Relative URI Naming Convention}
\label{scheme}
The alternative to explicit processing of the relation between the bibliographic objects the item represents is to depend on a systematic relationship between transparent URIs. The \textbf{mURI} naming convention governs the construction of these transparent URIs. The URIs \texttt{http://gov.tv/tv/act/2004-02-13/2} and \texttt{http://gov.tv/tv/act/2004-02-13/2/tv@} are examples of its use. \texttt{/tv@} adds the information, appended to the work URI, that the reference is to the Tuvaluan language variant, original version as published.
Note that the naming convention is used both for self-identification and for citation. A self-identifying name never depends on the moment of dereferencing it: some names that are proper for use in citation should not be used for self-identification.
%\texttt{http://svn.leibnizcenter.org/svn/MetaLexWS/documentation/2007proposal/namingsection.pdf}
%\section{Naming Convention for Interoperability}
\subsection{Scope}
The scope of this naming convention is not to set a fixed way to construct URIs, but to define a minimal data set in the metadata (meta Unique$\_$ResourceIdentifier - \textit{mURI}). This \textit{mURI} can be used as the actual URI after an XSLT/CSS transformation or a resolution mechanism (software) and may be managed by the author of the legal information resource or the editor.
Some principles and characteristics should be respected in the naming convention:
\begin{enumerate}
\item it is a significant and logical description of the resource and not of its physical path;
\item it \textbf{must} be permanent and stable over time;
\item it \textbf{must} derive from invariant properties of the resource so as to provide some degree of certainty in obtaining the same name for the same resource regardless of process, tool and person.
\end{enumerate}
\textit{mURIs} are used in numerous situations. In each case it is important to use the \textit{mURIs} for the correct level of the document. We introduce here a few example use cases:
\begin{itemize}
\item Legislative citations will usually refer to a work.
\item The list of attachments and schedules may vary with specific expressions, so references to expression components are specific for the expression level.
\item The specific manifestation that is the CEN MetaLex XML format uses an XML-based syntax to refer to expression components, and associates them to the corresponding manifestation components containing the appropriate content. Therefore within XML files the URI of the manifestation components must be used to refer to attachments and schedules. When referring to the main document, the referring URI must contain the string ``main'' to point to the main document.
\item Multimedia fragments within an XML manifestation (e.g., a drawing, a schema, a map, etc.) do not exist as independent expression components, as they are only a part of the expression component (even when they are the only part). In fact they are only manifestation components, therefore referred to in \textbf{object} and \textbf{img} elements with the appropriate manifestation component URI. If a multimedia fragment is referred to by two different manifestation components of the same manifestation or of a different manifestation/expression/work, the fragment is duplicated.
\end{itemize}
\subsection{Absolute and relative \textit{mURI}}
A \textit{mURI} can be absolute or relative. An absolute form of a \textit{mURI} is a complete set of metadata that identifies a complete path for pointing out a specific resource. A relative form of the same \textit{mURI} is a partial set of metadata that can only identify the manifestation itself based on a certain context. These are useful to complete several \textit{mURIs}.
In particular, we can observe two possible uses of relative \textit{mURI}:
\begin{itemize}
\item References at the \textit{work} and \textit{expression} and \textit{manifestation} level need to be specified as relative \textit{mURI} grounded on the top level.
\item References at the expression and manifestation level need to be specified as relative mURI grounded on the work level.
\end{itemize}
In XML manifestations of CEN MetaLex documents, mURIs \textbf{must} be expressed in relative form, grounded at the root level of the URI.
\subsection{The \textit{mURI} of the Work}
The \textit{mURI} for the \textit{work} is the baseline for building the \textit{mURI} for the \textit{expression}, which is the baseline for the \textit{mURI} of the \textit{manifestation}.
The \textit{mURI} for the \textit{work} consists of the following pieces:
\begin{enumerate}
\item Country code (a two or three-letter code according to ISO 3166-1);
\item Type of work;
\item Date of release into the public domain as a bibliographically identifiable work (expressed in YYYY-MM-DD format, but for a type of document where the year is enough for unique identification the syntax is YYYY); and
\item Optionally an issue number representing the sequence of releases on the relevant date when needed for disambiguation.
\end{enumerate}
All components are separated by forward slashes (``/'') so as to exploit relative URIs in references. The repetition of the country code is due to the need to make the detail fragment independent of the domain name, so as to allow both country-specific resolution as well as international resolution engines.
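As an illustration, the pieces can be assembled with a few lines of Python. The function name \texttt{work\_muri} and its checks are a sketch under the assumption that the relative form starts with a forward slash, as in the examples of this document; it does not validate codes against ISO 3166-1.

```python
import re

# Date of release: YYYY-MM-DD, or YYYY where the year suffices.
DATE = re.compile(r"\d{4}(-\d{2}-\d{2})?")


def work_muri(country, work_type, date, issue=None):
    """Assemble a relative work mURI from its pieces, separated by "/"."""
    if len(country) not in (2, 3):
        raise ValueError("country code must have two or three letters")
    if not DATE.fullmatch(date):
        raise ValueError("date must be YYYY-MM-DD or YYYY")
    pieces = [country, work_type, date]
    if issue is not None:
        pieces.append(str(issue))   # optional issue number for disambiguation
    return "/" + "/".join(pieces)


tuvalu_act = work_muri("tv", "act", "2004-02-13", 2)
french_minutes = work_muri("fr", "minutes", "2004-12-21")
```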
\subsection{The \textit{mURI} of the Expression}
Characterizing the expression is the specific identification of some content with respect to another content. This includes specifications of the version and the language variant of the expression. Therefore, different versions of the same work, or the same version of the same work expressed in different languages correspond to different expressions and will have different \textit{mURIs}.
%Expressions are organized in components (the expression components), and therefore we need to identify separately the expression as a whole as well as the individual %\textit{mURI} for each expression component. All of them are all immediately derived from the baseline, which is the \textit{mURI} of the work.
%\subsubsection{The mURI of the Expression}
The \textit{mURI} of the \textit{expression} consists of the following pieces:
\begin{enumerate}
\item The mURI of the corresponding work;
\item The character ``/'';
\item The language in which the expression is drafted (two or three letter code according to ISO 639-1, ISO 639-2, or ISO 639-3 in that order of preference); and
\item A version identifier, composed of the character ``@'' followed by:
\begin{itemize}
\item If an approved act, the version date of the expression in syntax YYYY-MM-DD,
\item If a bill, the presentation date is appropriate, or the stage in the approval process that the current draft is the result of,
\item If the version identifier is not followed by a date, the identifier points to the version in force at the moment of the resolution of the URI \footnote{i.e., the ``current'' version of the act, where ``current'' refers to the moment in time in which the \textit{mURI} is dereferenced, rather than the moment in time in which the document containing the URI was created}, and
\item Optionally an issue number when needed for disambiguation of versions created on the same date\footnote{When different modifications are made on the same day: This is not uncommon on the first day of the year.}.
\end{itemize}
\end{enumerate}
%The absence of the version identifiers signals two different situations depending on the type of document:
%\begin{itemize}
%\item If the document is not versioned (e.g., the minutes of an assembly) then no version identifier need to nor can be present.
%\item If the document is versioned (e.g., an act in force), then the lack of version identifiers refers to the version in force at the moment of the resolution of %the URI (i.e., the ``current'' version of the act, where ``current'' refers to the moment in time in which the \textit{mURI} is dereferenced, rather than the moment %in time in which the document containing the URI was created).
%\end{itemize}
\begin{table}[htbp]
\centering
\begin{tabular}{@{} |l|l| @{}}
\hline
/fr/minutes/2004-12-21/fr@ & French parliamentary debate record, 21st \\
& December 2004, French version\\
\hline
/nl/act/2004-02-13/2/en@ & Dutch enacted Legislation. Act number\\
& 2 of 2004. English version, current \\
& version (as accessed today) \\
\hline
/it/act/2004-02-13/2/it@2004-02-13 & Italian enacted Legislation. Act \\
& number 2 of 2004. Italian version,\\
& original version\\
\hline
/hu/act/2004-02-13/2/hu@2004-07-21 & Hungarian enacted Legislation. Act \\
& number 2 of 2004. Hungarian version, \\
& as amended, in July 2004\\
\hline
\end{tabular}
\caption{Examples of the expression mURI.}
%\label{tab:label}
\end{table}
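The construction of the expression \textit{mURI} from the work \textit{mURI} can be sketched as follows. The function name \texttt{expression\_muri} is hypothetical, and the optional issue number for versions created on the same date is left out because its syntax is not fixed here.

```python
def expression_muri(work, language, version_date=None):
    """Append "/", the language code, "@" and an optional version date."""
    # Without a version date the identifier points to the version in force
    # at the moment the URI is resolved.
    return work + "/" + language + "@" + (version_date or "")


original_italian = expression_muri("/it/act/2004-02-13/2", "it", "2004-02-13")
current_english = expression_muri("/nl/act/2004-02-13/2", "en")
```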
\subsubsection{The mURI of Virtual Expressions}
\label{virtual}
In some situations it is necessary to create references or mentions of documents whose URI cannot be completely reconstructed because the relevant date is uncertain (for instance because the exact delivery date is not known yet, the date of entry into force of a modifying act is not known in advance, etc). These are called virtual expressions (i.e., references to expressions that probably do not exist yet or ever, but can be unambiguously deduced once all relevant information is made available).
We must distinguish three cases in this situation:
\begin{enumerate}
\item the information is not known by the author of the expression (e.g., the legislator), in which case the act of actually retrieving the correct information is an act of interpretation;
\item the information is not known by the editor of the expression (e.g., the publisher of the XML version of the document), in which case the information can theoretically be available, but is too much of a burden for the publisher to reconstruct; or
\item the information is not known by the query system.
\end{enumerate}
In all these cases, the URI of the virtual expression uses a syntax similar to that of the actual expression, but the character ``:'' is used instead of ``@'' after the specification of the work URI.
For instance, if we need to cite the expression of an act in force on date ``1/1/2007'', we will need to refer to some expression whose date of entry into force was before 1/1/2007.
\begin{table}[htbp]
\centering
\begin{tabular}{@{} |l|l| @{}}
\hline
/at/act/2004-02-13/2/de:2004-07-21 & Austrian enacted Legislation. Act number \\
& 2 of 2004. German version, as amended on\\
& the closest date before July 21, 2004\\
\hline
\end{tabular}
\caption{Example mURI of a virtual expression.}
% \label{tab:label}
\end{table}
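Once the version dates of a work are known, a virtual expression can be mapped to an actual one. The Python sketch below assumes that ``closest date before'' includes the given date itself, and that the list of known version dates is supplied by the caller; the function name \texttt{resolve\_virtual} and the sample dates are hypothetical, and component suffixes are not handled.

```python
from bisect import bisect_right


def resolve_virtual(virtual_muri, known_version_dates):
    """Map a virtual expression mURI (":" form) to an actual one ("@" form)."""
    prefix, _, target = virtual_muri.rpartition(":")
    dates = sorted(known_version_dates)       # ISO dates sort chronologically
    position = bisect_right(dates, target)    # number of versions dated <= target
    if position == 0:
        return None                           # no version exists that early
    return prefix + "@" + dates[position - 1]


# Hypothetical version history of the cited act.
versions = ["2004-02-13", "2004-06-01", "2004-09-15"]
actual = resolve_virtual("/at/act/2004-02-13/2/de:2004-07-21", versions)
```

With these sample dates the virtual reference resolves to the version of 1 June 2004, the latest one not later than 21 July 2004.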
\subsubsection{The mURI of Ex Tunc Modified Expressions}
\label{extunc}
Similar to the virtual expression, but with reference to the past, is the case of an \emph{ex tunc} modification (errata corrige, annulment by constitutional court of a modification, etc.) of an already existing expression, which creates overlapping versions in the same time period.
In these cases the character ``;'' and a date can be appended to the expression URI to represent the date from which we are viewing. In the case of document self-identification, this date should be the date on which the relevant event that constitutes an ex tunc modification happened.
Note that ex tunc modification is substantially different from retroactive applicability, which is not at issue here. It should also not be confused with the possibility of having multiple issues of a work on the same day (i.e. the expression issue number).
\begin{table}[htbp]
\centering
\begin{tabular}{@{} |l|l| @{}}
\hline
/it/act/2003-01-10/23/ita:2005-01-01;2007-12-12 &
Italian enacted legislation, \\
& act number 23 of 2003, \\
& Italian version, as amended \\
& on the closest date before \\
& January 1st, 2005, viewed \\
& on December 12, 2007, the \\
& date when some articles \\
& were annulled by court.\\
\hline
\end{tabular}
\caption{Example of the mURI of Ex tunc modified expressions.}
% \label{tab:label}
\end{table}
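The three expression forms (actual, virtual, and ex tunc modified) can be told apart mechanically. The following Python sketch parses an expression \textit{mURI} into its parts; the regular expression and the function name \texttt{parse\_expression} are assumptions made for illustration, and neither component suffixes nor version issue numbers are handled.

```python
import re

EXPRESSION = re.compile(
    r"^(?P<work>/.+)/(?P<language>[a-z]{2,3})"           # work mURI, language
    r"(?P<marker>[@:])(?P<version>\d{4}-\d{2}-\d{2})?"   # "@" actual, ":" virtual
    r"(?:;(?P<viewed>\d{4}-\d{2}-\d{2}))?$"              # optional viewing date
)


def parse_expression(muri):
    """Split an expression mURI into work, language, version and viewing date."""
    match = EXPRESSION.match(muri)
    if match is None:
        raise ValueError("not an expression mURI: " + muri)
    parts = match.groupdict()
    parts["virtual"] = parts.pop("marker") == ":"
    return parts


ex_tunc = parse_expression("/it/act/2003-01-10/23/ita:2005-01-01;2007-12-12")
current = parse_expression("/nl/act/2004-02-13/2/en@")
```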
\subsection{The \textit{mURI} of Work and Expression Components}
If a document consists of relatively self-contained documents released together with one bibliographic identity, for instance appendices, annexes, tables, maps, etc., we speak of \emph{components}.
If expressions have identifiable components, then by extension the work may also be said to have components due to commonality of content between expressions. Only in a few cases is there a difference in component structure between the work and its expressions\footnote{I.e. when an attachment is added or removed in a later version.}.
Some works and expressions have components, while some are only composed of a main document. In order to explicitly refer to individual components, it is therefore necessary to introduce a naming convention that identifies individual components, and still allows an easy connection between the component and the expression or work it belongs to.
There are therefore two subcases.
\begin{description}
\item[The expression or work is only composed of one component:] In this case, the \textit{mURI} for the work or expression as a whole and for its main component are identical.
\item[The expression is composed of many components:] The \textit{mURI} for each expression or work component consists in this case of the following pieces:
\begin{enumerate}
\item The mURI of the corresponding expression or work as a whole;
\item The character ``/''; and
\item Either
\begin{itemize}
\item A unique name for the attachment, or
\item The name ``main'', which is reserved for the main document.
\end{itemize}
\end{enumerate}
\end{description}
\subsubsection{Hierarchies of components}
Frequently, an attachment itself has further attachments. This creates a complex hierarchical situation in which the component should be considered, in a way, an expression or work itself, whose components should be listed as well and properly differentiated. The process can be iterated further when the attachments of an attachment have further attachments of their own, and so on. The convention must also foresee the situation in which attachments at different levels of the hierarchy end up having the same name (e.g., table A in schedule 1 and table A in schedule 2).
In such cases, each component must be considered as an expression or work by itself. Recursively, the \textit{mURIs} of attachments are as follows:
\begin{itemize}
\item If the attachment does not have further attachments, its \textit{mURI} is provided as detailed in the previous section, without further addenda;
\item If the attachment has further attachments, the mURI as detailed in the previous section refers to the whole attachment, including its own attachments;
\item To refer to the main document of an attachment that has further attachments, a further ``/main'' part should be added; and
\item To refer to any further attachment of an attachment, a further ``/'' followed by a unique name for the attachment must be added to the attachment itself.
\end{itemize}
\begin{table}[htbp]
\centering
\begin{tabular}{@{} |l|l| @{}}
\hline
{\small /fr/minutes/2004-12-21/fr@/main} &French parliamentary debate \\
& record, 21st December 2004, \\
& French version, main document\\
\hline
{\small /nl/act/2004-02-13/2/en@/main/annex1} & Dutch enacted legislation, \\
& Act number 2 of 2004-02-13, in \\
& English, current version (as \\
& accessed today), annex1 to the \\
& main document\\
\hline
{\small /it/act/2004/2/it@/main/annex1/table3} & Italian enacted legislation, \\
& act number 2 of 2004, Italian \\
& current version, table3 \\
& of the annex1 of the main\\
& document\\
\hline
{\small /hu/act/2004-02-13/2/hu@2004-07-21/main/map4} & Hungarian enacted Legislation. \\
& Act number 2 of 2004-02-13. \\
& Hungarian version, as amended \\
& in July 2004, map 4 of the \\
& main document\\
\hline
{\small /hu/act/2004-02-13/2/main/map4} & Hungarian enacted Legislation. \\
& Act number 2 of 2004-02-13. \\
& commonly found map 4 of \\
& main document\\
\hline
\end{tabular}
\caption{Some examples of the use of component mURI.}
%\label{tab:label}
\end{table}
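The recursive rule amounts to appending one ``/''-separated name per level of the hierarchy, as this short Python sketch shows. The function name \texttt{component\_muri} is hypothetical; the base may be the \textit{mURI} of either an expression or a work.

```python
def component_muri(base, *names):
    """Append one "/"-separated segment per level of the component hierarchy."""
    for name in names:
        if not name or "/" in name:
            raise ValueError("component names must be non-empty and contain no '/'")
    # With no names the result is the mURI of the expression or work as a whole.
    return "/".join((base,) + names)


annex = component_muri("/nl/act/2004-02-13/2/en@", "main", "annex1")
table = component_muri("/it/act/2004/2/it@", "main", "annex1", "table3")
work_map = component_muri("/hu/act/2004-02-13/2", "main", "map4")
```

Because each level contributes its own segment, equally named attachments at different levels of the hierarchy still receive distinct \textit{mURIs}.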
\subsection{The mURI of the Manifestation}
Characterizing the manifestation is the specific process that generated an electronic document in some specific format(s). This includes specifications of the data format. Therefore, different manifestations of the same expression generated using different data formats correspond to different manifestations and will have different \textit{mURIs}.
Manifestations are organized in components (the manifestation components), and therefore we need to identify separately the manifestation as a whole as well as the individual \textit{mURIs} for each manifestation component. All of them are immediately derived from the baseline, which is the \textit{mURI} for the \textit{expression}.
\subsubsection{The manifestation as a whole}
The \textit{mURI} for the manifestation as a whole consists of the following pieces:
\begin{enumerate}
\item The \textit{mURI} of the corresponding \textit{expression} as a whole;
\item The character ``.'';
\item A unique three-letter acronym of the data format in which the manifestation is drafted, for instance ``pdf'' for PDF, ``doc'' for MS Word, or ``xml'' for the XML manifestation; or
\item The acronym ``pck'' for the package of all documents, including the XML version of the main document(s), according to the CEN MetaLex rules.
\end{enumerate}
\begin{table}[htbp]
\centering
\begin{tabular}{@{} |l|l| @{}}
\hline
/fr/minutes/2004-12-21/fr@.doc & Word version of the French \\
& parliamentary debate record, 21st \\
& December 2004, French version\\
\hline
/en/act/2004-02-13/2/en@.pdf & PDF version of English enacted \\
& Legislation. Act number 2 of 2004. \\
& English version, current version (as \\
& accessed today)\\
\hline
/it/act/2004-02-13/2/it@2004-07-21.pck & Package of all documents including \\
& XML versions of the Italian enacted \\
& Legislation. Act number 2 of 2004. \\
& Italian version, as amended in July\\
& 2004\\
\hline
\end{tabular}
\caption{Some examples of manifestation mURI.}
% \label{tab:label}
\end{table}
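Because the naming convention is systematic, the expression and work \textit{mURIs} can be derived from the \textit{mURI} of a manifestation as a whole. The Python sketch below illustrates this; the function names are hypothetical, and the derivation assumes that the language and version segment is the first path segment containing ``@'' or ``:''.

```python
def manifestation_muri(expression, data_format):
    """Append "." and the data format acronym to an expression mURI."""
    return expression + "." + data_format


def derive_levels(manifestation):
    """Split a manifestation mURI (as a whole) into expression and work mURIs."""
    expression, _, data_format = manifestation.rpartition(".")
    segments = expression.split("/")
    # Assumption: the first segment carrying "@" or ":" is language + version.
    for index, segment in enumerate(segments):
        if "@" in segment or ":" in segment:
            break
    else:
        raise ValueError("no language/version segment in " + manifestation)
    return {
        "work": "/".join(segments[:index]),
        "expression": expression,
        "format": data_format,
    }


package = manifestation_muri("/it/act/2004-02-13/2/it@2004-07-21", "pck")
levels = derive_levels(package)
```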
\subsubsection{The mURI of Manifestation Components}
Each manifestation component is an independent electronic structure (e.g., a file) in a single data format. Every type of manifestation has of course a different data structure and file structure. Therefore the actual format of the \textit{mURIs} of the components of the manifestation depends on the data format and cannot be formalized in general. In this section we therefore provide a grammar but not an exhaustive list of formats, since that list depends on the data format chosen for the manifestation.
The \textit{mURI} for each manifestation component consists of the following pieces:
\begin{enumerate}
\item The mURI of the corresponding expression as a whole;
\item The character ``/'';
\item Some unique identification of the manifestation component with respect either to the manifestation as a whole or to the expression component the component is the manifestation of;
\item The character ``.''; and
\item A unique extension of the data format in which the manifestation is drafted, for instance \texttt{pdf} for PDF, \texttt{doc} for MS Word, \texttt{xml} for XML documents, \texttt{tif} for TIFF image format, etc.
\end{enumerate}
In the next section we will examine the format of the package and the relevant URIs for a specific manifestation of CEN MetaLex documents, the XML format.
\subsubsection{The URI for the components in the CEN MetaLex package manifestation}
The CEN MetaLex XML manifestation is a very specific manifestation using a number of data formats (mainly XML but could include other multimedia formats as needed) with a very specific organization of parts and components. Since it makes explicit choices in terms of data formats and reciprocal references, it is important to provide clear and non-ambiguous rules as to the internal naming mechanism and its overall structure.
A CEN MetaLex XML manifestation is a package composed of one or more files organized in a flat fashion. The transportable format is a ZIP file whose extension is \textbf{.pck}. Other formats are possible and acceptable as long as they adhere to these rules.
The following are alternative options for the CEN MetaLex package:
\begin{enumerate}
\item If the document is just composed of text and does not refer to any multimedia fragment of any form, then the ZIP package contains a single document called ``main.xml'';
\item If the document is composed of many manifestation components but does not refer to any multimedia fragment of any form, then the ZIP package is composed of many XML files, \textbf{one for each expression component}. Each manifestation component then has the same name as its corresponding expression component, plus the \texttt{.xml} extension. The name ``main'' is reserved for the main component. Numbers are never used;
\item If the document contains multimedia fragments of any kind, then each individual fragment does not have a corresponding expression component, but is just a manifestation component referred to in the \texttt{img} or \texttt{object} element. All multimedia components must be stored within an inner structure (e.g., a folder) called \texttt{media/}. Multimedia components may be named freely, but must use the appropriate extension for their content type. Thus a logo can be called \textbf{logo.tif} or any other name, as long as the extension correctly specifies the content type.
\end{enumerate}
Reciprocal references to manifestation components are necessary within a specific manifestation. For instance, the manifestation of the main document refers to the manifestations of its attachments via the \texttt{attachment} elements, and the schedule showing an image refers to the file of the image via the \texttt{img} element. In these cases, all references \textbf{must} be relative to the package (i.e., the manifestation as a whole).
\small\begin{table}[htbp]
\centering
\begin{tabular}{@{} |l|l| @{}}
\hline
attachment01.xml & Manifestation of the first attachment\\
\hline
schedule03.xml & Manifestation of the third schedule\\
\hline
media/logo.tif & Manifestation of an image within the document\\
\hline
\end{tabular}
%\caption{TableCaption}
% \label{tab:label}
\end{table}
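As a hedged illustration of such package-relative references, the fragments below reuse the file names from the table above. The use of a \texttt{src} attribute on the \texttt{attachment} and \texttt{img} elements is an assumption made for this sketch, as is the omission of namespace prefixes.
\begin{verbatim}
<!-- inside main.xml (attribute names assumed for illustration) -->
<attachment src="attachment01.xml"/>
<attachment src="schedule03.xml"/>

<!-- inside schedule03.xml -->
<img src="media/logo.tif"/>
\end{verbatim}
All three values are resolved against the package as a whole, so none of them needs to mention where the package itself is stored.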
References to manifestation components are rarely, if ever, needed outside the manifestation itself. If they are needed, they refer to the file as follows:
\begin{enumerate}
\item The \textit{mURI} of the corresponding expression as a whole;
\item The character ``/''; and
\item The relative reference to the required manifestation component as specified above.
\end{enumerate}
\section{Addressing}
References to URIs are usually made with the \textbf{href} or \textbf{src} attribute. Because \textbf{href} and \textbf{src} are also RDF/A attributes, the referring element also encodes a metadatum if both an \textbf{about} and a \textbf{rel} attribute are present.
The MetaLex standard distinguishes three forms of addressing:
\begin{enumerate}
\item Reference to external objects (agents, events, etc.): these are always encoded as metadata and, if
embedded, always use the \textbf{href} or \textbf{resource} attribute.
\item Citation of other bibliographic objects: this can be achieved through the citations
attribute group, or through metadata, always using the \textbf{href} attribute. The target should either be identified by use of the naming convention,
or by identifying metadata. In some cases the citation can be embedded in both attribute form and embedded
metadata, because of the double interpretation of \textbf{href}.
\item Manifestation component inclusion: this can be achieved through the \textbf{src} attribute and either conformance to
the naming convention, or additional metadata.
\end{enumerate}
The distinguishing property of references is that the value of the \textbf{rel} attribute is a subproperty of \textbf{metalex-owl:refersTo}. Citation uses a property value that is a subproperty of \textbf{metalex:cites}, which is itself a subproperty of \textbf{metalex-owl:refersTo}.
This means that the fact that some MetaLex XML element refers to another entity can also be stored in RDF, external to the MetaLex XML document.
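A minimal sketch of such an externally stored statement, written in RDF/XML, is given below. The namespace URI bound to \texttt{metalex-owl}, the URI of the referring element and the URI of the referent are all placeholders invented for this example; only the property name \texttt{refersTo} is taken from the text above.
\begin{verbatim}
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:metalex-owl="http://example.org/metalex-owl#">
  <!-- subject: the referring XML element (placeholder URI) -->
  <rdf:Description rdf:about="http://example.org/doc/main.xml#ref1">
    <!-- object: the entity referred to (placeholder URI) -->
    <metalex-owl:refersTo
      rdf:resource="http://example.org/id/agent/minister"/>
  </rdf:Description>
</rdf:RDF>
\end{verbatim}
In practice a more specific subproperty of \texttt{metalex-owl:refersTo} could take the place of the generic property.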
\subsection{Reference}
A reference is something that refers to or designates something else, or acts as a stand-in for a relation between two things: the \emph{referrer} and the \emph{referent}. Since a relation can also be identified, the generic form of a reference is \emph{(referrer, predicate, referent)}, where the predicate is the name of the relation. References of this form are represented in RDF or RDF/A.
In the sense intended here a reference is an XML element (directly or indirectly) containing text, where the text is deemed to refer to something else. The XML element will typically be of the \textbf{inline} content model type. A \textbf{citation} is an expression that refers to something intralinguistic, i.e. to another XML element (directly or indirectly) containing text, or to the bibliographic objects directly or indirectly embodied by it. Other references refer to something extralinguistic, i.e. something other than text, recoverable from the context in which the document was produced.
\emph{Article 1, the first article} and \emph{the previous article} are examples of citation, and \emph{the Minister, the President of the Republic, the accused}, and \emph{We, Beatrix, etc.} are examples of relevant references to other things.
A similar distinction, which should not be confused with the previous one, is the distinction between exophora and endophora in linguistics. Take the following sentences:
\begin{enumerate}
\item \emph{Theft} is the unlawful taking of a good wholly or partially belonging to another.
\item \emph{It} (\emph{The theft}) must have been done with the intent to appropriate.
\end{enumerate}
\emph{It} (or \emph{The theft}) obviously refers back to \emph{Theft} in the previous sentence. It is an endophoric proform expression, as opposed to an exophoric one (\emph{That must have been done intentionally}, said while pointing to an act of taking in progress). Exophoric proform expressions are obviously rare in legal resources.
There is however a difference between the reference \emph{It} referring to theft as defined in the previous sentence and \emph{the previous sentence} as referring to a sentence. \emph{Theft} and \emph{it} are coreferents of something other than text: \emph{it} is not an intralinguistic reference. Expressions like \emph{it} do stand in for another expression (\emph{Theft}), but only to indirectly reference meaning recoverable from context. They are used to avoid repetitive expressions and in quantification (i.e. carrying a variable from one sentence into the next one).
The following is an example of a reference metadatum in RDF/A:
\begin{verbatim}
\end{verbatim}
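Purely as an illustration, a reference metadatum of this kind might look as follows. The element name \texttt{inline}, the identifier \texttt{ref1}, the property \texttt{ex:refersToAgent} (assumed to be declared as a subproperty of \texttt{metalex-owl:refersTo}) and the target URI are hypothetical and are not taken from the MetaLex schema.
\begin{verbatim}
<!-- about = referrer, rel = predicate, href = referent -->
<inline id="ref1" about="#ref1"
        rel="ex:refersToAgent"
        href="http://example.org/id/agent/minister"
        >the Minister</inline>
\end{verbatim}
Read as a triple, the element states \emph{(\#ref1, ex:refersToAgent, the Minister as an agent)}, which matches the generic form \emph{(referrer, predicate, referent)} given above.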
The use of references is optional. The \textbf{src} attribute \textbf{may not} be used.
\subsection{Citation}
\label{citation}
A reference with a single target (e.g. \emph{article 1 of the Income Tax Law} or \emph{article 15.3}) is easy to mark up. The entire referring text can be marked, and it should be linked to the one concept that is being referred to. This can be either a work or an expression, depending on whether or not version information is specified or hinted at.
A citation in legislation normally refers to a work. A reference to legislation in a court verdict necessarily refers to an expression. References in other documents can refer to any level.
References with multiple targets are more complex to mark, as we now have several concepts to link to (e.g. \emph{article 5 and 6 of the Income Tax Law}). An important problem is where to anchor these links. In addition there are references to ranges, such as \emph{articles 5-10 of the Income Tax Law}.
This reference will have several targets, and without studying the target document we cannot determine the exact target locations. It might be \emph{\{article 5, article 6, article 7, article 8, article 9, article 10\}} or \emph{\{article 5, article 6, article 7, article 7a, article 7b, article 8, article 10\}} or even \emph{\{article 5, article 10\}}. The exact targets cannot be included in the document, as they cannot be derived from the information present. In addition, the targets may change depending on the version being referred to in case of references to works.
Ranges get even more complex when exceptions are involved, such as \emph{articles 5-10 with the exception of article 9}. An interesting variation occurs when an exception is made to something other than a range: \emph{article 5 with the exception of the first member}. This implicitly turns article 5 into a range of \emph{member 1, [..] member n}.
A final format for references is the ``each time'' construct: \emph{articles 10, 12, 15 and 16, each time the first member}. \emph{First member} by itself cannot be a reference to something here. The links \textbf{may not} target the articles themselves, but rather the first member of each article.
The advantage of special markup for exception and each time constructs is that it is a hint to (less refined) parsers that discover and mark up references that this one has already been discovered, interpreted, and discarded.
Each of these complications shows the same pattern: there is a difference between the entities explicitly cited in the text, and the ones it refers to. In some cases an element must be explicitly cited to convey the information that it is \emph{not} being referred to!
Complex references create two problems for the MetaLex standard:
\begin{enumerate}
\item There are multiple reasonable ways to mark up the same statement. Some XML standards, for instance, use start and end range attributes on a single element enclosing both the start and the end of the range. Other standards have two separate elements marking the start and the end, each using the same href (or similar) attribute.
\item The encoding of metadata about a complex reference will necessarily require more than one metadata statement, and therefore needs more than one carrier element if RDF/A is used. This is a problem if the aim is to declare existing XML documents as MetaLex compliant documents without adding new XML elements.
\end{enumerate}
There are two methods for describing references. The following is an example of a citation \emph{metadatum} in RDF/A:
\begin{verbatim}
\end{verbatim}
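As a hedged sketch only, a citation metadatum with a single target could be carried by one element as follows. The element name \texttt{inline}, the identifier \texttt{cit1} and the target URI are hypothetical; in a real document the target would be a name constructed according to the naming convention.
\begin{verbatim}
<!-- the target URI is a placeholder, not a conformant name -->
<inline id="cit1" about="#cit1"
        rel="metalex:cites"
        href="http://example.org/income-tax-law#art1"
        >article 1 of the Income Tax Law</inline>
\end{verbatim}
Because \textbf{metalex:cites} is a subproperty of \textbf{metalex-owl:refersTo}, the same element also counts as a reference in the general sense.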
Other available properties (\texttt{metalex-owl:citeFrom, metalex-owl:citeUpto, \\metalex-owl:excluding, metalex-owl:excludeFrom, metalex-owl:excludeUpto}) can be used for complex references if the information is stored in RDF or if RDF/A is strongly preferred and modifying the XML document is no problem. The \textbf{href} attribute \textbf{must} be used.
Alternatively the \textbf{metalex:citations attribute group} can be used if adding elements to the original XML file is a problem but embedding the references inside the document is nevertheless strongly preferred. It provides the attributes \texttt{metalex:href, metalex:upto, metalex:exclude, metalex:excludeTo}, which specify two types of ranges: the cited range from \texttt{metalex:href} to \texttt{metalex:upto}, and the excluded range from \texttt{metalex:exclude} to \texttt{metalex:excludeTo}.
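For the range with an exception discussed earlier, the attribute form might be used roughly as follows. This is a sketch under assumptions: the element name \texttt{inline} and the target URIs are placeholders, and the attribute spellings follow the list given above rather than a checked schema.
\begin{verbatim}
<!-- cited range: art5 .. art10, with art9 excluded -->
<inline
  metalex:href="http://example.org/income-tax-law#art5"
  metalex:upto="http://example.org/income-tax-law#art10"
  metalex:exclude="http://example.org/income-tax-law#art9"
  >articles 5-10 with the exception of article 9</inline>
\end{verbatim}
Only the two end points of the range are recorded, so no new elements are needed and the articles that lie between them are determined by consulting the target document.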
If the naming convention is not used, the target of the citation must be identified in accordance with the principles set down in section \ref{idmetadata}. The use of citations is optional.
\subsection{Component Inclusion}
\label{inclusion}
Some XML manifestations may include components by \emph{reference} instead of by physical inclusion as a fragment, i.e. the inclusion reference stands in for a part of the MetaLex manifestation that is stored in an external object identified by a URI. Section \ref{scheme} provides some additional information.
Any part of a standard MetaLex XML manifestation, except the top level container, can be implemented as an inclusion reference to an external object. On the manifestation level you make choices about, for instance, object names and media formats (tiff, jpeg, pdf, etc.). In some cases a text that is (or could be) embodied as an XML manifestation (e.g., a Chinese appendix of a treaty) is embodied alternatively by a media object.
For this the \textbf{metalex:srcatt attribute group} is provided, which defines the \textbf{metalex:src} attribute. The presence of the src attribute indicates the inclusion of a component of the MetaLex manifestation that is stored in an external object identified by the URI that is the value of the src attribute.
While not technically necessary, the existence of components can also be easily described in the form of RDF/A metadata simply by adding the property \textbf{metalex-owl:component} in the \textbf{rel} attribute. The following is an example of a component inclusion metadatum in RDF/A, superimposed on the \textbf{metalex:src} attribute:
\begin{verbatim}
\end{verbatim}
If the naming convention is not used, the target of the inclusion must be identified in accordance with the principles set down in section \ref{idmetadata}. The \textbf{metalex:src} attribute \textbf{must} be used if the \emph{metalex-owl:component} property is used.
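A possible shape for such an element is sketched below. The element name \texttt{container}, the file name \texttt{appendix01.xml} and the use of an empty \texttt{about} value to denote the enclosing document are assumptions made for illustration; how an RDF/A processor treats the \textbf{metalex:src} value is not verified here.
\begin{verbatim}
<!-- intended statement (assumed reading):
     (this document, metalex-owl:component, appendix01.xml) -->
<container id="app1"
           about="" rel="metalex-owl:component"
           metalex:src="appendix01.xml"/>
\end{verbatim}
The \textbf{metalex:src} value is relative to the package, in line with the rule for references within a manifestation.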
\section{Standard Conformance}
Conformance of XML documents to the MetaLex standard means:
\begin{enumerate}
\item validation of XML documents against a schema that restricts the MetaLex XML schema,
\item the theoretical possibility of obtaining an XML document that uses solely MetaLex generic elements and validates against the MetaLex XML schema by way of simple substitution,
\item use of either the naming convention or the availability of sufficient metadata to construct names in conformance to the naming convention, and
\item conformance to the CEN MetaLex Workshop Agreement written guidelines.
\end{enumerate}
Any XML encoding is \emph{transformation conformant} if instances can be transformed automatically into conformant MetaLex XML documents. The workshop recommends implementing the transformation in the form of an XSL transformation (XSLT\footnote{See \texttt{http://www.w3.org/TR/xslt}}). The workshop also recommends copying any attributes from the original XML encoding into the MetaLex XML document.
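A minimal XSLT 1.0 sketch of such a transformation is given below. The source element \texttt{article}, the target namespace URI, and the choice of a generic \texttt{hcontainer} element with a \texttt{name} attribute are assumptions made for illustration; a real stylesheet would need one rule per element of the source vocabulary.
\begin{verbatim}
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://example.org/metalex">
  <!-- map a hypothetical source element to a generic one -->
  <xsl:template match="article">
    <hcontainer name="article">
      <!-- copy all attributes of the original element -->
      <xsl:copy-of select="@*"/>
      <!-- then transform the children recursively -->
      <xsl:apply-templates/>
    </hcontainer>
  </xsl:template>
</xsl:stylesheet>
\end{verbatim}
The \texttt{xsl:copy-of} instruction is placed before the children are processed because attributes must be added to an element before any of its content.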
No additional conformance criteria exist for the OWL Schema. The workshop strongly recommends using or subclassing MetaLex OWL Schema classes and properties wherever reasonable.
\subsection{Conformance of Types and Elements}
A schema that restricts the MetaLex XML Schema declares only MetaLex conformant elements. The process of declaring an element is subject to the following guidelines:
\begin{enumerate}
\item You \textbf{must} use one of the abstract content models for the element;
\item You \textbf{may} define a restriction of the corresponding concrete type;
\item You \textbf{may not} define an extension to the content model of a concrete type;
\item You \textbf{may} define an extension of a concrete type for the purpose of adding attributes;
\item You \textbf{must} define the elements as a substitution group of one of the abstract elements and you must identify a type which is either one of the provided concrete types, or the restriction of the content model or extension of attributes of a concrete type that you have defined.
\end{enumerate}
To define an element conforming to the standard that can be used in XML manifestations of sources of law, define a non-abstract complex type, and create an element belonging to the substitution group of one of the abstract elements according to the subtype specified, for instance:
\footnotesize
\begin{verbatim}
\end{verbatim}
\normalsize
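As a sketch of what such a declaration could look like, the fragment below extends a concrete type with one attribute and places the new element in a substitution group. The names \texttt{metalex:inlineType} and \texttt{metalex:absInline}, the prefix \texttt{ex} (assumed to be bound to the target namespace of the restricting schema) and the \texttt{status} attribute are hypothetical and should be replaced by the names actually declared in the MetaLex XML Schema.
\begin{verbatim}
<!-- extension of a concrete type, adding attributes only -->
<xsd:complexType name="citationType">
  <xsd:complexContent>
    <xsd:extension base="metalex:inlineType">
      <xsd:attribute name="status" type="xsd:string"/>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

<!-- element in the substitution group of an abstract element -->
<xsd:element name="citation" type="ex:citationType"
             substitutionGroup="metalex:absInline"/>
\end{verbatim}
The extension adds an attribute but leaves the content model untouched, which is the only kind of extension the guidelines above permit.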
\subsection{About Schema Validation}
The schema to be validated against is made available as a DTD++ schema and an XML Schema. Experience in the Estrella project uncovered a few technical issues where the semantics actually captured by the XML Schema and DTD++ files differ from the intended meaning, mostly because of the limited expressive power of these languages.
\begin{itemize}
\item Patterns of the form \texttt{a*} (zero or more \texttt{a}'s) and \texttt{a+} (one or more \texttt{a}'s) in the XML Schema are replaced by the patterns \texttt{a*|a,a*|a,a,a*|a,a,a,a*|a,a,a,a,a*} and \texttt{a+|a,a+|a,a,a+|a,a,a,a+|a,a,a,a,a+} respectively. This aligns the semantics of the DTD++ schema and the XML Schema, at least for sequences of up to 5 elements (or any other appropriately large number, at the discretion of the technical committee), and allows different restrictions of \texttt{a} in derived conformant content models. In XML Schema, if \texttt{b} and \texttt{c} are restrictions of \texttt{a}, the pattern \texttt{b,c} is not accepted as a restriction of \texttt{a+} or \texttt{a*}; the intended reading of the technical committee is that it should be.
\item As specified in section 3, attributes from any other (\textbf{\#other}) namespace \textbf{may} be used on any MetaLex element. Unfortunately this requirement as such is not expressible in the current version of XML Schema. The current versions of the XML Schema and DTD++ therefore allow \emph{any} attribute (even those in the metalex namespace) to be added to elements. Furthermore, this property is \emph{not} inherited by derived complex types and must be added manually \emph{every time} a derived type is created.
\end{itemize}
\bibliographystyle{apalike}
\bibliography{cwa,biblioD3.2}
\newpage
\section*{Appendix: OWL Schema Axioms}
\label{owl}
\include{2008-05-02}
\end{document}