\documentclass{mlcennote}
\usepackage{longtable}
% \usepackage{helvet}
\usepackage{natbib}
% Custom colors!
\usepackage{color}
\definecolor{navy}{rgb}{0,0,0.3}
\definecolor{darkred}{rgb}{0.3,0,0}
\definecolor{darkgreen}{rgb}{0,0.3,0}
\usepackage[pdftex,
pdftitle={Ontology Representation},
pdfauthor={Rinke Hoekstra},
pdfpagemode={UseOutlines},
plainpages=false,
pdfpagelabels,
bookmarks,bookmarksopen,
pdfstartview={FitH},
colorlinks,
linkcolor={navy},citecolor={darkred},urlcolor={darkgreen}]
{hyperref}
\usepackage{graphicx}
% \usepackage{at}
\renewcommand{\rmdefault}{ppl}
\renewcommand{\ttdefault}{pcr}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{0.1in}
% property
% \newatcommand p[1]{\textsf{#1}}
% % class
% \newatcommand c[1]{\textsf{#1}}
% % individual
% \newatcommand oi[1]{\textit{#1}}
% \newatcommand m[1]{\mathcal{#1}}
%
%
% % property chain constructor
% \newatcommand o{\textsf{\textbf{o}}}
% \newatcommand some{\textsf{\textbf{some}}}
% \newatcommand only{\textsf{\textbf{only}}}
% \newatcommand not{\textsf{\textbf{not}}}
% \newatcommand self{\textsf{\textbf{self}}}
% \newatcommand exactly{\textsf{\textbf{exactly}}}
% \newatcommand min{\textsf{\textbf{min}}}
% \newatcommand max{\textsf{\textbf{max}}}
% \newatcommand value{\textsf{\textbf{value}}}
\title{CEN MetaLex Workshop Proposal}
\author{Alexander Boer and Rinke Hoekstra}
\institute{University of Amsterdam\\The Netherlands}
\author{Fabio Vitali}
\author{Monica Palmirani}
\institute{University of Bologna\\Italy}
\author{Bal{\'a}zs R{\'a}tai}
\institute{Carneades\\Hungary}
\runningauthor{Alexander Boer}
\correspondingauthor{Alexander Boer}
\email{aboer@uva.nl}
\Leibnizreportdate{June 2009}
\newcommand{\axiom}[3]{$#1$ & $#2$ & $#3$ \\}
\newcommand{\fact}[1]{\multicolumn{3}{l}{$#1$} \\}
\newcommand{\concept}[1]{\textsf{\textsc{#1}}}
\newcommand{\property}[1]{\textsf{#1}}
\newcommand{\individual}[1]{\textit{#1}}
\newcommand{\variable}[1]{\textit{#1}}
\newcommand{\xmlschema}{\href{http://www.metalex.eu/metalex/1.0}{XML Schema}\ }
\newcommand{\dtd}{\href{http://www.metalex.eu}{DTD++}\ }
\newenvironment{textbox}{\begin{quote}\small\begin{tabular}{|p{12cm}|}\hline\vspace{.1em}}{\\\hline\end{tabular}\end{quote}}
\newtheorem{remark}{Remark}
\setcounter{tocdepth}{2}
\abstract{
The partial agreement of December 2006 was adopted by the workshop in the understanding that it will be augmented with additional agreements on ontological formalization, citation and reference, time and versioning, and components and component inclusion. This document proposes additional agreements (taking into account proposals from \cite{estrella3.2:07}), and clarifies terminology.
\noindent\textbf{Status:} This document is a new, working draft \textbf{proposal} based on the September 2008 meeting proposal.
\noindent\textbf{NB:} this document is \textbf{work in progress}.
}
\begin{document}
\maketitle
\tableofcontents
\section{Status of this Document}\label{sec:status}
\subsection*{May Be Superseded} % (fold)
\label{sub:may_be_superseded}
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
% subsection may_be_superseded (end)
\subsection*{Summary of Changes} % (fold)
\label{sub:summary_of_changes}
% subsection summary_of_changes (end)
The partial agreement of December 2006 was adopted by the workshop in the understanding that it will be augmented with additional agreements on ontological formalization, citation and reference, time and versioning, and components and component inclusion.
The CEN MetaLex workshop requested more rigorous formalization of the four ontological levels (work, expression, manifestation, item) at which a bibliographic entity exists, and what properties belong to which level. The technical committee committed itself to providing a definitive list of properties of expressions that distinguishes version, variant, consolidation, original, translation, etc.
\begin{itemize}
\item This document includes the partial agreement of December 2006, proposes additional agreements (taking into account proposals from \cite{estrella3.2:07,estrella3.3:08}), and clarifies terminology.
\item This document is a new, draft \textbf{proposal} based on the September 2008 meeting proposal.
\end{itemize}
\subsection*{Versions} % (fold)
\label{sub:document_versions}
\noindent The latest version of this document can be downloaded from:
\begin{textbox}
\url{http://svn.metalex.eu/svn/MetaLexWS/documentation/2008proposal/}
\end{textbox}
\noindent Previous versions of this document can be found on the CEN MetaLex website:
\begin{textbox}
\url{http://www.metalex.eu}
\end{textbox}
% subsection document_versions (end)
\section{Introduction} % (fold)
\label{sec:introduction}
This document defines the MetaLex XML standard for legal sources, as established by the \emph{CEN Workshop on an Open XML Interchange Format for Legal and Legislative Resources (MetaLex)}. MetaLex prescribes \emph{syntactic} restrictions on XML documents and schemas, as defined by a \xmlschema\ and \dtd\ specification (\autoref{sec:syntactic-restrictions}), and defines a \emph{semantics} for XML document metadata (\autoref{sec:semantic-resctrictions}).
Conformance to this standard is defined inline; global conformance constraints are listed in \autoref{sec:conformance}.
The italicized keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used to specify normative features of MetaLex documents and tools, and are interpreted as specified in \href{http://www.ietf.org/rfc/rfc2119.txt}{RFC 2119}.
\subsection{Scope of the Agreement} % (fold)
\label{sub:scope_of_the_agreement}
The \emph{CEN Workshop on an Open XML Interchange Format for Legal and Legislative Resources (MetaLex)} declares, by way of its title, an interest in legal and legislative resources. This workshop agreement limits the applicability of the proposed XML standard to sources of law and references to sources of law.
\paragraph{Source of Law} % (fold)
\label{par:source_of_law}
A \textbf{source of law} is a writing that can be, is, was, or presumably will be used to back an argument concerning the existence of a constitutive or institutional rule in a certain legal system, or, alternatively, a writing used by a competent legislator to communicate the existence of a constitutive or institutional rule to a certain group of addressees. Because the CEN Workshop is concerned only with an XML standard, it chooses not to appeal to other common ingredients of definitions of law that have no relevant counterpart in the information dimension.
Source of law is a familiar concept in law schools, and may be used to refer to legislators (fonti delle leggi, sources des lois), to legislation and case law (fonti del diritto, sources du droit), to custom, etc. It should be noted that many Romance languages make a distinction between the legislator as source of law, by way of speaking or writing, and the law as source of right(s), which is presumably what the existence of the law brings about.
In its broadest sense, the source of law is anything that can be conceived of as the originator of legal rules. In the context of MetaLex it strictly refers to communication in writing, and in a sense covers the \emph{fonti del diritto} in Italian and \emph{sources du droit} in French. There are two main categories of source of law in writing: \textbf{legislation} and \textbf{case law}.
% paragraph source_of_law (end)
\paragraph{Legislation} % (fold)
\label{par:legislative_resource}
The notion of a \textbf{legislative resource} includes legislation, and all writings produced by the legislator explaining and justifying legislation. The legislator is a legal person: it exists separately from any natural persons and organizations involved in the process of drafting and evaluating legislation. It is the formally correct completion of certain processes, usually dictated by law, that makes the legislator the formal author of a writing, and at the same time identifies the addressees to whom it applies. Obviously, the persons and organizations involved in the process of legislating may produce writings that are clearly precursors or legally required ingredients to the end product. These writings are also included in the notion of a legislative resource, but in this case it is not easy to give straightforward rules for deciding whether they are, or are not to be considered legislative resources. Different jurisdictions will have different theories on this subject.
% paragraph legislative_resource (end)
\paragraph{Case Law} % (fold)
\label{par:case_law}
The notion of \textbf{case law} includes \emph{[\ldots description here \ldots]}
% paragraph case_law (end)
% subsection scope_of_the_agreement (end)
\subsection{Design Requirements and Principles} % (fold)
\label{sub:design_requirements_and_principles}
MetaLex XML is an interchange format, a lowest common denominator
for other standards, intended not to replace jurisdiction-specific standards and
vendor-specific formats in the publication process but to impose a standardized
view on legal documents for the purposes of information exchange and interoperability in the context of software development.
To meet these requirements, MetaLex defines a mechanism for \textbf{schema extension}, adding \textbf{metadata}, cross \textbf{referencing}, constructing \textbf{compound documents} and a basic \textbf{naming mechanism}. These features are briefly discussed below, but are elaborated upon in later sections.
\subsubsection{Schema Extension} % (fold)
\label{ssub:schema_extension}
MetaLex XML defines a \textbf{basic content model} that can be applied to concrete XML documents only through the \textbf{implementation} of the content model in a schema that extends the MetaLex \xmlschema or \dtd. The existing concrete XML document does not have to be modified to achieve this.
% Existing XML documents, conforming to other XML
% schemas, may conform to the MetaLex basic content models by defining the used elements as
% \emph{implementations} of a MetaLex content model in a schema that extends the MetaLex
% schema (see sections \ref{cm} and \ref{attributes}). The existing document does not have to be modified to achieve this.
A \textbf{schema extension} specifies the names of elements used in concrete XML documents and allows for additional attributes on these elements. It \emph{may} also be used to further constrain the allowed content models if the schema extension is intended to be \emph{normative}.\footnote{For instance, if the schema is used in an editor to validate the structure of the document \emph{before} it is published.}
\begin{textbox}
Concrete XML documents \emph{must not} instantiate the MetaLex \xmlschema or \dtd directly.
\end{textbox}
The MetaLex content models are described in \autoref{sec:content_model}.
The MetaLex schema extension mechanism is described in \autoref{sec:conformance}.
% subsubsection schema_extension (end)
\subsubsection{Metadata} % (fold)
\label{ssub:metadata}
MetaLex XML allows the definition of \textbf{metadata} on legal sources through the use of \href{http://www.w3.org/TR/rdfa-syntax/}{RDF/A} attributes on elements in conformant concrete XML documents, or as \href{http://www.w3.org/RDF/}{RDF} statements in a separate document or repository if it is not desirable to modify the concrete XML document.\footnote{RDF is the Resource Description Framework, as defined by the W3C, see \url{http://www.w3.org/RDF/}. RDF/A, officially named RDFa, is a W3C specification for attributes to express structured data in any markup language, see \url{http://www.w3.org/TR/rdfa-syntax/}.}
Of central importance is the standardization of identification of documents (cf. section \ref{naming}). The MetaLex metadata schema distinguishes identity of documents on the item, manifestation, expression, and work level, in accordance with \cite{Saur1998}.
\begin{textbox}
Identifiers of elements in concrete MetaLex conformant XML documents \emph{must} be specified by means of an Internationalized Resource Identifier (IRI) as defined by \href{http://www.ietf.org/rfc/rfc3987.txt}{RFC 3987}.
\end{textbox}
\begin{textbox}
Every XML element in a MetaLex conformant concrete XML document \emph{must} declare an \texttt{id} attribute with a \textbf{unique} IRI value to identify itself.
\end{textbox}
\begin{textbox}
Every conformant MetaLex document \emph{must} declare at least the IRI of the document \emph{manifestation} it exemplifies using a metadata statement.
\end{textbox}
The MetaLex metadata schema is described in \autoref{sec:metadata}.
Additional conformance requirements are described in \autoref{sec:conformance}.
% subsubsection metadata (end)
\emph{Metadata} statements can be declared inside the XML document as RDF/A, or outside it as
RDF if it is not desirable to modify the XML document (see section \ref{meta}).
This however entails that every element is identifiable, so that it is possible to make metadata statements
about the \emph{expression} fragment embodied by the element.
The document itself declares the manifestation it exemplifies (i.e. a
triple with the format \texttt{\{this, exemplifies, URIRef\}}).
Self-identifying metadata of the manifestation, expression, and work is supplied either by conformance to a
\emph{naming convention} (cf. section \ref{scheme}), or by an explicit set of identifying metadata (section \ref{idmetadata}), stored inside or outside of
the document.
The target of citations (cf. section \ref{citation}) is also identified using either the naming convention or a set of identifying
metadata. The target of citations is supplied either by using the \textbf{citations}
attribute group, or by explicit metadata.
The MetaLex schema also supports compounding documents by reference (cf. section \ref{inclusion}). The value of the \textbf{src}
attribute is interpreted as a reference that stands in for a part of the MetaLex manifestation that is stored
in an external object identified by a URI. The target of inclusion by reference, a manifestation
component, is also identified using either the naming convention or a set of identifying metadata.
% subsection design_requirements_and_principles (end)
\subsection{Separating Content and Metadata} % (fold)
\label{sub:separating_content_and_metadata}
A guiding principle of the workshop is that identifiable structure of the document content is described by an XML Schema\footnote{\texttt{http://www.w3.org/XML/Schema}} and DTD++ schema, while metadata conforms to the Resource Description Framework (RDF) \footnote{\texttt{http://www.w3.org/RDF/}} and the associated schema language OWL\footnote{\texttt{http://www.w3.org/2007/OWL/wiki/OWL\_Working\_Group}}. The schema defines a number of meaning-neutral \emph{content models}, i.e. algebraic expressions of the elements and attributes that may (or must) be found in the content of the element. Metadata is used to describe the meaning and context of use of the text.
XML Schema uses the same symbols as RDF: uniform resource identifiers (URIs). The fundamental difference is
that in XML Schema the URI is used to attach identifiers to XML data structures so that
software can refer to them, whereas in RDF it is used to \emph{describe} the identified entities in a standardized format. Note that one \emph{could} conceive of XML documents as descriptions of the document, but conventionally the XML data structure referred to by a document URI is a manifestation of the document, and not a description of it.
The interface between metadata and the standardized XML manifestation consists of these shared URIs. The XML manifestation is what the URI refers to, and RDF descriptions describe the thing referred to by the URI.
% subsection separating_content_and_metadata (end)
\subsection{Appendices} % (fold)
\label{sub:appendices}
Appendices to this document are:
\begin{enumerate}
\item An XML Schema;
\item A DTD++ schema; and
\item An OWL schema.
\end{enumerate}
Check for availability of the appendices. The XML Schema and DTD++ schema are intended to be equivalent in meaning. See \autoref{ssec:about-schema-validation} for a formal specification of conformance to these schemas.
The latest version of schema files, existing translators and schema extensions, and examples are always found at:
\begin{textbox}
\url{http://svn.metalex.eu/svn/MetaLexWS/branches/latest/}
\end{textbox}
% Where they lead to ambiguity, it may be assumed that the DTD++ version best reflects the intentions of the technical committee of the workshop.
% subsection appendices (end)
% section introduction (end)
\subsection{Namespace and Namespace Qualification}
CEN MetaLex elements and attributes \textbf{must} be namespace qualified, even though they may be associated with a default namespace without prefix. The namespace of the current version of CEN MetaLex (1.0), regardless of the date of adoption, is:
\small{\begin{verbatim}
http://www.metalex.eu/metalex/1.0
\end{verbatim}}
XML elements, XML attributes, XML schema named complex types, etc., as well as OWL classes and OWL properties are declared in this namespace. Since ambiguity could arise in CEN MetaLex specification documents as to whether a name in this namespace refers to an XML Schema or OWL entity, we use the \emph{NCName} \texttt{metalex} for XML, and \texttt{metalex-owl} for OWL\footnote{XML elements and types, and OWL classes and properties are already distinguished by typographic conventions in CEN MetaLex. Note that OWL names only occur in MetaLex XML documents as the \emph{value} of XML attributes.}, i.e. this document assumes the existence of two namespace declarations providing two different prefixes for the \emph{same namespace}:
\small{\begin{verbatim}
xmlns:metalex="http://www.metalex.eu/metalex/1.0"
xmlns:metalex-owl="http://www.metalex.eu/metalex/1.0"
\end{verbatim}}
The OWL schema is published in conformance to the \emph{Best Practice Recipes for Publishing RDF Vocabularies}\footnote{\url{http://www.w3.org/TR/swbp-vocab-pub/}} guidelines as a hash namespace, recipe 3. This means that it can be imported with an \texttt{HTTP GET} request on the namespace under certain conditions.
\subsection{Bibliographic Entities}
\label{biblio}
The use of bibliographic terminology in the CEN MetaLex standard is inspired by the \emph{IFLA Functional Requirements for Bibliographic Records} (cf. \cite{Saur1998}):
\begin{itemize}
\item A \textbf{bibliographic object} is a bounded representation of a body of information,
designed with the intent to communicate, preserved in a form independent of a sender or receiver. A bibliographic work, expression, manifestation, and item are bibliographic objects.
\item A \textbf{bibliographic citation} is a representation of a bibliographic identifier of a bibliographic object, with the intent of referring to that bibliographic object. \emph{Article 1}, \emph{the first article} and \emph{the previous article} are examples of citation, and \emph{the Minister}, the \emph{President of the Republic}, \emph{the accused}, and \emph{We, Beatrix} are examples of references to other, interesting but non-bibliographic, things.
\item A \textbf{unique bibliographic identifier} identifies a bibliographic object uniquely. The \emph{uniform resource identifier} is used as a unique bibliographic identifier in MetaLex.
\item A \textbf{bibliographic work} is a bibliographic object, realized by one or more expressions, and created by one or more persons in a single creative process ending in a publication event. A work has an author or authors, and is the result of a publication event. We recognize the work through individual expressions of the work, but the work itself exists only in the commonality of \emph{content} between and among the various expressions of the work: it is an intentional object\footnote{I.e. it exists only as the object of one's thoughts and communication acts, and not as a physical object.}.
\item A \textbf{bibliographic expression} is a realization of one bibliographic work in the form of signs, words, sentences, paragraphs, etc. by the author of that work. Physical form aspects, such as typeface or page layout, are generally speaking excluded from the expression level. Any change in \emph{content} gives rise to a new expression. If an expression is revised or modified, the resulting expression is considered to be a new expression, no matter how minor the modification may be. Expression is an intentional object.
\item A \textbf{bibliographic manifestation} embodies one expression of one bibliographic work. The boundaries between one manifestation and another are drawn on the basis of both content and physical form. When the production process involves changes in physical form the resulting product is considered a new manifestation. Thus, a specific XML representation, a PDF file (as generated by printing into PDF a specific Word file with a specific PDF distiller), a printed booklet, all represent different manifestations of the same expression of a work. Manifestation is an intentional object. A MetaLex XML document is a bibliographic manifestation.
\item A \textbf{bibliographic item} exemplifies one manifestation of one expression of one work: a specific copy of a book on a specific shelf in a library, a file stored on a computer in a specific location, etc. Items stored on a computer can be easily copied to another location, resulting in another item, but the same manifestation. This makes adding metadata about the item to the item in principle impossible. On the Internet generally speaking only the \emph{uniform resource locator} (URL) is an item-specific datum. An item is a physical object.
\end{itemize}
A MetaLex XML document is a standard manifestation of a bibliographic expression of a source of law. Editing the MetaLex XML markup and metadata of the XML document changes the manifestation of an expression. Changing the marked up text changes the expression embodied by the manifestation. Copying an example of the MetaLex XML document creates a new item.
Embedded metadata explicitly indicates its subject: if it is the document self-identifier (the empty string) \texttt{""} or a fragment identifier \texttt{"\#f"} composed of the document self-identifier and the value \texttt{f} of the \texttt{id} attribute of the intended target element, then the metadatum refers to the item (cf. section \ref{attributes}). The only item-level metadata property is however the \texttt{metalex-owl:exemplifies} property, which takes the manifestation-level identifier as value (cf. section \ref{naming}). The required type of subject of a metadata property is explicitly constrained in the OWL schema by a domain restriction.
XML attribute values by default pertain to the expression embodied by the manifestation, unless explicitly specified otherwise in this document.
Each bibliographic item exemplifies exactly one manifestation that embodies exactly one expression that realizes exactly one work. Because all these mappings are \emph{functional}, i.e. each maps unambiguously to one entity, item identity can be, and often is, used as an indirect identifier of the other objects, similar to how, for instance, email addresses usually have a functional mapping to persons and can be used as an indirect identifier of persons. One can for instance refer to a work by referring to its initial expression in a context where a reference to a work is expected.
The inverse of these relations is however often not a function. One can think of the work as an abstraction of one or many expressions, the expression as an abstraction of one or many manifestations, the manifestation as an abstraction of one or many items. The manifestation, expression, and work are intentional objects whose existence is conditional on the existence of at least one item, manifestation, expression, respectively. There is normally speaking no such thing as an expression that is not embodied, a work that is not realized, etc.
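
As a hedged illustration of these functional mappings, the chain from item to work can be written as RDF statements, here in Turtle notation. Only \texttt{metalex-owl:exemplifies} is named in this document; the property names \texttt{embodies} and \texttt{realizes}, the trailing \texttt{\#} appended to the namespace, and all \texttt{example.org} identifiers are assumptions made for this sketch and are not taken from the OWL schema.
\footnotesize
\begin{verbatim}
# The trailing '#' on the namespace is an assumption of this sketch.
@prefix metalex-owl: <http://www.metalex.eu/metalex/1.0#> .
# Hypothetical identifiers, for illustration only.
@prefix ex: <http://example.org/act/> .

# "<>" denotes the document itself, i.e. the item being read.
<>               metalex-owl:exemplifies ex:manifestation .
# 'embodies' and 'realizes' are assumed property names.
ex:manifestation metalex-owl:embodies    ex:expression .
ex:expression    metalex-owl:realizes    ex:work .
\end{verbatim}
\normalsize
Each subject in this sketch has exactly one object, which is why the item identifier can serve as an indirect identifier of the manifestation, the expression, and the work.
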
Besides the hierarchical constitutive relationships between the four levels, there are also horizontal relations between the objects within a level. The expressions of a work in the legal field are usually either the initially published expression, or expressions derived by content modification activity or translation activity. Manifestations of an expression are either the initially created one(s), or manifestations derived by editing activity. Items of a manifestation are either the initially created one(s), or copies of them.
The MetaLex OWL schema includes a number of event type definitions (cf. \cite{BoerEtAl2004}, and generally \cite{LagozeEtAl2000} on linking metadata to events) to make these horizontal relations explicit for the expression level. At the item level they cannot be embedded in a MetaLex item for obvious reasons.
Appropriate manifestation level version management methods and tools already exist (CVS, SVN, etc.). These are of no concern to the CEN MetaLex working group.
Most legislative events happen at the work and expression levels. Content-related events like markup, metadating, and digital signature happen at the manifestation level.
The CEN MetaLex standard aims to provide metadata for describing both the hierarchical and relational way of positioning and identifying bibliographic objects, at least at the work and expression level, at the relevant levels of granularity.
Some additional terminology is introduced to capture some bibliographic phenomena relevant to law, as well as operational criteria for recognizing these in a MetaLex XML document. These classifications should be considered metadata, and their specifications are found in the MetaLex OWL schema:
\begin{itemize}
\item A \textbf{bibliographic source of law} is a bibliographic object that can be, is, was, or presumably will be referred to, by way of bibliographic citation, to back an argument claiming the existence of a legal rule in a certain legal system, or, alternatively, a bibliographic object published or realized by a competent legislator to communicate a legal rule to a certain group of addressees. Both the legislator and the user of the bibliographic source of law understand it as a medium used for communicating the existence of legal rules, including auxiliary declarations required for the proper understanding of legal rules, between legislator and user.
\item An \textbf{initial version} of a bibliographic work is the expression that realizes the work at the time of its official release in the public domain as a bibliographic object. It is the \textbf{metalex-owl:result} of an \textbf{metalex-owl:InitialBibliographicCreation}.
\item A \textbf{version} of a bibliographic work is either the initial version of the work, or an expression realized by modification of a version. It is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:BibliographicModification} of the (\textbf{metalex-owl:matter}) previous version.
\item A \textbf{version in force} of a bibliographic work is a version that is, was, or will be \emph{in force} during a specific time interval. The in force time intervals of versions in force of the same work do not appear to overlap in time \emph{when viewed from any specific vantage point in time}. Note however the possibility of \emph{virtual} (section \ref{virtual}) and \emph{ex tunc} (sections \ref{idmetadata} and \ref{extunc}) expressions, which may create a substantial difference between vantage points in time.
\item A \textbf{version ex tunc} of a bibliographic work is an expression that projects back in time, up to a specific date, the effects in terms of content, force and efficacy of a constitutive event on a specific version of the work. The version ex tunc is thus, \emph{as it were}, generated by a fictional event in the past that allows rewriting institutional history, for instance through errata corrige, or on the occasion of an annulment by a constitutional court of a modification made in the past. The fictional event of such an expression is treated \emph{as if} it were a true event \emph{after} the constitutive event happened. Institutional history therefore appears different depending on which vantage point in time one takes. A version ex tunc is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:FictionalExpressionCreation}, which is a
temporal fiction of (\textbf{metalex-owl:\\temporalFictionOf}) some other event -- the constitutive event -- that happened \emph{after} the fictional event.
\item A \textbf{consolidation} of a bibliographic source of law is a version realized by applying legal rules found in another bibliographic source of law to the previous version. It is the \textbf{metalex-owl:result} of a \textbf{metalex-owl:LegislativeModification} of the (\textbf{metalex-owl:matter}), which is the previous version, by the (\textbf{metalex-owl:instrument}), which is the other bibliographic source of law stipulating the modification. The modification often takes place when the instrument enters into force, although it may of course be conditional upon some other event.
\item A \textbf{variant} of a version in force of a bibliographic work is an expression that shares its \emph{in force} time interval. Although the concept variant is often nominalized, it is a symmetric relationship (\textbf{metalex-owl:variant}) between two expressions.
\item A \textbf{language variant} of an expression of a bibliographic work is an expression that shares its in force time interval, and differs in nothing but language. For instance, the English, Dutch, Italian, and German versions of a European directive are different language variants. It is also a symmetric relationship (derived from \textbf{metalex-owl:variant}). In a MetaLex manifestation of the expression this is expressed in different values of the \textbf{xml:lang} attribute.
\item A \textbf{translation} of an expression of a bibliographic work is an expression that shares its in force time interval, differs in nothing but language, and has been realized by way of translation of one expression into another expression. Translation is an asymmetric relation between bibliographic expressions, expressed by a \textbf{metalex-owl:Translation} event, which has a \textbf{metalex-owl:translator}, taking the initial expression as a \textbf{metalex-owl:matter}, and the translated document as \textbf{metalex-owl:result}. The translation should not be confused with language variant: while language variants can be realized concurrently by the legislator, and are equally authoritative if they are, the translation of an expression is generally speaking less authoritative than the expression it is a translation of, even if \emph{officially} translated.
\end{itemize}
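
As a hedged illustration of the event-based definitions above, a consolidation can be sketched as RDF statements in Turtle notation. The class and property names are those given in the list; the direction of the properties (with the event as subject), the trailing \texttt{\#} appended to the namespace, and all \texttt{example.org} identifiers are assumptions made for this sketch, and the OWL schema remains authoritative.
\footnotesize
\begin{verbatim}
# The trailing '#' on the namespace is an assumption of this sketch.
@prefix metalex-owl: <http://www.metalex.eu/metalex/1.0#> .
# Hypothetical identifiers, for illustration only.
@prefix ex: <http://example.org/> .

# Assumed direction: the event is the subject of all three properties.
ex:modification a metalex-owl:LegislativeModification ;
    metalex-owl:matter     ex:act-v1 ;        # the previous version
    metalex-owl:instrument ex:amending-act ;  # stipulates the change
    metalex-owl:result     ex:act-v2 .        # the consolidation
\end{verbatim}
\normalsize
Under the same assumptions, an initial version or a translation would follow the same pattern, with the event typed as \textbf{metalex-owl:InitialBibliographicCreation} or \textbf{metalex-owl:Translation} respectively.
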
\subsection{Content Models instead of Elements}
\label{cm}
A MetaLex XML element is characterized by a name, a content model, and zero or more attributes.
According to the philosophy of descriptive markup (cf. the Text Encoding Initiative\footnote{http://www.tei-c.org/P4X/SG.html}), the name of an XML element is usually semantically-charged (i.e. it provides a hint as to the meaning of the text fragment, or its role within the whole of the document). Additional information about the content of the element goes into attributes. The \emph{content model} (cf. \cite{VitaliEtAl2005}) is an algebraic expression of the elements that may (or must) be found in the content of the element. Generic elements, on the other hand, are named after the content model: they are merely a label identifying the kind of content model.
All XML vocabularies contain a mix of descriptive and generic elements, and, depending on the foreseen uses of the documents, emphasize one of the approaches. For instance, vocabularies with precise procedural semantics (e.g. XSLT, SVG) do not depend on generic elements, while vocabularies intended for diverse content (for instance XHTML) employ generic elements. Consider for instance that in XHTML 2.0 both \texttt{a} and \texttt{img} elements are being replaced or phased out in favour of generic substitutes using attributes.
The most important validation languages (e.g. XML Schema) do not allow validation rules to be associated with attribute values, so element names are currently the only way to associate validation rules with documents. This is a cause of pollution of principles, forcing semantically-charged elements to assume a rigid content model, while generic elements take care of odd situations that were not foreseen when the content models were designed.
Legislative drafting technique has a long tradition, and often its own standards of what legislative documents should look like. This makes descriptive markup combined with strict content models very tempting. On the other hand, there are so many exceptions that can be found in concrete examples we sometimes just want to give up on precise description altogether and resort to generic elements, in particular because there should be not one \emph{iota} of difference between the original expression of the legislator and the XML manifestation of that expression.
The approach of the workshop is to provide for a complete and automatic interchangeability of approaches, from generic to descriptive and vice versa.
Sharing content models is achieved by using two special attributes, \texttt{name} and \texttt{type}, which provide information about the meaning and the content model of the element, respectively. If one of these values is missing, the name of the element supplies the missing value. The following elements are for instance equivalent from the point of view of the standard:
\footnotesize
\begin{verbatim}
\end{verbatim}
\normalsize
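The fallback rule for \texttt{name} and \texttt{type} can be sketched in a few lines of Python. This is a minimal illustration and not part of the standard: the element names \texttt{hcontainer} and \texttt{article} are hypothetical, and the attributes are read without a namespace for brevity.
\begin{verbatim}
import xml.etree.ElementTree as ET

def effective_name_and_type(element):
    """Return (name, type); the tag supplies a missing value."""
    tag = element.tag.split('}')[-1]  # drop '{namespace}' if any
    return element.get('name', tag), element.get('type', tag)

# Hypothetical generic and descriptive spellings of one element.
generic = ET.fromstring('<hcontainer name="article"/>')
descriptive = ET.fromstring('<article type="hcontainer"/>')

print(effective_name_and_type(generic))
print(effective_name_and_type(descriptive))
\end{verbatim}
Under these assumptions both calls yield the same pair, which is what makes the generic and the descriptive form interchangeable.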
\section{Attributes and Content Models}
\label{attributes}
The \textbf{metalex:rootType} content model, and its associated types and elements, disappear from the schema: instead the \textbf{metalex:containerType} content model is used. The \textbf{metalex:containerType} content model permits the use of a \textbf{metalex:mcontainerType} element as initial element.
The optional metadata attribute \textbf{metalex:showAs} \textbf{must} contain a human readable label of the \emph{whole} metadata statement, i.e. of its subject, property and object.
The \textbf{metalex:src} (URI value) attribute is used to include a manifestation component by \emph{reference}.
Attributes from any other (\textbf{\#other}) namespace \textbf{may} be used on any MetaLex element.
The value of the \textbf{xml:lang}\footnote{Used in conformance with \texttt{http://www.w3.org/TR/REC-xml/} and IETF RFC 3066.} attribute \textbf{must} be recoverable from the context of the element.
Every MetaLex element \textbf{must} have an \textbf{id} attribute. Use \textbf{metalex:id} if no \textbf{id} already exists.
The \textbf{metalex:naming} attribute (value: a URI) is used to signal conformance to the section \ref{scheme} naming convention for URIs. It \textbf{must} be recoverable from the context of the element.
The \textbf{metalex:LocalNamingConventionMethod} (values: individual, ordinal, positional) and \textbf{LocalNamingConventionScope} (values: parent, root, or name of ancestor element relative to which identifiers like \emph{article 1} are unique) attributes are used to signal conformance to a local naming convention for structural parts of a document. They \textbf{must} be recoverable from the context of the element.
The \textbf{Quoted} content models have been added: in some cases -- in particular in modifying acts -- a source of law conforming to a certain content model \emph{quotes} other structural blocks that violate that content model.
\section{Metadata}
\label{meta}
MetaLex uses the conventions of RDF/A processing for embedding RDF metadata statements inside MetaLex XML. Consult \texttt{http://www.w3.org/TR/rdf-primer/} for more information about RDF and \texttt{http://www.w3.org/TR/xhtml-rdfa-primer/} for RDF/A. In any case of ambiguity this specification takes precedence over RDF/A specifications.
The purpose of MetaLex embedded metadata is nothing more than storage of RDF formatted metadata in MetaLex XML. An RDF description of a resource consists of a set of statements. The MetaLex standard includes an OWL schema that specifies commonly required properties and classes in RDF statements about legal and legislative resources. This schema may be used with RDF stored outside the document in question, and the embedded metadata processing mechanism may be used with other metadata schemas like Dublin Core or PRISM.
The main difference between storage inside and outside the standard XML manifestation is the identification of the metadata author: the metadata inside the document is associated to the editor of the manifestation, who must be presumed to be the author of the metadata. An RDF statement has the following components:
\begin{description}
\item[subject:] the thing the statement describes;
\item[predicate:] a specific property;
\item[object:] the thing the statement says is the value of the property, for the thing the statement describes.
\end{description}
The subject and the predicate are always URIs. The object is either a URI or an (optionally datatyped) literal.
See the RDF specifications at \texttt{http://www.w3.org/2001/sw/RDFCore/} for details on RDF literals and datatyping of literals.
A MetaLex document \textbf{must} declare what it is a manifestation of, as follows:
\small\begin{verbatim}
\end{verbatim}
Other metadata \emph{may} be embedded.
\subsection{Embedded Metadata Processing}
RDF/A statements \emph{may} be added to any MetaLex element if the content model allows it. Elements derived from the \textbf{metalex:urMetaType} type \textbf{must} contain RDF/A attributes expressing an RDF statement. Relative URI references in RDF/A attributes are relative to the \emph{xml base} of the containing element.
An RDF/A element is any XML element that contains either the attribute \textbf{property}, \textbf{rel}, or \textbf{rev}. Exactly one RDF statement is generated per \textbf{rel} (relation), \textbf{property}, or \textbf{rev} (reverse) attribute by an RDF/A processor: the attribute indicates a new statement whose \emph{predicate} is the URI value of that attribute. In the case of \textbf{rel} and \textbf{property}, the subject of the statement is decided by \emph{subject resolution}.
In the case of \textbf{rel}, the object is decided by \emph{URI reference object resolution}. In the case of \textbf{property}, the object is decided by \emph{literal object resolution}. In the case of \textbf{rev}, the subject of the triple is decided by \emph{URI reference object resolution} and the object of the triple is decided by \emph{subject resolution}. If both \textbf{rel} and \textbf{rev} attributes are used within the same element, two RDF statements are generated.
Literal object resolution yields either the value of the \textbf{content} attribute or, if it is absent, the element content. The value of the content attribute is by default interpreted as a plain literal. The element content is by default interpreted as an XML literal.
The \textbf{datatype} attribute is used to specify an XML Schema datatype \\(cf. \texttt{http://www.w3.org/TR/xmlschema11-2/}). If it is present, the value yielded by literal object resolution is interpreted as an instance of that XML Schema datatype.
URI reference object resolution yields either the URI value of the \textbf{resource} attribute or, if absent, the \textbf{href} or \textbf{src} attribute. The \textbf{resource} attribute is only used to specifically communicate that the URI is not intended to be ``clickable'', or if a \textbf{href} or \textbf{src} attribute is already present on the element for other reasons and does not refer to the intended object. It is strongly advised to use the \textbf{href} attribute whenever reasonable.
Subject resolution usually yields the URI value of the \textbf{about} attribute, or, if the RDF/A element that includes the predicate attribute does not have an \textbf{about} attribute, the \textbf{about} attribute of the first ancestor element that has one. In the absence of an \textbf{about} attribute within scope, it yields the \emph{xml base} of the element.
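The resolution rules of this section can be sketched as a small Python function. This is an illustration under stated assumptions, not a conforming RDF/A processor: the \texttt{ex:} predicates and all URIs are hypothetical, \texttt{href} is tried before \texttt{src} (the text does not fix an order), element content is flattened to plain text instead of an XML literal, \textbf{datatype} is ignored, and the empty reference stands in for the \emph{xml base}.
\begin{verbatim}
import xml.etree.ElementTree as ET

def uri_object(el):
    """URI reference object resolution: resource, href, src."""
    for attr in ('resource', 'href', 'src'):
        if el.get(attr) is not None:
            return el.get(attr)
    return None

def literal_object(el):
    """Literal object resolution: content attribute, else text."""
    content = el.get('content')
    if content is not None:
        return content
    return ''.join(el.itertext())

def triples(el, subject=''):
    """Yield (subject, predicate, object) for el and descendants.
    'subject' is the inherited subject; '' stands for xml base."""
    subject = el.get('about', subject)
    if el.get('rel') is not None:
        yield (subject, el.get('rel'), uri_object(el))
    if el.get('rev') is not None:
        yield (uri_object(el), el.get('rev'), subject)
    if el.get('property') is not None:
        yield (subject, el.get('property'), literal_object(el))
    for child in el:
        yield from triples(child, subject)

doc = ET.fromstring(
    '<root about="/work/expr/manif">'
    '<meta rel="ex:cites" href="/other/work"/>'
    '<meta rev="ex:amends" resource="/amending/act"/>'
    '<meta property="ex:title" content="An Act"/>'
    '<meta about="/elsewhere" property="ex:note">text</meta>'
    '</root>')
statements = list(triples(doc))
\end{verbatim}
In this sketch the first three \texttt{meta} elements inherit their subject from the root, the \textbf{rev} statement swaps subject and object, and the last element overrides the subject with its own \textbf{about} attribute.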
\subsection{OWL Schema}
The OWL schema can be found at \texttt{http://www.metalex.eu/metalex/YYYY-MM-DD}, where \texttt{YYYY-MM-DD} is the date of the agreement, provided that the request asks for the correct MIME type, i.e.:
\begin{verbatim}
GET http://www.metalex.eu/metalex/YYYY-MM-DD
Accept: application/rdf+xml
\end{verbatim}
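As an illustration, such a request could be prepared with the Python standard library as follows. The helper name \texttt{owl\_schema\_request} is hypothetical, the date remains the placeholder used above, and the request is only built here, not sent.
\begin{verbatim}
import urllib.request

def owl_schema_request(date):
    """Build (but do not send) a GET request for the schema."""
    url = 'http://www.metalex.eu/metalex/' + date
    return urllib.request.Request(
        url, headers={'Accept': 'application/rdf+xml'})

# Placeholder date, as in the text; substitute the agreed date.
request = owl_schema_request('YYYY-MM-DD')
# urllib.request.urlopen(request) would perform the retrieval.
\end{verbatim}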
A description logic syntax (cf. \cite{horrocks03reducing,baader01overview}) rendering is found in appendix \ref{owl}. The OWL schema specifies the concepts mentioned in section \ref{biblio}, the identifying metadata specified in sections \ref{self} and \ref{idmetadata}, and the citation metadata specified in section \ref{citation}, as well as some related auxiliary concepts.
\section{Naming}
\label{naming}
Identification of bibliographic entities plays a role in:
\begin{enumerate}
\item Self-identification of documents;
\item Citation of other documents;
\item Inclusion of document components.
\end{enumerate}
In CEN Metalex the identification of bibliographic entities is based on a matching operation between a list of feature values characterizing the request and the list of feature values characterizing the bibliographic entities.
There are four different ways in which (a subset of) these feature values \textbf{may} be specified:
\begin{enumerate}
\item As parts of a structured URI, based on either the urn or the http protocol;
\item As RDF/A statements within the document when expressed in XML or HTML;
\item As property-value pairs within the document's metadata; and
\item As RDF statements associated to or contained in the document.
\end{enumerate}
CEN Metalex compliant implementations must be able to access and make proper use of all these mechanisms, regardless of the preferred or local approach.
In the first case, a compliant CEN Metalex implementation defines a naming system, whereby the feature values are used to compose a string that is compatible with the URI or URI reference syntax as specified in IETF RFC 3986. Naming systems must comply with the requirements specified in sections 5.1 through 5.4, and must describe a subset of the identification features listed in section 5.5 sufficient to allow for the correct resolution of the URIs. A sound candidate for a naming system that complies with these requirements is the Akoma Ntoso naming convention (cf. \cite{?}), which can be used in the absence of a local naming system.
In all other cases, a syntax-dependent mechanism based on qualified names must be used to extract the relevant feature values, which are then used for identification and access purposes. \marginpar{Support for pdf, txt, etc??}
Regardless of the approach used to specify the feature values, CEN Metalex provides a specific approach to resolution and identification of the physical URL of the appropriate bibliographic item, which needs to be implemented by compliant applications.
\subsection{Uniform Resource Identifiers}
Each bibliographic item encoded in the CEN MetaLex standard must have at least one URI according to IETF RFC 3986 (cf. \cite{?}) by which web-based applications can retrieve (and if needed show) the corresponding content. It is in principle not possible to encode item level information in the manifestation, and as such the item itself cannot provide information about its whereabouts.
Manifestations, expressions and works may also have URIs. The URIs at the four levels are referred to as item-level, manifestation-level, expression-level and work-level URIs, respectively.
\subsubsection{Usage of Relative References instead of full URIs}
As specified in IETF RFC 3986, many applications and protocols make use of an abbreviated form of URI to refer to resources. Such abbreviated forms, called relative references or relative URIs, need to be used in a context where enough information exists to determine the corresponding full URI by which the resource can be identified (a process called resolution). The context information is called the base URI, and is made available in a number of application- and context-specific ways.
Although in general only full URIs are used to identify resources and relative references rely on the base URI to supply the missing information, in CEN Metalex it is possible to use relative URIs to identify bibliographic entities, as long as any legal resolution of the relative URI generates a reference to the same work, expression, manifestation or item, respectively. This is possible if the resolution of a reference is independent of the base URI used.
It must be possible to establish the base URI of any item-, manifestation-, expression- and work-level URI, in conformance with the XML Base specification and IETF RFC 3986. This is true also for fragment identifiers: the concatenation of the established URI base and the id attribute of an element must result in a valid URI reference for the element (at the same level as the base URI), conformant to the addressing recommendations of W3C, which counts as a bibliographic identifier of the element as a bibliographic item.
Note that the explicit use of an xml:base attribute means that all item-level references remain the same after every copy operation of the document, which means that it also behaves as a manifestation-level base. There are legitimate use cases of the xml:base attribute, where it is inserted as a temporary identifier for an XML subtree in an XML processing pipeline.
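As a sketch of base-independent resolution, the following Python fragment resolves one relative reference against two base URIs using the standard \texttt{urllib.parse} module. The host names are hypothetical, and the assumption that the path alone carries the identifying features is made for illustration only.
\begin{verbatim}
from urllib.parse import urljoin, urlsplit

reference = '/tv/act/2004-02-13/2/tv'  # relative reference
bases = ['http://a.example/store/doc.xml',    # hypothetical
         'http://b.example/mirror/doc.xml']   # hypothetical

# RFC 3986 resolution against each base URI.
resolved = [urljoin(base, reference) for base in bases]

# The full URIs differ by host, but the identifying path is
# the same whichever base was used.
paths = {urlsplit(uri).path for uri in resolved}
\end{verbatim}
Both resolutions produce different full URIs with an identical path, so under the stated assumption they designate the same bibliographic entity.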
\subsection{Fragment Identifiers}
Every MetaLex element must have an id attribute, though not necessarily the metalex:id attribute. The id value of an element is a fragment identifier. Concatenated to the manifestation-level URI, it globally identifies the element at the manifestation level; concatenated to the expression-level URI, it globally identifies the expression embodied by the content of the element; concatenated to the work-level document URI, it presumably globally identifies a structural element common to various expressions of the work.
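The concatenation can be illustrated with a short Python helper. The function name, the id \texttt{art1} and the level URIs are hypothetical. The \texttt{\#} separator follows the generic URI syntax of IETF RFC 3986.
\begin{verbatim}
from urllib.parse import urldefrag

def with_fragment(level_uri, element_id):
    """Append an element id as fragment to a level URI."""
    base, _ = urldefrag(level_uri)  # drop an existing fragment
    return base + '#' + element_id

# Hypothetical work-, expression- and manifestation-level URIs.
levels = {
    'work': '/tv/act/2004-02-13/2',
    'expression': '/tv/act/2004-02-13/2/tv',
    'manifestation': '/tv/act/2004-02-13/2/tv.xml',
}
element_uris = {level: with_fragment(uri, 'art1')
                for level, uri in levels.items()}
\end{verbatim}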
\subsection{Conformance of Naming Mechanisms}
\label{conformance}
Any naming system for bibliographic entities that is based on URIs or URI references according to IETF RFC 3986 and that uses ids to identify document fragments can be used in a CEN Metalex environment as long as the following fundamental requirements are met:
\begin{enumerate}
\item The first three FRBR levels should be explicitly supported by the naming system: works, expressions and manifestations should all have names and they should be different. No requirement for items needs to be supported for CEN Metalex compliance except for the existence of a mapping mechanism that provides the URI of an item (any of the existing ones) given a URI of another level and sufficient context information to disambiguate it.
\item The naming system should explicitly take into consideration the complex structure of a document, and the interrelation between components (e.g., between the main body of a document and its attachments, and the attachments' attachments).
\end{enumerate}
Furthermore, names (as represented via URIs or URI references) should be:
\begin{enumerate}
\item Persistent: names at all levels must maintain the same form over time regardless of the political, archival and technical events that have happened since their first generation;
\item Global: all relevant documents by all relevant bodies must be represented;
\item Memorizable: names need to be easy to write down, easy to remember, easy to correct if they were written down wrongly;
\item Meaningful: names need to mean something; it should be possible to make assumptions about the kind, freshness and relevance of a citation by looking only at the document's name;
\item Guessable across levels: references to different levels of the same document should be similar; e.g., given a reference to an expression a user should be able to deduce the name of the work;
\item Guessable across document classes: references to different instances of the same document type should be similar; e.g., given a work-level reference to act 136/05, a user should be able to deduce the work-level name of act 76/06, etc;
\item Guessable across document components: references to different components of the same document at the same level should be similar; e.g., given an expression-level reference to attachment A of act 136/2005, a user should be able to deduce the expression-level name of attachment B of the same act; and
\item Hierarchical: names should use URI hierarchies whenever possible and appropriate; in particular, hierarchies should be used at least to separate the FRBR levels and document components.
\end{enumerate}
\subsection{Identification Features}
\label{features}
Naming systems may use a number of document features for the purpose of creating the URI to identify the bibliographic entity (at any level appropriate). Each of these features may contribute a fragment of the overall URI, whose presence is instrumental in uniquely identifying the relevant bibliographic entity. Depending on the naming system, document type and syntax chosen, only some of these features may be known or relevant for identification purposes.
Of course no features of a lower level should ever be used to identify higher levels of the bibliographic entity.
\subsubsection{Self-Identifying Features}
Any naming system should make its nature, its identity, and its protocol explicit in every URI used. Identifying these features allows any tool that is not specific to the naming system to recognize it as such and, if appropriate, to activate the appropriate conversion routes.
\begin{enumerate}
\item The protocol (e.g., http, urn)
\item The syntax (e.g., Akoma Ntoso, NIR, Dutch naming convention, or any other CEN Metalex-compliant system used)
\end{enumerate}
\subsubsection{Work-level Features}
Work level features are used to characterize the identity of the bibliographic entity at the work level, thus regardless of versions, dates and lifecycle events, as well as data format or physical location.
\begin{enumerate}
\item The country emanating the document
\item The document type
\item Any specification of document subtype, if appropriate
\item The emanating actor
\item The promulgating actor, who may be implicitly deducible either by the document type or by the emanating organization
\item Any relevant creation date of the work
\item Any relevant number or disambiguating feature of the work (possibly including titles)
\end{enumerate}
\subsubsection{Expression-level Features}
Expression-level features are used to characterize any specific selection of content among the many that exist for the corresponding work. These are relevant regardless of data format or physical location.
\begin{enumerate}
\item The language(s) associated (could be multiple)
\item The validity date(s) associated to actual content (could be multiple)
\item Any content authoring information to determine the authoritativeness of the text content. This is separate from and independent of the authoring information relative to the metadata and markup, which are among the features of the manifestation.
\item Any content-specification date (as opposed to validity dates)
\end{enumerate}
\subsubsection{Manifestation-level Features}
Manifestation-level features are used to characterize a specific choice of data format and markup. These include the decisions relevant to generating the document in this exact form. They are independent of storage choices.
\begin{enumerate}
\item The electronic data format chosen
\item The markup authoring information to determine the authoritativeness of the \emph{markup and metadata}
\item Any relevant markup-specific date
\item Any additional markup-related annotation (e.g., the existence of multiple versions, of annotations, etc.)
\end{enumerate}
\subsubsection{Item-level Features}
Item-level features characterize the location and service-level data necessary to access the physical document.
\begin{enumerate}
\item The physical location
\item The owner of the physical location
\item Any additional service-level annotations (e.g., authentication, costs, authoritativeness, speed, etc.)
\end{enumerate}
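The rule that no feature of a lower level may be used to identify a higher level can be checked mechanically. The following Python sketch assumes a hypothetical set of short feature keys loosely derived from the lists above. Neither the key names nor the function are defined by the standard, and the self-identifying features (protocol, syntax) are left out because they apply to every URI.
\begin{verbatim}
# Hypothetical feature keys, grouped from highest to lowest level.
FEATURES = {
    'work': ['country', 'doctype', 'subtype', 'emanating_actor',
             'promulgating_actor', 'creation_date', 'number'],
    'expression': ['language', 'validity_date',
                   'content_author', 'content_date'],
    'manifestation': ['format', 'markup_author',
                      'markup_date', 'markup_annotation'],
    'item': ['location', 'owner', 'service'],
}
LEVELS = list(FEATURES)  # insertion order: work first
LEVEL_OF = {name: level
            for level, names in FEATURES.items()
            for name in names}

def lower_level_features(level, used):
    """Return the used features that belong below 'level'."""
    rank = LEVELS.index(level)
    return sorted(name for name in used
                  if LEVELS.index(LEVEL_OF[name]) > rank)

# A work-level name must not depend on language or format.
offending = lower_level_features(
    'work', ['country', 'doctype', 'language', 'format'])
\end{verbatim}
A naming system could run such a check when composing a URI and reject the name if the returned list is not empty.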
\subsection{Identifying Metadata}\marginpar{This doesn't quite work as ``just an example''. The use of classes and attributes refers to the ontology. Although the ontology does not come with existence constraints, there should be a requirement that you use (i.e. mostly subclass) what is in it unless you have an argued reason to deviate from it.}
\label{idmetadata}
Regardless of the naming system used, a set of metadata \textbf{must} be available, either in the form of RDF/A statements, in the form of RDF, or in a form that is readily convertible to RDF. This set of metadata values is based on the list of features detailed in section \ref{features}.
Consider \texttt{[ManifURI]} to be the identifier of the manifestation, \texttt{[ExpURI]} the identifier of the Expression, \texttt{[WorkURI]} the identifier of the work, all declared as in the relevant naming system. Typed values \texttt{}, etc. are values conforming to the XML Schema datatype \texttt{xsd:date}, etc. A relevant set of metadata declarations for a work-level identification could thus be the following:
\small\begin{verbatim}
\end{verbatim}
The main purpose of the example above is to create an unambiguous sequence of work creation events for each work document type. Work document types need to be identified by a descriptive string (e.g. ``act'' for act of parliament, or ``municipal;foobartown;decree''). It is advisable to identify work document types with specific publication channels that can be regularly monitored. Document types are specific to countries. The combination of the date of the event and the issue number (some string) uniquely positions the work in the temporal sequence of works belonging to that document type.
A relevant set of metadata declarations for an expression-level identification could be the following:
\small\begin{verbatim}
\end{verbatim}
The \texttt{[CreationEventDateURI]} \textbf{may} and will often coincide with the creation of the work in non-versioned document types. In case of an ex tunc modification the creation event is a \texttt{FictionalExpressionCreation}, and it is a temporal fiction of some other event, to be filled in, which happened at another relevant date:
\small\begin{verbatim}
\end{verbatim}
A relevant metadata declaration for a manifestation-level identification could be the following:
\small\begin{verbatim}
\end{verbatim}
This metadata declaration \textbf{may} be encoded in the XML manifestation as metadata, but it is assumed that extraction of RDF metadata from a standard MetaLex manifestation already provides the information that the file type is \texttt{xml}.
The OWL schema provides more identifying metadata that \textbf{may} be added.
\subsection{Signaling Identification Method Conformance}
Given the specifications in sections \ref{conformance} and \ref{features}, there are two different methods for uncovering the relationship between manifestation, expression, and work features, which have a direct impact on naming and addressing:
\begin{enumerate}
\item An explicit encoding of the relation between these URIs, and/or the class of the object designated by the URI, as metadata.
\item A naming convention for manifestations, expressions, and works that establishes a systematic relationship between transparent URIs, as detailed in a naming convention configuration specification, which allows one to derive the URI of one from the other.
\end{enumerate}
The attribute \textbf{metalex:naming} declares whether the manifestation respects the naming convention or uses some other user community naming schema. Similar to \textbf{xml:base} and \textbf{xml:lang}, the value of metalex:naming applies to the content of the element (usually the root) that carries the attribute, except when overridden by other \textbf{metalex:naming} attributes. The value of \textbf{metalex:naming} is a URI. The standard defines one naming scheme: \textbf{metalex-owl:MetaLexNamingScheme}, which is the proposed naming scheme of section 6. \marginpar{this is impossible! section 6 is gone}
The XML document metadata must declare the manifestation URI to identify itself. If the manifestation URI does not conform to the MetaLex naming convention it must declare the manifestation, expression, and work URI. In addition it must declare identifying metadata from which a conforming \textbf{metalex-owl:MetaLexNamingScheme} name can be constructed, being for instance the country, document type, date of creation of the work, an optional sequence number on that date, language of the expression, date of creation of the expression, and date of creation of the manifestation.
Identifying metadata for conforming documents can be automatically transformed into naming scheme conformant identifiers and vice versa.
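The scoping rule for \textbf{metalex:naming}, which mirrors \textbf{xml:base} and \textbf{xml:lang}, can be sketched as a recursive walk that passes the value in scope down to the children. In this Python illustration the namespace \texttt{urn:example:metalex}, the scheme URIs and the element names are all hypothetical placeholders.
\begin{verbatim}
import xml.etree.ElementTree as ET

# Hypothetical namespace, used for illustration only.
NAMING = '{urn:example:metalex}naming'

def naming_in_scope(el, inherited=None):
    """Yield (element, naming value in scope) pairs."""
    current = el.get(NAMING, inherited)  # nearest one wins
    yield el, current
    for child in el:
        yield from naming_in_scope(child, current)

doc = ET.fromstring(
    '<root xmlns:ml="urn:example:metalex" id="r"'
    ' ml:naming="urn:example:schemeA">'
    '<part id="p1"/>'
    '<part id="p2" ml:naming="urn:example:schemeB">'
    '<sub id="s"/></part>'
    '</root>')
scope = {el.get('id'): value
         for el, value in naming_in_scope(doc)}
\end{verbatim}
Here \texttt{p1} inherits the value set on the root, while \texttt{p2} and its descendant \texttt{s} fall under the overriding value.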
\subsection{Manifestation Self-identification with Metadata}
\label{self}
Only a manifestation identifier \textbf{must} be declared with \textbf{metalex-owl:exemplifies}. The expression and the work must be declared in case of non-conformance to the naming convention. Noting that the URI reference \texttt{about=""} refers to the document itself, the following declares a standard manifestation, expression, and work base (using the naming convention):
\small\begin{verbatim}
\end{verbatim}
The RDF reading of \textbf{m1} is as follows: \textbf{m1} is a statement that states that the (referent of) \texttt{metalex:exemplifies} of (the referent of) \texttt{(empty string)} is (the referent of) \texttt{/tv/act/2004-02-13/2/tv}. An alternative, simpler form that is also permitted is the following:
\small\begin{verbatim}
\end{verbatim}
The semantics of \textbf{metalex-owl:thisEmbodies} and \textbf{metalex-owl:thisRealizes} are as follows, assuming \texttt{\{p, q, r, s\}} is a set of URI references\footnote{This is an instance of property chaining (cf. OWL 2 specifications: \texttt{http://www.w3.org/TR/2008/WD-owl2-primer-20080411/})}:
\begin{enumerate}
\item If \texttt{p metalex-owl:exemplifies q} and \texttt{q metalex-owl:embodies r} \\then \texttt{p metalex-owl:thisEmbodies r}
\item If \texttt{p metalex-owl:exemplifies q} and \texttt{q metalex-owl:embodies r} and \\\texttt{r metalex-owl:realizes s} then \texttt{p metalex-owl:thisRealizes s}.
\end{enumerate}
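The two chain rules can be applied to a set of triples with a few set comprehensions. The Python sketch below is an illustration only: the function names and the single-letter resource names are hypothetical, and an OWL 2 reasoner would normally derive these statements from the schema.
\begin{verbatim}
def pairs(triples, predicate):
    """Return the (subject, object) pairs of one predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}

def infer(triples):
    """Derive thisEmbodies and thisRealizes statements."""
    exemplifies = pairs(triples, 'metalex-owl:exemplifies')
    embodies = pairs(triples, 'metalex-owl:embodies')
    realizes = pairs(triples, 'metalex-owl:realizes')
    # Rule 1: p exemplifies q, q embodies r.
    this_embodies = {(p, r) for p, q in exemplifies
                     for q2, r in embodies if q == q2}
    # Rule 2: rule 1 followed by r realizes s.
    this_realizes = {(p, s) for p, r in this_embodies
                     for r2, s in realizes if r == r2}
    return (
        {(p, 'metalex-owl:thisEmbodies', r)
         for p, r in this_embodies} |
        {(p, 'metalex-owl:thisRealizes', s)
         for p, s in this_realizes})

stated = {('p', 'metalex-owl:exemplifies', 'q'),
          ('q', 'metalex-owl:embodies', 'r'),
          ('r', 'metalex-owl:realizes', 's')}
derived = infer(stated)
\end{verbatim}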
Read for \texttt{meta} in the examples above any appropriate element that permits metadata attributes. The URIs are relative, in this case conforming to the naming convention: the base is set by the processing environment. This means that the \emph{mURI} of the naming convention describes a URI \emph{reference} that potentially resolves to a large set of URIs for each bibliographic object: one for each processing environment that sets its own base.
Note that if one wants to conform to the naming convention but also wants to embed another identifier, this is trivially achieved with a metadata statement to that effect, for instance:
\small\begin{verbatim}
\end{verbatim}
\subsection{Resolving URIs}
The process of resolving a URI is in general the process of determining the access path to the resource specified. It is a requirement for CEN Metalex that there exists a subset of the features listed in section 5.4 that univocally identifies each bibliographic item. Thus each full set of feature values identifies either zero or one bibliographic item.
The process of determining the physical URL associated with a full set of feature values therefore generates either one accessible URL or an error (e.g., no such bibliographic entity exists).
In general, though, only a subset of the full set of feature values will be available when resolving URIs. In that situation zero, one or many different physical URLs may correspond to the given feature values.
For the purpose of this section, it is irrelevant whether these feature values are given through the analysis of the request URI or the identifying metadata as specified in section \ref{idmetadata}.
Given an incomplete set of feature values a resolution mechanism may apply one of two policies:
\begin{enumerate}
\item It may choose through either user interaction, access to default values, local policies or random choice just one suitable physical URL compatible with the given values;
\item It may return the list of all physical URLs that are compatible with the given values, and enough differentiating information to allow the user to access the most appropriate one.
\end{enumerate}
Which policy to adopt is therefore implementation dependent. The adopted policy may also vary depending on the kind and quantity of available values.
For instance, if we have a set of values that uniquely identifies a manifestation for which many equivalent items exist, then the choice may be random, or it could be an automatic decision determined by authoritativeness and cost of each of the items. On the contrary, given a work-level set of values, then the system may either provide a list of available expressions, or request additional expression-level feature values to disambiguate the request (e.g., the validity date of the document, or the human language used).
We can therefore separate the process of resolution of a feature set in a number of different operations, as follows:
\begin{enumerate}
\item \texttt{convert(URI) $\rightarrow$ feature set} : this operation evaluates the information available in the naming system and generates a set of feature values that corresponds to those effectively expressed in the URI.
\item \texttt{expand() $\rightarrow$ feature set} : this operation adds new feature values to the existing set by either access to default values or to computational processes (e.g., specifying the country, the language, the data format, etc. depending on local defaults or application capabilities)
\item \texttt{request() $\rightarrow$ feature set} : this operation requests the user to fill in a number of values required by the application.
\item \texttt{list(feature sets) $\rightarrow$ feature set} : this operation shows the user a list of feature sets of available candidates, and allows the user to choose one (by specifying the preferred feature set).
\item \texttt{random(feature set) $\rightarrow$ feature set} : this operation chooses randomly among a list of feature sets and returns just one of the available ones. \marginpar{Why not just select? There is no presumption that it a an actual random selection, just a for the requester arbitrary one. Also: shouldn;t input be plural?}
\item \texttt{resolve(feature set) $\rightarrow$ URL} : this operation generates the physical URL uniquely corresponding to the feature set. This operation is only allowed if it returns either zero or one URL.
\end{enumerate}
As mentioned, resolving URIs may require different operations at different levels, as determined by local policies.
For instance, suppose we have a complete work-level URI. The resolution process would \emph{convert} it to the feature set $FS_{W0} = \{protocol, syntax, country, doctype, date, number\}$.
Application policies may first determine the remaining work-level features, e.g., the actors, which are deducible by the doctype, thereby \emph{expanding} $FS_{W0}$ to $FS_{W1} = $ $\{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors\}$. It would then use the language defaults of the user’s browser to \emph{expand} $FS_{W2}$ with a first set of expression-level features giving $FS_{E0} = \{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language\}$.
Then it would determine the list of compatible expressions (e.g., $FS_{E1} = $ $\{protocol, $ $syntax, $ $country, $ $doctype, $ $date, $ $number, $ $actors, $ $language, $ $date_1\}, $ $FS_{E2} = $ $\{protocol, $ $syntax,$ $ country, $ $doctype, $ $date, $ $number, actors, $ $language, $ $date_2\},\ldots , $ $FS_{En} = $ $\{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_n\}$) and would \emph{list} $FS_{E1},$ $ FS_{E2},\ldots , $ $FS_{En}$, with the user choosing $FS_{Ek} = $ $\{protocol, $ $syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k\}$. It would then \emph{expand} $FS_{Ek}$ by choosing by default a specific set of manifestation features (e.g., to use XML, the internal markup team, and the most recent markup date), generating $FS_{M0} = \{protocol,$ $ syntax,$ $ country,$ $ doctype, $ $date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup authoring,$ $ markup date\}$ and \emph{requesting} the user whether to show annotations or not, generating $FS_{M1} = \{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date,$ $ annotations\}$.
Then it would \emph{expand} $FS_{M1}$ with default item-level values determined by the user's profile (e.g., no costs, maximum speed), generating $FS_{I0} = \{protocol,$ $ syntax, $ $country, $ $doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date,$ $ annotations,$ $ service$ $type\}$. It would then discover all the available items compatible with $FS_{I0}$ and \emph{randomly select} one of them, specified by ownership and location: $FS_{Ij} = \{protocol,$ $ syntax,$ $ country, $ $doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date,$ $ annotations,$ $ service$ $type,$ $ owner,$ $ location\}$. \marginpar{Btw a long example: perhaps we need a style differentiating examples from main text.}
Finally, it would \emph{resolve} $FS_{Ij}$, generating the URL corresponding to it and allowing the application to access the physical bibliographic item.
Of course this is just one possible resolution policy, and many others could be imagined (e.g., a judge evaluating the legislation relevant to an event could set the expression-level validity date to the date of the event, always being served with the specific expression that was valid at that moment -- the existing versions would never be listed).
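The policy walked through above can be pictured as a chain of operations on feature sets. The following Python fragment is only a minimal sketch under stated assumptions: the feature names follow the example, but every value, the helper names \texttt{expand} and \texttt{list\_compatible}, and the small in-memory catalogue of expressions are hypothetical and not part of the standard.
\begin{verbatim}
# Hypothetical sketch: feature sets as dictionaries. All values and
# helper names are invented for illustration.

def expand(fs, **features):
    """Return a new feature set with extra features; never overwrite."""
    clash = set(fs) & set(features)
    if clash:
        raise ValueError(f"features already set: {sorted(clash)}")
    return {**fs, **features}

def list_compatible(fs, candidates):
    """Return the candidates that agree with fs on every feature in fs."""
    return [c for c in candidates
            if all(c.get(k) == v for k, v in fs.items())]

# FS_W0: converted from a complete work-level URI (made-up values).
fs_w0 = dict(protocol="http", syntax="metalex", country="nl",
             doctype="act", date="2001-01-01", number="11")
fs_w1 = expand(fs_w0, actors="parliament")   # deduced from the doctype
fs_e0 = expand(fs_w1, language="nl")         # browser language default

# Assumed catalogue of known expressions of this work.
expressions = [expand(fs_e0, date_k=d)
               for d in ("2001-01-01", "2005-07-01")]
expressions.append(
    expand(expand(fs_w1, language="en"), date_k="2005-07-01"))

choices = list_compatible(fs_e0, expressions)   # FS_E1 ... FS_En
fs_ek = choices[-1]                # stand-in for the user's choice
fs_m0 = expand(fs_ek, format="xml", markup_authoring="internal",
               markup_date="2009-06-01")
fs_m1 = expand(fs_m0, annotations=False)        # answer to the request
fs_i0 = expand(fs_m1, service_type="free")      # user profile default
\end{verbatim}
Each step only ever adds features, so every later feature set remains compatible with the work-level set it started from. The sketch stops at $FS_{I0}$; item discovery and the final resolution to a URL depend on the local infrastructure.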
\section{Addressing}
References to URIs are usually made with the \textbf{href} or \textbf{src} attribute. Because the href and src attributes are also RDF/A attributes, the referring element also encodes a metadatum if an \textbf{about} and a \textbf{rel} attribute are present.
The MetaLex standard distinguishes three forms of addressing:
\begin{enumerate}
\item Reference to external objects (agents, events, etc.): these are always encoded as metadata, and if
embedded, always using the \textbf{href} or \textbf{resource} attribute.
\item Citation of other bibliographic objects: this can be achieved through the citations
attribute group, or through metadata, always using the \textbf{href} attribute. The target should either be identified by use of the naming convention,
or by identifying metadata. In some cases the citation can be embedded in both attribute form and embedded
metadata, because of the double interpretation of \textbf{href}.
\item Manifestation component inclusion: this can be achieved through the \textbf{src} attribute and either conformance to
the naming convention, or additional metadata.
\end{enumerate}
The distinguishing property of references is that the (\textbf{rel}) value is a subproperty of \textbf{metalex-owl:refersTo}. Citation uses a property value that is a subproperty of \textbf{metalex:cites}, which is itself a subproperty of \textbf{metalex-owl:refersTo}.
This means that the fact that some MetaLex XML element refers to another entity can also be stored in RDF, external to the MetaLex XML document.
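To illustrate why the subproperty relation matters for externally stored RDF, the following Python fragment derives the generic ``refers to'' statement from a stored citation. It is a sketch only: the two property names are copied from the text above, while the example URIs and the helper names \texttt{superproperties} and \texttt{entailed} are hypothetical.
\begin{verbatim}
# Illustrative only: triples are (referrer, predicate, referent).
# The URIs below are invented placeholders.

SUBPROPERTY_OF = {"metalex:cites": "metalex-owl:refersTo"}

def superproperties(prop):
    """Yield prop followed by every property it is a subproperty of."""
    while prop is not None:
        yield prop
        prop = SUBPROPERTY_OF.get(prop)

def entailed(triples):
    """Return the triples plus those implied by subproperty axioms."""
    return {(s, q, o)
            for (s, p, o) in triples
            for q in superproperties(p)}

stored = {("http://example.org/doc#ref1", "metalex:cites",
           "http://example.org/act/art1")}
closure = entailed(stored)
\end{verbatim}
A query for everything an element refers to can then be answered over \texttt{closure} without knowing whether the original statement was a citation or another kind of reference.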
\subsection{Reference}
A reference is something that refers to or designates something else, or acts as a stand-in for a relation between two things: the \emph{referrer} and the \emph{referent}. Since a relation can also be identified, the generic form of a reference is \emph{(referrer, predicate, referent)}, where predicate is the name of the relation. References of this form are represented as RDF or RDF/A.
In the sense intended here a reference is an XML element (directly or indirectly) containing text, where the text is deemed to refer to something else. The XML element will typically be of the \textbf{inline} content model type. A \textbf{citation} is an expression that refers to something intralinguistic, i.e. to another XML element (directly or indirectly) containing text, or to the bibliographic objects directly or indirectly embodied by it. Other references refer to something extralinguistic, i.e. something other than text, recoverable from the context in which the document was produced.
\emph{Article 1, the first article} and \emph{the previous article} are examples of citation, and \emph{the Minister, the President of the Republic, the accused}, and \emph{We, Beatrix, etc.} are examples of relevant references to other things.
A similar distinction, which should not be confused with the previous one, is the distinction between exophora and endophora in linguistics. Take the following sentences:
\begin{enumerate}
\item \emph{Theft} is the unlawful taking of a good wholly or partially belonging to another.
\item \emph{It} (\emph{The theft}) must have been done with the intent to appropriate.
\end{enumerate}
\emph{It} (or \emph{The theft}) obviously refers back to \emph{Theft} in the previous sentence. It is an endophoric proform expression, as opposed to an exophoric one (\emph{That must have been done intentionally}, said while pointing to an act of taking in progress). Exophoric proform expressions are obviously rare in legal resources.
There is however a difference between the reference \emph{It} referring to theft as defined in the previous sentence and \emph{the previous sentence} as referring to a sentence. \emph{Theft} and \emph{it} are coreferents of something other than text, so \emph{it} is not an intralinguistic reference. Expressions like \emph{it} do stand in for another expression (\emph{Theft}), but only to indirectly reference meaning recoverable from context. They are used to avoid repetitive expressions and in quantification (i.e. carrying a variable from one sentence into the next one).
The following is an example of a reference metadatum in RDF/A:
\begin{verbatim}
\end{verbatim}
The use of references is optional. The \textbf{src} attribute \textbf{may not} be used.
\subsection{Citation}
\label{citation}
A reference with a single target (e.g. \emph{article 1 of the Income Tax Law} or \emph{article 15.3}) is easy to mark up. The entire referring text can be marked, and it should be linked to the one concept that is being referred to. This can be either a work or an expression, depending on whether or not version information is specified or hinted at.
A citation in legislation normally refers to a work. A reference to legislation in a court verdict necessarily refers to an expression. References in other documents can refer to any level.
References with multiple targets are more complex to mark up, as we now have several concepts to link to (e.g. \emph{article 5 and 6 of the Income Tax Law}). An important problem is where to anchor these links. In addition there are references to ranges, such as \emph{articles 5-10 of the Income Tax Law}.
A range reference has several targets, and without studying the target document we cannot determine the exact target locations. It might be \emph{\{article 5, article 6, article 7, article 8, article 9, article 10\}} or \emph{\{article 5, article 6, article 7, article 7a, article 7b, article 8, article 10\}} or even \emph{\{article 5, article 10\}}. The exact targets cannot be included in the document, as they cannot be derived from the information present. In addition, when the reference is to a work, the targets may change depending on the version being referred to.
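The dependence of a range on the target document can be made concrete with a small Python sketch. It assumes that the ordered list of article numbers of each version of the target is available. The function name \texttt{expand\_range} and the three version lists are hypothetical, chosen to reproduce the three outcomes mentioned above.
\begin{verbatim}
# Sketch: a range can only be expanded against a concrete version
# of the target document. The version lists are invented.

def expand_range(articles, first, last):
    """Return the articles from first to last in document order."""
    start, end = articles.index(first), articles.index(last)
    if start > end:
        raise ValueError("range end precedes range start")
    return articles[start:end + 1]

version_a = ["4", "5", "6", "7", "8", "9", "10", "11"]
version_b = ["4", "5", "6", "7", "7a", "7b", "8", "10", "11"]
version_c = ["4", "5", "10", "11"]

targets_a = expand_range(version_a, "5", "10")
targets_b = expand_range(version_b, "5", "10")
targets_c = expand_range(version_c, "5", "10")
\end{verbatim}
The same citation text thus yields three different target lists, which is why only the start and end of the range can be recorded in the citing document.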
Ranges get even more complex when exceptions are involved, such as \emph{articles 5-10 with the exception of article 9}. An interesting variation occurs when an exception is made to something other than a range: \emph{article 5 with the exception of the first member}. This implicitly turns article 5 into a range of \emph{member 1, [..] member n}.
A final format for references is the ``each time'' construct: \emph{articles 10, 12, 15 and 16, each time the first member}. \emph{First member} by itself cannot be a reference to something here. The links \textbf{may not} point to the articles themselves, but to the first member of each article.
The advantage of special markup for exception and each time constructs is that it signals to (less refined) parsers that discover and mark up references that this one has already been discovered, interpreted, and discarded.
Each of these complications shows the same pattern: there is a difference between the entities explicitly cited in the text, and the ones it refers to. In some cases an element must be explicitly cited to convey the information that it is \emph{not} being referred to!
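The gap between cited and referred entities can be sketched in a few lines of Python. This is an illustration only: the function name \texttt{referred}, the identifiers such as \texttt{art9}, and the slash-separated member notation are hypothetical and do not follow the naming convention.
\begin{verbatim}
# Sketch: derive the referred targets from the explicitly cited ones.
# Identifiers and the "/" member notation are invented.

def referred(cited, excluded=(), member=None):
    """Drop excluded entities; optionally descend to one member."""
    targets = [c for c in cited if c not in excluded]
    if member is not None:
        # "each time": link to a member of each article, not to
        # the article itself.
        targets = [f"{t}/{member}" for t in targets]
    return targets

with_exception = referred(
    ["art5", "art6", "art7", "art8", "art9", "art10"],
    excluded=["art9"])
each_time = referred(["art10", "art12", "art15", "art16"],
                     member="member1")
\end{verbatim}
In the first call \texttt{art9} is cited only in order to be removed from the targets. In the second call none of the cited articles is itself a target.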
Complex references create two problems for the MetaLex standard:
\begin{enumerate}
\item There are multiple reasonable ways to mark up the same statement. Some XML standards, for instance, use start and end range attributes on an element enclosing both the start and the end of the range. Other standards have two separate elements marking the start and end, each using the same href (or similar) attribute.
\item The encoding of metadata about a complex reference will necessarily require more than one metadata statement, and will therefore need more than one carrier element if RDF/A is used. This is a problem if the aim is to declare existing XML documents as MetaLex compliant documents without adding new XML elements.
\end{enumerate}
There are two methods for describing references. The following is an example of a citation \emph{metadatum} in RDF/A:
\begin{verbatim}
\end{verbatim}
Other available properties (\texttt{metalex-owl:citeFrom, metalex-owl:citeUpto, \\metalex-owl:excluding, metalex-owl:excludeFrom, metalex-owl:excludeUpto}) can be used for complex references if the information is stored in RDF or if RDF/A is strongly preferred and modifying the XML document is no problem. The \textbf{href} attribute \textbf{must} be used.
Alternatively, if adding elements to the original XML file is a problem but embedding the references inside the document is nevertheless strongly preferred, the \textbf{metalex:citations attribute group} can be used. Its attributes \texttt{metalex:href, metalex:upto, metalex:exclude, metalex:excludeTo} specify two types of ranges: from \texttt{metalex:href} to \texttt{metalex:upTo}, and from \texttt{metalex:excludes} to \texttt{metalex:excludesTo}.
If the naming convention is not used, the target of the citation must be identified in accordance with the principles set down in section \ref{idmetadata}. The use of citations is optional.
\subsection{Component Inclusion}
\label{inclusion}
Some XML manifestations may include components by \emph{reference} instead of by physical inclusion as a fragment, i.e. the inclusion reference replaces a part of the MetaLex manifestation stored in an external object identified by a URI. Section \ref{scheme} provides some additional information.
Any part of a standard MetaLex XML manifestation, except the top level container, can be implemented as an inclusion reference to an external object. On the manifestation level you make choices about, for instance, object names and media formats (tiff, jpeg, pdf, etc.). In some cases a text that is (or could be) embodied as an XML manifestation (e.g., a Chinese appendix of a treaty) is embodied alternatively by a media object.
For this the \textbf{metalex:srcatt attribute group} is provided, which defines the \textbf{metalex:src} attribute. Presence of the src attribute includes a component of the MetaLex manifestation stored in an external object identified by the URI that is the value of the src attribute.
While not technically necessary, the existence of components can also be easily described in the form of RDF/A metadata simply by adding the property \textbf{metalex-owl:component} in the \textbf{rel} attribute. The following is an example of a component inclusion metadatum in RDF/A, superimposed on the \textbf{metalex:src} attribute:
\begin{verbatim}
\end{verbatim}
If the naming convention is not used, the target of the inclusion must be identified in accordance with the principles set down in section \ref{idmetadata}. The \textbf{metalex:src} attribute \textbf{must} be used if the \emph{metalex-owl:component} property is used.
\section{Standard Conformance}
Conformance of XML documents to the MetaLex standard means:
\begin{enumerate}
\item validation of XML documents against a schema that restricts the MetaLex XML schema,
\item the theoretical possibility of obtaining an XML document that uses solely MetaLex generic elements and validates against the MetaLex XML schema by way of simple substitution,
\item use of either the naming convention or the availability of sufficient metadata to construct names in conformance to the naming convention, and
\item conformance to the CEN MetaLex Workshop Agreement written guidelines.
\end{enumerate}
Any XML encoding is \emph{transformation conformant} if instances can be transformed automatically into conformant MetaLex XML documents. The workshop recommends implementing the transformation in the form of an XSL transformation (XSLT\footnote{See \texttt{http://www.w3.org/TR/xslt}}). The workshop also recommends copying any attributes from the original XML encoding into the MetaLex XML document.
No additional conformance criteria exist for the OWL Schema. The workshop strongly recommends using or subclassing MetaLex OWL Schema classes and properties wherever reasonable.
\subsection{Conformance of Types and Elements}
A schema that restricts the MetaLex XML Schema declares only MetaLex conformant elements. The process of declaring an element is subject to the following guidelines:
\begin{enumerate}
\item You \textbf{must} use one of the abstract content models for the element;
\item You \textbf{may} define a restriction of the corresponding concrete type;
\item You \textbf{may not} define an extension to the content model of a concrete type;
\item You \textbf{may} define an extension of a concrete type for the purpose of adding attributes;
\item You \textbf{must} define the elements as a substitution group of one of the abstract elements and you must identify a type which is either one of the provided concrete types, or the restriction of the content model or extension of attributes of a concrete type that you have defined.
\end{enumerate}
To define an element conforming to the standard that can be used in XML manifestations of sources of law, define a non-abstract complex type, and create an element belonging to the substitution group of one of the abstract elements according to the subtype specified, for instance:
\footnotesize
\begin{verbatim}
\end{verbatim}
\normalsize
\subsection{About Schema Validation}\label{ssec:about-schema-validation}
The schema to be validated against is made available as a DTD++ schema and an XML Schema. Experience in the Estrella project uncovered a few technical issues where the semantics actually captured by the XML Schema and DTD++ files differ from the intended meaning, mostly because of the limited expressive power of these languages.
\begin{itemize}
\item Patterns of the form \texttt{a*} (zero or more \texttt{a}'s) and \texttt{a+} (one or more \texttt{a}'s) in the XML Schema are replaced by the patterns \texttt{a*|a,a*|a,a,a*|a,a,a,a*|a,a,a,a,a*} and \texttt{a+|a,a+|a,a,a+|a,a,a,a+|a,a,a,a,a+}, respectively. This aligns the semantics of the DTD++ schema and the XML Schema, at least for up to 5 occurrences (or any other appropriately large number, at the discretion of the technical committee), and allows different restrictions of \texttt{a} in derived conformant content models. In XML Schema, if \texttt{b} and \texttt{c} are restrictions of \texttt{a}, the pattern \texttt{b,c} is not accepted as a restriction of \texttt{a+} or \texttt{a*}; the intended reading of the technical committee is that it should be.
\item As specified in section 3, attributes from any other (\textbf{\#other}) namespace \textbf{may} be used on any MetaLex element. Unfortunately, this requirement as such is not expressible in the current version of XML Schema. The current versions of the XML Schema and the DTD++ schema therefore allow \emph{any} attribute (even those in the metalex namespace) to be added to elements. Furthermore, this property is \emph{not} inherited by derived complex types, and must be added manually \emph{every time} a derived type is created.
\end{itemize}
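The bounded patterns in the first item are mechanical to produce for any chosen depth. The Python fragment below is a sketch of that expansion, not the tool used to generate the published schema. The function name \texttt{bounded\_alternatives} and its parameters are hypothetical.
\begin{verbatim}
# Sketch: unfold a* or a+ into the bounded alternatives used above.

def bounded_alternatives(name, op, depth=5):
    """Return name+op | name,name+op | ... with depth alternatives."""
    if op not in ("*", "+"):
        raise ValueError("op must be '*' or '+'")
    return "|".join(
        ",".join([name] * k + [name + op]) for k in range(depth))

star = bounded_alternatives("a", "*")
plus = bounded_alternatives("a", "+")
\end{verbatim}
With the default depth of 5 the two results equal the patterns quoted in the first item. A larger depth only adds further alternatives of the same shape.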
\bibliographystyle{apalike}
\bibliography{cwa,biblioD3.2}
\newpage
\section*{Appendix: OWL Schema Axioms}
\label{owl}
\include{2008-05-02}
\end{document}