\documentclass{mlcennote}
\usepackage{longtable}
% \usepackage{helvet}
\usepackage{natbib}
% Custom colors!
\usepackage{color}
\definecolor{navy}{rgb}{0,0,0.3}
\definecolor{darkred}{rgb}{0.3,0,0}
\definecolor{darkgreen}{rgb}{0,0.3,0}
\usepackage[pdftex,
  pdftitle={CEN MetaLex Workshop Proposal},
  pdfauthor={Alexander Boer, Rinke Hoekstra, Emile de Maat, Fabio Vitali, Monica Palmirani and Balázs Rátai},
  pdfpagemode={UseOutlines},
  plainpages=false,
  pdfpagelabels,
  bookmarks,bookmarksopen,
  pdfstartview={FitH},
  colorlinks,
  linkcolor={navy},citecolor={darkred},urlcolor={darkgreen}]
  {hyperref}
\usepackage{graphicx}
% \usepackage{at}
\renewcommand{\rmdefault}{ppl}
\renewcommand{\ttdefault}{pcr}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{0.1in}
% property
% \newatcommand p[1]{\textsf{#1}}
%
% class
% \newatcommand c[1]{\textsf{#1}}
%
% individual
% \newatcommand oi[1]{\textit{#1}}
% \newatcommand m[1]{\mathcal{#1}}
%
%
%
% property chain constructor
% \newatcommand o{\textsf{\textbf{o}}}
% \newatcommand some{\textsf{\textbf{some}}}
% \newatcommand only{\textsf{\textbf{only}}}
% \newatcommand not{\textsf{\textbf{not}}}
% \newatcommand self{\textsf{\textbf{self}}}
% \newatcommand exactly{\textsf{\textbf{exactly}}}
% \newatcommand min{\textsf{\textbf{min}}}
% \newatcommand max{\textsf{\textbf{max}}}
% \newatcommand value{\textsf{\textbf{value}}}
\title{CEN MetaLex Workshop Proposal}
\author{Alexander Boer}
\author{Rinke Hoekstra}
\author{Emile de Maat}
\institute{University of Amsterdam\\The Netherlands}
\author{Fabio Vitali}
\author{Monica Palmirani}
\institute{University of Bologna\\Italy}
\author{Bal{\'a}zs R{\'a}tai}
\institute{Carneades\\Hungary}
\runningauthor{Alexander Boer}
\correspondingauthor{Alexander Boer}
\email{aboer@uva.nl}
\Leibnizreportdate{June 2009}
\newcommand{\axiom}[3]{$#1$ & $#2$ & $#3$ \\}
\newcommand{\fact}[1]{\multicolumn{3}{l}{$#1$} \\}
\newcommand{\concept}[1]{\textsf{\textsc{#1}}}
\newcommand{\property}[1]{\textsf{#1}}
\newcommand{\individual}[1]{\textit{#1}}
\newcommand{\variable}[1]{\textit{#1}}
\newcommand{\xmlschema}{\href{http://www.metalex.eu/metalex/1.0}{XML Schema}\ }
\newcommand{\dtd}{\href{http://www.metalex.eu}{DTD++}\ }
\newcommand{\rfciri}{\href{http://www.ietf.org/rfc/rfc3987.txt}{RFC 3987}\ }
\newenvironment{textbox}{\begin{quote}\small\begin{tabular}{|p{12cm}|}\hline\sffamily \color{navy}}{\normalcolor\\\\\hline\end{tabular}\end{quote}}
\newenvironment{note}{\begin{quote}\small\begin{tabular}{|p{12cm}|}\hline\sffamily\textbf{Note: }\color{darkred}}{\normalcolor\\\hline\end{tabular}\end{quote}}
\newenvironment{editnote}{\begin{quote}\small\begin{tabular}{|p{12cm}|}\hline\sffamily\textbf{Editor Note: }\color{darkgreen}}{\normalcolor\\\hline\end{tabular}\end{quote}}
\newenvironment{example}{\begin{quote}\small\begin{tabular}{|p{12cm}|}\hline\sffamily\textbf{Example:}\\\color{navy}}{\normalcolor\\\hline\end{tabular}\end{quote}}
\newtheorem{remark}{Remark}
\setcounter{tocdepth}{2}
\abstract{
The partial agreement of December 2006 was adopted by the workshop on the understanding that it would be augmented with additional agreements on ontological formalization, citation and reference, time and versioning, and components and component inclusion. This document proposes additional agreements (taking into account proposals from \cite{estrella3.2:07}) and clarifies terminology.
\noindent\textbf{Status:} This document is a new, working draft \textbf{proposal} based on the September 2008 meeting proposal.
\noindent\textbf{NB:} this document is \textbf{work in progress}.
}
\begin{document}
\maketitle
\tableofcontents
\section{Status of this Document}\label{sec:status}
\subsection*{May Be Superseded} % (fold)
\label{sub:may_be_superseded}
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
% subsection may_be_superseded (end)
\subsection*{Summary of Changes} % (fold)
\label{sub:summary_of_changes}
% subsection summary_of_changes (end)
The partial agreement of December 2006 was adopted by the workshop on the understanding that it would be augmented with additional agreements on ontological formalization, citation and reference, time and versioning, and components and component inclusion. The CEN MetaLex workshop requested a more rigorous formalization of the four ontological levels (work, expression, manifestation, item) at which a bibliographic entity exists, and of which properties belong to which level. The technical committee committed itself to providing a definitive list of properties of expressions that distinguish version, variant, consolidation, original, translation, etc.
\begin{itemize}
\item This document includes the partial agreement of December 2006, proposes additional agreements (taking into account proposals from \cite{estrella3.2:07,estrella3.3:08}), and clarifies terminology.
\item This document is a new, draft \textbf{proposal} based on the September 2008 meeting proposal.
\end{itemize}
\subsection*{Versions} % (fold)
\label{sub:document_versions}
\noindent The latest version of this document can be downloaded from:
\begin{textbox} \url{http://svn.metalex.eu/svn/MetaLexWS/documentation/2008proposal/} \end{textbox}
\noindent Previous versions of this document can be found on the CEN MetaLex website:
\begin{textbox} \url{http://www.metalex.eu} \end{textbox}
The latest versions of the schema files, existing translators and schema extensions, and examples can always be found at:
\begin{textbox} \url{http://svn.metalex.eu/svn/MetaLexWS/branches/latest/} \end{textbox}
% subsection document_versions (end)
\section{Introduction} % (fold)
\label{sec:introduction}
This document defines the MetaLex XML standard for legal sources, as established by the \emph{CEN Workshop on an Open XML Interchange Format for Legal and Legislative Resources (MetaLex)}.
MetaLex prescribes \emph{syntactic} restrictions on XML documents and schemas, as defined by an \xmlschema and \dtd specification (\autoref{sec:syntactic-restrictions}), and defines a \emph{semantics} for XML document metadata (\autoref{sec:semantic-restrictions}). The XML Schema and DTD++ schema are intended to be equivalent in meaning. Conformance to this standard is defined inline; global conformance constraints are collected in \autoref{sec:conformance}. The italicized keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used to specify normative features of MetaLex documents and tools, and are interpreted as specified in \href{http://www.ietf.org/rfc/rfc2119.txt}{RFC 2119}.
\subsection{Scope of the Agreement} % (fold)
\label{sub:scope_of_the_agreement}
The \emph{CEN Workshop on an Open XML Interchange Format for Legal and Legislative Resources (MetaLex)} declares, by way of its title, an interest in legal and legislative resources. This workshop agreement limits the applicability of the proposed XML standard to sources of law and references to sources of law.
\paragraph{Source of Law} % (fold)
\label{par:source_of_law}
A \textbf{source of law} is a writing that can be, is, was, or presumably will be used to back an argument concerning the existence of a constitutive or institutional rule in a certain legal system, or, alternatively, a writing used by a competent legislator to communicate the existence of a constitutive or institutional rule to a certain group of addressees. Because the CEN Workshop is concerned only with an XML standard, it chooses not to appeal to other common ingredients of definitions of law that have no relevant counterpart in the information dimension. Source of law is a familiar concept in law schools, and may be used to refer to legislators (\emph{fonti delle leggi}, \emph{sources des lois}), to legislation and case law (\emph{fonti del diritto}, \emph{sources du droit}), to custom, etc.
It should be noted that many Romance languages make a distinction between the legislator as source of law, by way of speaking or writing, and the law as source of right(s), which is presumably what the existence of the law brings about. In its broadest sense, the source of law is anything that can be conceived of as the originator of legal rules. In the context of MetaLex it refers strictly to communication in writing, and in this sense covers the \emph{fonti del diritto} in Italian and \emph{sources du droit} in French. There are two main categories of source of law in writing: \textbf{legislation} and \textbf{case law}.
% paragraph source_of_law (end)
\paragraph{Legislation} % (fold)
\label{par:legislative_resource}
The notion of a \textbf{legislative resource} includes legislation and all writings produced by the legislator to explain and justify legislation. The legislator is a legal person: it exists separately from any natural persons and organizations involved in the process of drafting and evaluating legislation. It is the formally correct completion of certain processes, usually dictated by law, that makes the legislator the formal author of a writing, and at the same time identifies the addressees to whom it applies. The persons and organizations involved in the process of legislating may also produce writings that are clearly precursors of, or legally required ingredients of, the end product. These writings are also covered by the notion of a legislative resource, but it is not easy to give straightforward rules for deciding which of them are to be considered legislative resources. Different jurisdictions will have different theories on this subject.
% paragraph legislative_resource (end)
\paragraph{Case Law} % (fold)
\label{par:case_law}
The notion of \textbf{case law} has not been defined by the workshop.
% paragraph case_law (end)
% subsection scope_of_the_agreement (end)
\subsection{Design Requirements and Principles} % (fold)
\label{sub:design_requirements_and_principles}
MetaLex XML is an \textbf{interchange format}, a lowest common denominator for other standards. It is not intended to replace jurisdiction-specific standards and vendor-specific formats in the publication process, but to impose a standardized view on legal documents for the purposes of \textbf{information exchange} and \textbf{interoperability} in the context of software development.
To meet these requirements, MetaLex defines mechanisms for \textbf{schema extension}, adding \textbf{metadata}, cross-\textbf{referencing}, constructing \textbf{compound documents}, and a basic \textbf{naming convention}. These features are briefly discussed below and elaborated upon in later sections.
\subsubsection{Schema Extension} % (fold)
\label{ssub:schema_extension}
MetaLex XML \xmlschema and \dtd define \textbf{basic content models} that can be applied to concrete XML documents only through the \textbf{implementation} of a content model in a schema that extends the MetaLex \xmlschema or \dtd. The existing concrete XML document does not have to be modified to achieve this.
% Existing XML documents, conforming to other XML
% schemas, may conform to the MetaLex basic content models by defining the used elements as
% \emph{implementations} of a MetaLex content model in a schema that extends the MetaLex
% schema (see sections \ref{cm} and \ref{attributes}). The existing document does not have to be modified to achieve this.
A \textbf{schema extension} specifies the names of elements used in concrete XML documents and allows for additional attributes on these elements. It \emph{may} also be used to further constrain the allowed content models if the schema extension is intended to be \emph{normative}.%
\footnote{For instance, if the schema is used in an editor to validate the structure of the document \emph{before} it is published.}
\begin{textbox} A MetaLex \textbf{conformant concrete XML document} is any XML document that is a valid instance of a MetaLex conformant XML Schema, as defined in \autoref{sec:conformance}. \end{textbox}
\begin{textbox} Concrete XML documents \emph{must not} instantiate the MetaLex \xmlschema or \dtd directly. \end{textbox}
The MetaLex syntax and content models are described in \autoref{sec:syntax}. The MetaLex schema extension mechanism is described in \autoref{sec:conformance}.
% subsubsection schema_extension (end)
\subsubsection{Metadata} % (fold)
\label{ssub:metadata}
MetaLex XML allows the definition of \textbf{metadata} on legal sources either through \href{http://www.w3.org/TR/rdfa-syntax/}{RDFa} attributes on elements in conformant concrete XML documents, or as \href{http://www.w3.org/RDF/}{RDF} statements in a separate document or repository if it is not desirable to modify the concrete XML document.\footnote{RDF is the Resource Description Framework, as defined by the W3C, see \url{http://www.w3.org/RDF/}. RDFa is a W3C specification for attributes to express structured data in any markup language, see \url{http://www.w3.org/TR/rdfa-syntax/}.}
Metadata is used to describe the meaning and context of use of the text contained in MetaLex conformant concrete XML documents. The MetaLex metadata schema is defined as an \href{http://www.w3.org/TR/2009/WD-owl2-overview-20090611/}{OWL 2} ontology.\footnote{OWL 2 is the Web Ontology Language, as defined by the W3C, see \url{http://www.w3.org/TR/2009/WD-owl2-overview-20090611/}.} Of central importance is the standardization of the identification of documents (cf.\ \autoref{sec:naming_convention}).
The MetaLex metadata schema distinguishes the identity of documents at the \emph{item}, \emph{manifestation}, \emph{expression}, and \emph{work} levels, in accordance with the Functional Requirements for Bibliographic Records (\href{http://www.frbr.org/}{FRBR}, \cite{Saur1998}).
\begin{textbox} Identifiers of elements in concrete MetaLex conformant XML documents \emph{must} be specified by means of an Internationalized Resource Identifier (IRI) as defined by \rfciri\unskip. \end{textbox}
\begin{note} The definition of IRI in \rfciri explicitly allows both hierarchical IRI schemes (as exemplified by URLs) and any URN scheme. \end{note}
\begin{textbox} Every XML element in a MetaLex conformant concrete XML document \emph{must} declare an \texttt{id} attribute with a \textbf{unique} ID value to allow identification with a unique IRI. \end{textbox}
\begin{textbox} Every conformant MetaLex document \emph{must} declare at least the IRI of the document \emph{manifestation} it exemplifies, using the prescribed metadata statement. \end{textbox}
\begin{textbox} Self-identifying metadata of the manifestation, expression, and work \emph{should} be supplied by an explicit set of identifying metadata, stored inside or outside of the document. Alternatively, this metadata \emph{may} be discoverable through explicit compliance with a MetaLex conformant \textbf{naming convention}. \end{textbox}
The MetaLex metadata schema is described in \autoref{sec:metadata}. The MetaLex naming convention is described in \autoref{sec:naming_convention}. Additional conformance requirements are described in \autoref{sec:conformance}.
% subsubsection metadata (end)
% so that it is possible to make metadata statements
% about the \emph{expression} fragment embodied by the element.
%
% (i.e. a
% triple with the format \texttt{\{this, exemplifies, IRIRef\}}).
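The two element-level requirements above (an \texttt{id} on every element, with values unique within the document) can be checked mechanically. The following Python sketch is an illustration only and is not part of the standard. It reads the plain \texttt{id} attribute named in the textbox by default; the \texttt{id\_attr} parameter allows a namespace-qualified attribute instead. For illustration only, it also assumes that an element IRI is formed by appending \texttt{\#} and the id to a manifestation IRI; the normative form of IRIs is given by the naming convention. The function names and the \texttt{example.org} IRI are hypothetical.
\scriptsize
\begin{verbatim}
import xml.etree.ElementTree as ET

def check_ids(xml_text, id_attr="id"):
    """Report id problems in a concrete XML document.

    Returns (missing, duplicates): the tags of elements without an id
    attribute, and the sorted id values used on more than one element.
    """
    root = ET.fromstring(xml_text)
    seen, missing, duplicates = set(), [], set()
    for elem in root.iter():  # all elements, root first, document order
        value = elem.get(id_attr)
        if value is None:
            missing.append(elem.tag)
        elif value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return missing, sorted(duplicates)

def element_iris(xml_text, manifestation_iri, id_attr="id"):
    """Map each id to a fragment IRI (hypothetical rule: IRI + '#' + id)."""
    root = ET.fromstring(xml_text)
    return {elem.get(id_attr): f"{manifestation_iri}#{elem.get(id_attr)}"
            for elem in root.iter() if elem.get(id_attr) is not None}

doc = '<act id="a"><art id="a1"/><art id="a1"/><note/></act>'
print(check_ids(doc))  # (['note'], ['a1'])
\end{verbatim}
\normalsize
For a namespace-qualified attribute, \texttt{id\_attr} can be set to \texttt{"\{http://www.metalex.eu/metalex/1.0\}id"}, which is the form in which \texttt{xml.etree.ElementTree} reports qualified attribute names.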
\subsubsection{Citations} % (fold)
\label{ssub:citations_and_references}
MetaLex XML supports cross-referencing of legal sources by means of \textbf{citation}. The target of a citation is supplied either by using the \texttt{citations} attribute group or by explicit metadata.
\begin{textbox} The target of citations \emph{must be} identifiable by the same means as specified for self-identifying metadata in \autoref{ssub:metadata}. \end{textbox}
The MetaLex citation mechanism is described in \autoref{sec:citations}.
% subsubsection citations_and_references (end)
\subsubsection{Compound Documents} % (fold)
\label{ssub:compound_documents}
MetaLex XML supports compounding of documents by reference. The value of the \texttt{src} attribute is interpreted as a reference that stands in for a part of the MetaLex manifestation stored in an external object identified by an IRI. The target of inclusion by reference, a manifestation component, is also identified using either the naming convention or a set of identifying metadata. The MetaLex mechanism for compounding documents is described in \autoref{inclusion}.
% subsubsection compound_documents (end)
\subsubsection{Naming Convention} % (fold)
\label{ssub:naming_convention}
MetaLex XML supports a \textbf{naming convention} that \emph{may} be used to construct the IRI identifiers of structural and metadata elements referred to from MetaLex conformant concrete XML documents.
\begin{textbox} Conformant documents \emph{should} use a MetaLex conformant naming convention if the identifiers of relevant elements are not available in an explicit form.
\end{textbox}
The MetaLex naming convention is specified in \autoref{sec:naming_convention}.
% subsubsection naming_convention (end)
\subsubsection{Content and Metadata} % (fold)
\label{ssub:content_and_metadata}
% A guiding principle of the workshop is that identifiable structure of the document content is described by an XML Schema\footnote{\texttt{http://www.w3.org/XML/Schema}} and DTD++ schema, while metadata conforms to the Resource Description Framework (RDF) \footnote{\texttt{http://www.w3.org/RDF/}} and the associated schema language OWL\footnote{\texttt{http://www.w3.org/2007/OWL/wiki/OWL\_Working\_Group}}.
%
% The schema defines a number of meaning-neutral \emph{content models}, i.e. algebraic expressions of the elements and attributes that may (or must) be found in the content of the element.
%
% Metadata is used to describe the meaning and context of use of the text.
Elements of MetaLex XML documents are identified by means of the same symbols (IRIs) as the resources described in RDF. The fundamental difference is that in XML the IRI attaches an identifier to an XML \emph{data structure} so that software can refer to it.\footnote{Note that one \emph{could} conceive of XML documents as descriptions of the document, but conventionally the XML data structure referred to by a document IRI is a manifestation of the document, and not a description of it.} RDF, on the other hand, \emph{describes} the identified entities in a standardized format. The interface between metadata and the standardized XML manifestation consists of these shared IRIs: the XML manifestation is what the IRI refers to, and RDF descriptions describe the thing referred to by the IRI.
% subsubsection content_and_metadata (end)
% subsection design_requirements_and_principles (end)
% \subsection{Appendices} % (fold)
% \label{sub:appendices}
% Appendices to this document are:
%
% \begin{enumerate}
% \item An XML Schema;
% \item A DTD++ schema; and
% \item An OWL schema.
% \end{enumerate}
%
% Check for availability of the appendices. See \autoref{ssec:about-schema-validation} for a formal specification of conformance to these schemas.
%
%
% Where they lead to ambiguity, it may be assumed that the DTD++ version best reflects the intentions of the technical committee of the workshop.
% subsection appendices (end)
% section introduction (end)
\subsection{Namespace and Namespace Qualification} % (fold)
\label{sub:namespace_and_namespace_qualification}
MetaLex XML conformance of schema extensions and concrete XML documents, as specified in \autoref{sec:conformance}, depends on a generic validation mechanism. This requires that all elements in conformant documents be uniquely identifiable within a namespace.
\begin{textbox} Elements and attributes defined by a conformant extension of the MetaLex \xmlschema or \dtd \emph{must} be namespace qualified, even though they \emph{may} be associated with a default namespace without prefix. \end{textbox}
The namespace of the version of CEN MetaLex defined by this document (1.0), regardless of the date of adoption, is:
\begin{textbox} \url{http://www.metalex.eu/metalex/1.0} \end{textbox}
XML elements, XML attributes, XML Schema named complex types, etc., as well as OWL classes and OWL properties, are declared in this namespace. Since ambiguity could arise in MetaLex specification documents as to whether a name in this namespace refers to an XML Schema or an OWL entity, we use the prefix (an \emph{NCName}) \texttt{metalex} for XML, and \texttt{metalex-owl} for OWL\footnote{XML elements and types, and OWL classes and properties are already distinguished by typographic conventions in CEN MetaLex. Note that OWL names only occur in MetaLex XML documents as the \emph{value} of XML attributes.}, i.e.,
the specification assumes the existence of two namespace declarations providing two different prefixes for the \emph{same namespace}:
\begin{textbox} \texttt{xmlns:metalex="http://www.metalex.eu/metalex/1.0"} \\ \texttt{xmlns:metalex-owl="http://www.metalex.eu/metalex/1.0"} \end{textbox}
The MetaLex OWL schema is published as a hash namespace in conformance with Recipe 3 of the \emph{Best Practice Recipes for Publishing RDF Vocabularies} guidelines.\footnote{See \url{http://www.w3.org/TR/swbp-vocab-pub/}} The MetaLex \xmlschema is published in a compatible manner. This means that both schemas can be retrieved with an \texttt{HTTP GET} request on the namespace URL, given the respective MIME type in the \texttt{Accept} header: \texttt{application/rdf+xml} for the OWL schema, and \texttt{application/xml} for the \xmlschema\unskip.
% subsection namespace_and_namespace_qualification (end)
\section{Syntax Specification} % (fold)
\label{sec:syntax}\label{sec:syntactic-restrictions}
This section describes the MetaLex XML \textbf{syntax} and \textbf{content models} as specified by the \xmlschema and \dtd\unskip.
\subsection{Content Models} % (fold)
\label{sub:content_models} \label{cm}
The MetaLex XML syntax maintains a strict separation between syntactic elements (structure) and the signalled meaning of elements by distinguishing for each element its \textbf{name} and its \textbf{content model}. A content model (cf. \cite{VitaliEtAl2005}) is an algebraic expression of the elements that may (or must) be found in the content of an element. Generic elements, on the other hand, are named after the content model: their name is merely a label identifying the type of content model.
\begin{textbox} A MetaLex XML element is characterized by a \textbf{name}, a \textbf{content model}, and zero or more \textbf{attributes}. \end{textbox}
Sharing content models is achieved by using two attributes, \texttt{name} and \texttt{type}, that provide information about the meaning and the content model of an element.
If one of these values is missing, the name of the element supplies the missing value.
\begin{example} The following elements are equivalent:\footnote{Given an adequate \texttt{xml:base} specification, the first two \texttt{clause} elements will be fully qualified as required in \autoref{sub:namespace_and_namespace_qualification}.}
\scriptsize
\begin{verbatim}
\end{verbatim}
\normalsize
\end{example}
\subsubsection{Motivation} % (fold)
\label{ssub:motivation}
According to the philosophy of descriptive markup, the name of an XML element is often semantically charged.\footnote{cf. the Text Encoding Initiative, see \url{http://www.tei-c.org/P4X/SG.html}} The name of the element provides a hint as to the meaning of the text fragment, or its role within the whole of the document. Additional information about the content of the element goes into attributes.
Validation languages such as XML Schema do not allow validation rules to be associated with attribute values: element names are currently the only way to associate validation rules with documents. This confuses two principles, forcing semantically charged elements to assume a rigid content model, while generic elements take care of odd situations that were not foreseen when the content models were designed.
Legislative drafting technique has a long tradition, and often its own standards of what legislative documents should look like. This makes descriptive markup combined with strict content models very tempting. On the other hand, concrete documents contain so many exceptions that we are sometimes inclined to give up on precise description altogether and resort to generic elements, in particular because there should be no difference between the original expression of the legislator and the XML manifestation of that expression. The approach of the workshop is to provide for complete and automatic interchangeability of approaches, from generic to descriptive and vice versa.
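The defaulting rule for \texttt{name} and \texttt{type} (a missing value is supplied by the name of the element) can be sketched in Python as follows. This is an illustration only. It assumes that the two attributes are qualified with the MetaLex namespace, as \texttt{metalex:name} and \texttt{metalex:type} in \autoref{ssub:standard_attributes}. The element names \texttt{clause} and \texttt{block} and the function names are hypothetical.
\scriptsize
\begin{verbatim}
import xml.etree.ElementTree as ET

ML = "http://www.metalex.eu/metalex/1.0"

def local_name(tag):
    # ElementTree writes qualified names as "{namespace}local".
    return tag.rsplit("}", 1)[-1]

def name_and_type(elem):
    """Return the (name, type) pair of an element.

    A missing metalex:name or metalex:type attribute is supplied
    by the local name of the element itself.
    """
    own = local_name(elem.tag)
    return (elem.get(f"{{{ML}}}name", own),
            elem.get(f"{{{ML}}}type", own))

descriptive = ET.fromstring(f'<m:clause xmlns:m="{ML}" m:type="block"/>')
generic = ET.fromstring(f'<m:block xmlns:m="{ML}" m:name="clause"/>')
print(name_and_type(descriptive), name_and_type(generic))
# ('clause', 'block') ('clause', 'block')
\end{verbatim}
\normalsize
Both elements resolve to the same pair. This is the sense in which a descriptive element and a generic element with the right attributes are interchangeable.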
% subsubsection motivation (end)
% XML vocabularies contain a mix of descriptive and generic elements, and, depending on the foreseen uses of the documents, emphasize one of the approaches. For instance, vocabularies with precise procedural semantics (e.g. XSLT, SVG) do not depend on generic elements, while vocabularies intended for diverse content (for instance XHTML) employ generic elements. Consider for instance that in XHTML 2.0 both \texttt{a} and \texttt{img} elements are being replaced or phased out in favour of generic substitutes using attributes.
\subsubsection{Overview} % (fold)
\label{ssub:content_models_overview}
MetaLex XML distinguishes the following five categories of types and elements: \textbf{ur} types, \textbf{abstract} types, \textbf{abstract} elements, \textbf{generic} elements and types, and \textbf{concrete} types. With respect to the definition of MetaLex conformant concrete XML documents, the following rule holds:
\begin{textbox} \textbf{Generic elements} \emph{may} be instantiated, and conform to the standard. \end{textbox}
With respect to the definition of MetaLex XML conformant schema extensions, the following rule holds:
\begin{textbox} \textbf{Concrete types} \emph{should} be used for defining elements conforming to the standard. \end{textbox}
Conversely, the other elements or types \emph{must not} be used for these purposes. All content models are constrained to just twelve different abstract complex types, of which six are fundamental (the patterns) and six are specialized for specific purposes. Three further complex types are added to allow derivation of attributes, but cannot be used for anything but derivation. These are called \emph{ur types}.
\paragraph{Ur Types} The three basic types from which all others are derived are:
\begin{itemize}
\item \texttt{urType} specifies the \textbf{basic attributes} for all MetaLex elements;
\item \texttt{urContentType} is the type for elements that contain \textbf{content}; and
\item \texttt{urMetaType} is the type for elements that become \textbf{metadata}.
\end{itemize}
These types only distribute the correct attributes to the actual content models. They are \emph{abstract}: they \emph{must not} be (directly) instantiated as elements. The names of these types are prefixed with \textbf{ur} to signal their status as archetypes for the schema.
\paragraph{Abstract Types} The six basic abstract types that reflect \textbf{patterns} in MetaLex XML documents are:
\begin{itemize}
\item \texttt{absRootType} is the root of a document;
\item \texttt{absContainerType} is a container of a sequence of other elements;
\item \texttt{absBlockType} is the largest structure in which text and inline elements mix freely, e.g., paragraphs and other (usually vertically organised) containers of both text and smaller structures;
\item \texttt{absInlineType} is an inline container of text and other inline elements (e.g., bold);
\item \texttt{absMilestoneType} is an empty element that can be found in the text; and
\item \texttt{absQuotationType} is an inline element that contains a fragment of some other document; it may contain any sequence of elements and text that can appear in a MetaLex document.
\end{itemize}
\paragraph{Abstract Types (Derived)} The following abstract types are specializations of the fundamental abstract types defined above:
\begin{itemize}
\item \texttt{absMcontainerType} is a container of metadata elements;
\item \texttt{absHcontainerType} is a hierarchical container with titles and/or an index; it can only contain other containers;
\item \texttt{absMetaType} is an empty element that is interpreted as metadata;
\item \texttt{absDateType} is an inline element with date properties;
\item \texttt{absCitationType} is an inline element with citation properties (it contains a cross reference to another text); and
\item \texttt{absAnchorType} is an inline element with anchor properties (it is the source or destination of a hypertext anchor).
\end{itemize}
\paragraph{Abstract Elements} To enable the use of \emph{substitution groups} in the definition of conformant elements, the abstract types defined above have corresponding elements: \texttt{absRoot}, \texttt{absContainer}, \texttt{absHcontainer}, \texttt{absBlock}, \texttt{absInline},\\ \texttt{absDate}, \texttt{absCitation}, \texttt{absQuotation}, \texttt{absMilestone}, \texttt{absAnchor}, \\\texttt{absMcontainer}, \texttt{absMeta}. In addition, there is an \texttt{absHtitle} element derived from \texttt{absBlockType}, which is meant to contain a title and/or index for an \texttt{absHcontainer}. These elements are \emph{abstract} in the sense that they \emph{must not} be instantiated, but only substituted. They are used as the heads of the substitution groups of elements conforming to the standard. The names of abstract elements are prefixed with \emph{abs}.
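The naming regularities of the abstract types and elements above, and of the generic and concrete types listed below, can be summarized in a small lookup table. The following Python sketch derives all of these names from a pattern name. The derivation rule is inferred from the lists in this section and is not normative. The ur types are left out, and \texttt{PATTERNS}, \texttt{names\_for} and \texttt{TABLE} are hypothetical names used only for illustration.
\scriptsize
\begin{verbatim}
FUNDAMENTAL = ["root", "container", "block", "inline", "milestone", "quotation"]
DERIVED = ["mcontainer", "hcontainer", "meta", "date", "citation", "anchor"]
PATTERNS = FUNDAMENTAL + DERIVED + ["htitle"]

def names_for(pattern):
    """Derive the type and element names associated with one pattern."""
    cap = pattern[0].upper() + pattern[1:]
    return {
        # htitle has no abstract type of its own: absHtitle uses absBlockType.
        "abstract type": "absBlockType" if pattern == "htitle"
                         else f"abs{cap}Type",
        "abstract element": f"abs{cap}",
        "generic type": f"gen{cap}Type",
        "generic element": pattern,
        # No concrete type is listed for the root pattern.
        "concrete type": None if pattern == "root" else f"{pattern}Type",
    }

TABLE = {p: names_for(p) for p in PATTERNS}
print(TABLE["hcontainer"]["generic type"])  # genHcontainerType
\end{verbatim}
\normalsize
The table yields thirteen abstract elements, generic types and generic elements, and twelve concrete types. These counts match the lists given in this section.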
\paragraph{Generic Types} The MetaLex \xmlschema contains generic types for each abstract type, to be used when appropriate: \texttt{genRootType}, \texttt{genContainerType}, \texttt{genHcontainerType}, \texttt{genBlockType}, \\\texttt{genHtitleType}, \texttt{genInlineType}, \texttt{genCitationType}, \texttt{genDateType}, \\\texttt{genQuotationType}, \texttt{genMilestoneType}, \texttt{genAnchorType}, \texttt{genMcontainerType}, \texttt{genMetaType}. All generic types are prefixed with \emph{gen}.
\paragraph{Generic Elements} Generic elements are named after the patterns of which they are substitutions: \texttt{root}, \texttt{container}, \texttt{hcontainer}, \texttt{block}, \texttt{htitle}, \texttt{inline}, \texttt{citation}, \texttt{date}, \texttt{quotation}, \texttt{milestone}, \texttt{anchor}, \texttt{mcontainer}, \texttt{meta}. Generic elements \emph{may} be instantiated, and conform to the standard.
\paragraph{Concrete Types} Concrete types are included for all abstract types: \texttt{containerType}, \texttt{hcontainerType}, \texttt{blockType}, \texttt{htitleType}, \texttt{inlineType}, \texttt{citationType}, \texttt{dateType}, \texttt{quotationType}, \texttt{milestoneType}, \texttt{anchorType}, \texttt{mcontainerType}, \texttt{metaType}. Concrete types \emph{should} be used for defining elements conforming to the standard.
\begin{note} The \texttt{metalex:containerType} content model permits the use of a \texttt{metalex:mcontainerType} element as its initial element. \end{note}
\begin{note} MetaLex now defines \textbf{quoted} content models. They are to be described here. This feature is to be used with care because in some cases -- in particular in modifying acts -- a source of law conforming to a content model \emph{quotes} other structural blocks that violate the content model.
(Rinke Hoekstra) \end{note}
% subsubsection content_models_overview (end)
% subsection content_models (end)
\subsection{Attributes}
\label{sub:attributes}
MetaLex XML attributes give meaning to elements, in terms of semantics, roles, additional information, metadata and content model. This means that any element with any name, as long as it has the correct set of attributes, can be placed in the document. This ensures that conformance, validity and interchange are based on attributes alone, and not on element names.
\begin{textbox} Attribute names are \textbf{fixed} and \textbf{normative}; element names are subject to localization, of both language and jurisdiction. \end{textbox}
\subsubsection{Standard Attributes} % (fold)
\label{ssub:standard_attributes}
MetaLex XML defines the following attributes:
\begin{itemize}
\item \texttt{metalex:name}: a semantically charged name that identifies, in a human-understandable way, the purpose, meaning and role of the element. This attribute is required for generic elements, and optional for descriptive elements (but if present it must coincide with the element name).
\item \texttt{metalex:type}: the name of one of the (few) content models approved for use in this schema. The content model of the element must also be coherent with the content model specified in the \texttt{metalex:type} attribute. This attribute is required for descriptive elements, and optional for generic elements (but if present it must coincide with the element name).
\item \texttt{metalex:subtype}: the name of a more specific content model of the element.
\item \texttt{metalex:id}: an id that uniquely identifies the element. The syntax of the value of the id attribute \emph{may} depend on the \texttt{metalex:\\LocalNamingConventionMethod} and \texttt{metalex:\\LocalNamingConventionScope} attributes, given compliance with a MetaLex conformant \textbf{naming convention}.
All elements in the document \emph{must} have a \texttt{metalex:id}, except for globally and locally unique ones, for which it is optional. \item \texttt{metalex:date}: all elements containing or referring to a date or a moment in time \emph{must} provide a normalized value for that date, conformant to the subset of \href{http://www.w3.org/TR/NOTE-datetime}{ISO 8601} used in the XML Schema language. This value can be used regardless of whether the element's name is understood or the displayed value is ambiguous. \item \texttt{metalex:src}: is used to include a manifestation component by \emph{reference}. \item \texttt{metalex:upTo}: is used in combination with \texttt{href} to refer to a range of targets, with \texttt{href} specifying the beginning of the range, and \texttt{upTo} specifying the end of the range. \item \texttt{metalex:exclude}: is used to define an exception to a citation, targeting an object that is to be left out of the citation. \item \texttt{metalex:excludeTo}: is used in combination with \texttt{exclude} to refer to a range of targets which are to be left out of a citation, with \texttt{exclude} specifying the beginning of the range, and \texttt{excludeTo} specifying the end of the range. \item \texttt{metalex:LocalNamingConventionMethod}: the method for indexing the element in the document. This can have one of three values: \begin{itemize} \item \texttt{individual}: this type of element only appears once in the entire document (e.g. the preface); \item \texttt{ordinal}: this type of element has some kind of ordinal index (e.g. a number or letter) and can be referred to using this index; or \item \texttt{positional}: this type of element is neither unique nor has an ordinal index, so it can only be referred to by counting out its position in the document. \end{itemize} \item \texttt{metalex:LocalNamingConventionScope}: the scope for indexing the element in the document.
This can have one of two values: \begin{itemize} \item \texttt{global}: the index of the element is unique within the entire document; or \item \texttt{parent}: the index of the element is unique within its parent container. \end{itemize} \end{itemize} MetaLex XML also uses the following attributes defined elsewhere: \begin{itemize} \item \texttt{xhtml:class}: the name of a style class that can be found in a presentation package. \textbf{Optional} attribute for all elements. \item \texttt{xhtml:style}: a collection of (CSS) styles to be associated with the current element only. \textbf{Optional} attribute for all elements. \item \texttt{xml:lang}: specifies the language of the content of an element. The value of this attribute \emph{must} be recoverable from the context of the element.\footnote{Used in conformance with \texttt{http://www.w3.org/TR/REC-xml/} and \href{http://www.ietf.org/rfc/rfc3066.txt}{RFC 3066}.} \end{itemize} % subsubsection standard_attributes (end) \subsubsection{Metadata Attributes} % (fold) \label{ssub:metadata_attributes} MetaLex XML adopts the \href{http://www.w3.org/TR/rdfa-syntax/}{RDFa} recommendation for specifying metadata attributes. \begin{textbox} In case of ambiguity this specification takes precedence over the \href{http://www.w3.org/TR/rdfa-syntax/}{RDFa} specification. \end{textbox} An RDFa element is any XML element that carries at least one of the attributes \texttt{property}, \texttt{rel} or \texttt{rev}. An RDFa processor generates exactly one RDF statement per \texttt{rel} (relation), \texttt{property}, or \texttt{rev} (reverse) attribute: the attribute indicates a new statement whose \emph{predicate} is the IRI value of that attribute. In the case of \texttt{rel} and \texttt{property}, the subject of the statement is decided by \emph{subject resolution}. In the case of \texttt{rel}, the object is decided by \emph{IRI reference object resolution}.
In the case of \texttt{property}, the object is decided by \emph{literal object resolution}. In the case of \texttt{rev}, the subject of the triple is decided by \emph{IRI reference object resolution} and the object of the triple is decided by \emph{subject resolution}. If both \texttt{rel} and \texttt{rev} attributes are used within the same element, two RDF statements are generated. Literal object resolution yields either the value of the \texttt{content} attribute or, if it is absent, the element content. The value of the \texttt{content} attribute is by default interpreted as a plain literal. The element content is by default interpreted as an XML literal. The \texttt{datatype} attribute is used to specify an XML Schema datatype \\(cf. \texttt{http://www.w3.org/TR/xmlschema11-2/}). If it is present, the value yielded by literal object resolution is interpreted as an instance of that XML Schema datatype. IRI reference object resolution yields either the IRI value of the \texttt{resource} attribute or, if it is absent, the value of the \texttt{href} or \texttt{src} attribute. The \texttt{resource} attribute should only be used to indicate that the IRI is not intended to be ``clickable'', or when an \texttt{href} or \texttt{src} attribute is already present on the element for other reasons and does not refer to the intended object. Using the \texttt{href} attribute is strongly advised whenever reasonable. Subject resolution usually yields the IRI value of the \texttt{about} attribute or, if the RDFa element that includes the predicate attribute does not have an \texttt{about} attribute, the \texttt{about} attribute of the nearest ancestor element that has one. In the absence of an \texttt{about} attribute within scope, it yields the \emph{XML base} of the element.
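The resolution rules above can be illustrated with a minimal Python sketch built on the standard \texttt{xml.etree.ElementTree} module. It is a simplification under stated assumptions, not a conformant RDFa processor: attribute values are assumed to be full IRIs (no prefix or CURIE expansion), \texttt{datatype} is ignored, element content is taken as plain text rather than as an XML literal, and the XML base is passed in as a parameter. The function name \texttt{extract\_statements} is hypothetical.

\small
\begin{verbatim}
import xml.etree.ElementTree as ET


def extract_statements(xml_text, base):
    """Return (subject, predicate, object) triples following the simplified
    resolution rules of this section. Hypothetical helper: attribute values
    are assumed to be full IRIs, and 'datatype' is ignored."""
    triples = []

    def walk(elem, subject):
        # Subject resolution: the element's own 'about', otherwise the
        # nearest ancestor's 'about', otherwise the XML base.
        subject = elem.get("about", subject)
        # IRI reference object resolution: 'resource', else 'href', else 'src'.
        iri = elem.get("resource") or elem.get("href") or elem.get("src")
        if elem.get("rel") and iri:
            triples.append((subject, elem.get("rel"), iri))
        if elem.get("rev") and iri:
            # 'rev' swaps the roles of subject and object.
            triples.append((iri, elem.get("rev"), subject))
        if elem.get("property"):
            # Literal object resolution: 'content', else the element text.
            literal = elem.get("content")
            if literal is None:
                literal = "".join(elem.itertext())
            triples.append((subject, elem.get("property"), literal))
        for child in elem:
            walk(child, subject)

    walk(ET.fromstring(xml_text), base)
    return triples
\end{verbatim}
\normalsize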
Resolution of relative IRIs to the XML base in MetaLex is described in \autoref{sub:identification_and_relative_iris}. \begin{textbox} Any MetaLex XML element \emph{may} be annotated using valid RDFa attributes if the content model allows it. \end{textbox} \begin{textbox} By design RDFa attributes are \emph{namespace independent}. This means that RDFa attributes are the \textbf{only} attributes occurring in conformant MetaLex XML documents that \emph{must not} have a namespace. \end{textbox} \begin{textbox} Elements derived from the \texttt{metalex:urMetaType} type \emph{must} contain RDFa attributes expressing an RDF statement. \end{textbox} \begin{textbox} A MetaLex document \emph{must} declare what it is a \textbf{manifestation} of, e.g.: \small \begin{verbatim} \end{verbatim} \normalsize \end{textbox} Refer to \autoref{sec:metadata} for a definition of \textbf{manifestation}. % \begin{itemize} % \item \texttt{about}: the subject (IRI reference to an entity) of an RDFa statement; % \item \texttt{property}: the predicate of an RDFa statement whose object is a literal value; % \item \texttt{rel}: the predicate of an RDFa statement whose object is a IRI reference to another entity; % \item \texttt{href}: the object IRI reference to another entity of an RDFa statement; % \item \texttt{content}: the literal value that is the object of an RDFa statement. % \end{itemize} % The only thing left is the '''showAs''': we need a label for both the predicate and the object at least. Labels are useful for applications that want to show the user a description of a relation and target that is understandable (and not an IRI or QName derived from an IRI). \begin{textbox} The optional metadata attribute \texttt{metalex:showAs} \emph{must} contain a human-readable label of the \emph{whole} metadata statement, i.e. of its subject, property and object.
\end{textbox} % subsubsection metadata_attributes (end) \subsubsection{Additional Considerations} % (fold) \label{ssub:additional_considerations} \begin{textbox} Attributes from any other (\texttt{\#other}) namespace \emph{may} be used on any MetaLex element. \end{textbox} \begin{textbox} The MetaLex \xmlschema and \dtd are the normative source on the use of attributes, and the values allowed. \end{textbox} % subsubsection additional_considerations (end) \subsection{Identification and Relative IRIs} % (fold) \label{sub:identification_and_relative_iris} \begin{textbox} Every MetaLex XML element \emph{must} declare an \texttt{id} attribute with a \textbf{unique} ID value to allow identification with a unique IRI. Use \texttt{metalex:id} if no \texttt{id} already exists. \end{textbox} \begin{note} Rinke Hoekstra: Just require the use of \texttt{metalex:id}. \end{note} As specified in \href{http://www.ietf.org/rfc/rfc2396.txt}{RFC 2396}, many applications and protocols make use of an abbreviated form of IRI to refer to resources. Such abbreviated forms, called relative references or relative IRIs, \emph{may} be used in a context where enough information exists to determine the corresponding full IRI by which the resource can be identified (a process called resolution). The context information is called the \textbf{base IRI}, and is made available in a number of application- and context-specific ways. \begin{textbox} It \emph{must} be possible to establish the \textbf{base} of an element, in conformance with the \href{http://www.w3.org/TR/xmlbase/}{XML Base} specification and \href{http://www.ietf.org/rfc/rfc2396.txt}{RFC 2396}. \end{textbox} \begin{textbox} The string concatenation of the values of \texttt{base+'\#'+id} \emph{must} result in a valid IRI, conformant to the \href{http://www.w3.org/Addressing/}{addressing recommendations} of the W3C. 
\end{textbox} The easiest way to meet this requirement is to ensure that an \texttt{xml:base} attribute is always in scope of the element. The \texttt{xml:base} is in scope if it is on the element itself, or on one of its ancestors. \subsubsection{Resolution of IRIs} % (fold) \label{ssub:resolution_of_iris} \begin{textbox} MetaLex conformant applications \emph{must} be able to \textbf{retrieve} (and if needed show) the corresponding document, element or its content, given the IRI value of its \texttt{id} attribute. \end{textbox} \begin{textbox} The mechanism by which applications retrieve documents, elements or contents, given an IRI, is \textbf{beyond the scope} of this specification. \end{textbox} \begin{note} For traditional hierarchic IRIs that take the form of URLs, the \textbf{dereferencing} of IRIs may follow the guidelines set out by the HTTP/1.1 protocol in \href{http://www.ietf.org/rfc/rfc2616.txt}{RFC 2616}. For IRIs that take the form of URNs, an alternative resolution mechanism is appropriate. \end{note} % subsubsection resolution_of_iris (end) \subsubsection{Signalling Compliance to a Naming Convention} % (fold) \label{ssub:compliance_to_a_naming_convention} The \texttt{metalex:naming} attribute (value: an IRI) \emph{may} be used to signal compliance with a conformant naming convention (see \autoref{sec:naming_convention}). % RH: I didn't understand this bit: % It \emph{should} be recoverable from the context of the element. The \texttt{metalex:LocalNamingConventionMethod} (values: individual, ordinal, positional) and \texttt{metalex:LocalNamingConventionScope} (values: parent, root, or name of ancestor element relative to which identifiers like \emph{article 1} are unique) attributes \emph{may} be used to signal conformance to a local naming convention for structural parts of a document. % They \textbf{must} be recoverable from the context of the element.
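The requirement that \texttt{base+'\#'+id} yields a valid IRI, with the base taken from the nearest \texttt{xml:base} in scope, can be sketched in Python as follows. This is an illustration under assumptions: the helper name \texttt{element\_iris} is hypothetical, relative \texttt{xml:base} values are resolved against the inherited base with \texttt{urllib.parse.urljoin}, the document IRI serves as the fallback base, and the name of the id attribute is a parameter (for \texttt{metalex:id} one would pass its namespace-qualified name, which is not fixed here).

\small
\begin{verbatim}
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def element_iris(xml_text, document_iri, id_attr="id"):
    """Map each id value to base + '#' + id (hypothetical helper)."""
    iris = {}

    def walk(elem, inherited_base):
        base = inherited_base
        if elem.get(XML_BASE) is not None:
            # A relative xml:base is resolved against the inherited base.
            base = urljoin(inherited_base, elem.get(XML_BASE))
        base = urldefrag(base)[0]  # drop any fragment before appending '#id'
        ident = elem.get(id_attr)
        if ident is not None:
            iris[ident] = base + "#" + ident
        for child in elem:
            walk(child, base)

    walk(ET.fromstring(xml_text), document_iri)
    return iris
\end{verbatim}
\normalsize

Placing an absolute \texttt{xml:base} on the root element, as suggested above, makes the resulting IRIs independent of the fallback document IRI, i.e. of where the file happens to be stored.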
% subsubsection compliance_to_a_naming_convention (end) % subsection identification (end) % section syntax (end) \section{Metadata and Semantics} % (fold) \label{sec:metadata}\label{sec:semantic-restrictions} This section specifies the means for adding metadata to concrete MetaLex conformant XML documents. It consists of three parts: a mechanism for \textbf{embedding metadata} in MetaLex conformant XML documents (\autoref{sub:embedded_metadata_processing}), a \textbf{schema} that defines the semantics of the standard vocabulary for expressing metadata (\autoref{sub:owl_schema}), and a guideline for the identification, manipulation and description of MetaLex XML documents as \textbf{bibliographic entities} (\autoref{sub:bibliographic_entities}). % See the RDF specifications at \url{http://www.w3.org/2001/sw/RDFCore/} for details on RDF literals and datatyping of literals. % An RDF statement has the following components: % % \begin{description} % \item[subject:] the thing the statement describes; % \item[predicate:] a specific property; % \item[object:] the thing the statement says is the value of the property, for the thing the statement describes. % \end{description} % As specified in \autoref{ssub:metadata_attributes}, MetaLex uses the conventions of \href{http://www.w3.org/TR/rdfa-syntax/}{RDFa} processing for embedding RDF metadata statements inside MetaLex XML. \subsection{Embedded Metadata} % (fold) \label{sub:embedded_metadata}\label{sub:embedded_metadata_processing} MetaLex allows the embedding of metadata by means of the attribute set defined in \autoref{ssub:metadata_attributes}. The purpose of MetaLex embedded metadata is to enable the storage of \href{http://www.w3.org/RDF/}{RDF} metadata not just \textbf{outside}, but also \textbf{inside} a MetaLex XML document. \begin{note} The RDFa statements inside a MetaLex XML document can be transformed to RDF using a standard \href{http://www.w3.org/TR/grddl/}{GRDDL} transformation.
\end{note} An \href{http://www.w3.org/RDF/}{RDF} description of a resource consists of a set of statements of the form \texttt{(subject, predicate, object)}. The \textbf{subject} is the resource described by the statement, the \textbf{predicate} is the property used to relate subject to object, and the \textbf{object} is the value of the property as it holds for the subject. \begin{textbox} In case of ambiguity as to the syntax or semantics of RDF statements, this document defers to the respective RDF specifications at \url{http://www.w3.org/TR/rdf-syntax-grammar/} and \url{http://www.w3.org/TR/rdf-mt/}. \end{textbox} \begin{note} The main difference between storage inside and outside a standard XML manifestation is the identification of the metadata author. Metadata inside the document is associated with the editor of the manifestation, who can be assumed to be the author of the metadata. \end{note} \begin{textbox} MetaLex specifies no explicit mechanism for linking a MetaLex XML document to RDF metadata stored outside this document, other than through standard IRI dereferencing. \end{textbox} % subsection embedded_metadata (end) \subsection{OWL Schema} % (fold) \label{sub:owl_schema} % The OWL schema can be found at \texttt{http://www.metalex.eu/metalex/YYYY-MM-DD}, where \texttt{YYYY-MM-DD} is the date of the agreement, provided that one correctly sets the mime type, i.e: % % \begin{verbatim} % GET http://www.metalex.eu/metalex/YYYY-MM-DD % Accept: application/rdf+xml % \end{verbatim} % A description logic syntax (cf. \cite{horrocks03reducing,baader01overview}) rendering is found in appendix \ref{owl}. MetaLex includes an OWL schema that specifies properties and classes for use in \href{http://www.w3.org/RDF/}{RDF} statements about MetaLex XML documents. \begin{textbox} This schema \emph{should} be used for \href{http://www.w3.org/RDF/}{RDF} metadata, whether stored inside or outside a concrete MetaLex XML document.
\end{textbox} \begin{textbox} The embedded metadata processing mechanism \emph{may} be used in conjunction with other metadata formats. \end{textbox} The MetaLex standard provides metadata for describing both the \textbf{hierarchical} and the \textbf{relational} positioning and identification of bibliographic objects, at least at the work and expression levels, at the relevant levels of granularity. The OWL schema specifies the concepts mentioned in section \ref{biblio}, the identifying metadata specified in sections \ref{self} and \ref{idmetadata}, and the citation metadata specified in section \ref{citation}, as well as some related auxiliary concepts. \subsubsection{Bibliographic Entities} % (fold) \label{sub:bibliographic_entities} \label{biblio} The use of bibliographic terminology in the CEN MetaLex standard is as follows:\footnote{Inspired by the IFLA Functional Requirements for Bibliographic Records (cf. \cite{Saur1998}).} \begin{itemize} \item A \textbf{bibliographic object} is a bounded representation of a body of information, designed with the intent to communicate, preserved in a form independent of a sender or receiver. A bibliographic work, expression, manifestation, and item are bibliographic objects. \item A \textbf{bibliographic citation} is a representation of a bibliographic identifier of a bibliographic object, with the intent of referring to that bibliographic object. \emph{Article 1}, \emph{the first article} and \emph{the previous article} are examples of citations, whereas \emph{the Minister}, the \emph{President of the Republic}, \emph{the accused}, and \emph{We, Beatrix} are examples of references to other, non-bibliographic things. \item A \textbf{unique bibliographic identifier} identifies a bibliographic object uniquely. MetaLex uses the IRI as its unique bibliographic identifier.
\item A \textbf{bibliographic work} is a bibliographic object, realized by one or more expressions, and created by one or more persons in a single creative process ending in a publication event. A work has an author or authors, and is the result of a publication event. We recognize the work through individual expressions of the work, but the work itself exists only in the commonality of \emph{content} between and among the various expressions of the work: it is an intentional object\footnote{I.e. it exists only as the object of one's thoughts and communication acts, and not as a physical object.}. \item A \textbf{bibliographic expression} is a realization of one bibliographic work in the form of signs, words, sentences, paragraphs, etc. by the author of that work. Physical form aspects, such as typeface or page layout, are generally speaking excluded from the expression level. Any change in \emph{content} gives rise to a new expression: if an expression is revised or modified, the resulting expression is considered to be a new expression, no matter how minor the modification may be. An expression is an intentional object. \item A \textbf{bibliographic manifestation} embodies one expression of one bibliographic work. The boundaries between one manifestation and another are drawn on the basis of both content and physical form. When the production process involves changes in physical form, the resulting product is considered a new manifestation. Thus a specific XML representation, a PDF file (as generated by printing a specific Word file into PDF with a specific PDF distiller), and a printed booklet all represent different manifestations of the same expression of a work. A manifestation is an intentional object. A MetaLex XML document is a bibliographic manifestation.
\item A \textbf{bibliographic item} exemplifies one manifestation of one expression of one work: a specific copy of a book on a specific shelf in a library, a file stored on a computer in a specific location, etc. Items stored on a computer can easily be copied to another location, resulting in another item of the same manifestation. This makes it impossible, in principle, to add metadata about an item to the item itself. On the Internet, generally speaking, only the \emph{uniform resource locator} (URL) is an item-specific datum. An item is a physical object. \end{itemize} The relations between the bibliographic entities are illustrated in \autoref{fig:bibliographic_entities}. \begin{figure} \begin{center} \includegraphics[width=8cm]{bibliographic_entity.pdf} \caption{Relations between bibliographic entities in MetaLex}\label{fig:bibliographic_entities} \end{center} \end{figure} \begin{textbox} A MetaLex XML document is a \textbf{standard manifestation} of a \textbf{bibliographic expression} of a source of law. Editing the MetaLex XML markup and metadata of the XML document \textbf{changes the manifestation} of an expression. Changing the marked up text \textbf{changes the expression} embodied by the manifestation. Copying an exemplar of the MetaLex XML document \textbf{creates a new item}. \end{textbox} \begin{textbox} Embedded metadata \emph{must} explicitly indicate its subject. \end{textbox} If the subject is the document self-identifier \texttt{""} (the empty string) or a fragment identifier \texttt{"\#f"}, composed of the document self-identifier and the value \texttt{f} of the \texttt{id} attribute of the intended target element, then the metadatum refers to the \textbf{item}. The only \textbf{item-level} metadata property is the \texttt{metalex-owl:exemplifies} property, which takes the \textbf{manifestation-level} identifier as value (cf. \autoref{sub:identification}, also \autoref{ssub:metadata_attributes}).
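The chain of bibliographic entities described above, together with the rule for recognizing item-level subjects, can be sketched with Python dataclasses. The class names, field names and helper functions below are hypothetical illustrations, not terms of the MetaLex OWL schema; only the relation names \texttt{exemplifies}, \texttt{embodies} and \texttt{realizes} follow this document. Each entity holds exactly one reference to the entity one level up, so an item determines its manifestation, expression and work.

\small
\begin{verbatim}
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    iri: str


@dataclass(frozen=True)
class Expression:
    iri: str
    realizes: Work              # exactly one work


@dataclass(frozen=True)
class Manifestation:
    iri: str
    embodies: Expression        # exactly one expression


@dataclass(frozen=True)
class Item:
    location: str               # e.g. a URL, the item-specific datum
    exemplifies: Manifestation  # exactly one manifestation


def work_of(item):
    """Follow the chain item -> manifestation -> expression -> work."""
    return item.exemplifies.embodies.realizes


def copy_item(item, new_location):
    """Copying yields a new item of the same manifestation."""
    return Item(location=new_location, exemplifies=item.exemplifies)


def is_item_level_subject(subject):
    """The self-identifier '' or a fragment '#f' refers to the item."""
    return subject == "" or subject.startswith("#")
\end{verbatim}
\normalsize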
The required type of subject of each metadata property is explicitly constrained in the OWL schema. \begin{textbox} XML attribute values by default \textbf{pertain to the expression} embodied by the manifestation, unless explicitly specified otherwise in this document. \end{textbox} \begin{textbox} Each bibliographic item \textbf{exemplifies} exactly one manifestation that \textbf{embodies} exactly one expression that \textbf{realizes} exactly one work. \end{textbox} Because all these mappings are \emph{functional}, i.e. each maps unambiguously to one entity, item identity can be, and often is, used as an indirect identifier of the other objects. Similarly, email addresses map functionally to persons and can be used as indirect identifiers of persons. One can refer to a work by referring to its initial expression in a context where a reference to a work is expected. The inverses of these relations are often not functions. One can think of the work as an abstraction of one or many expressions, the expression as an abstraction of one or many manifestations, and the manifestation as an abstraction of one or many items. The manifestation, expression, and work are intentional objects whose existence is conditional on the existence of at least one item, manifestation, and expression, respectively. Normally speaking, there is no such thing as an expression that is not embodied, a work that is not realized, etc. % subsubsection bibliographic_entities (end) \subsubsection{Events} % (fold) \label{ssub:events} Besides the hierarchical constitutive relationships between the four levels, there are also \textbf{horizontal relations} between the objects within a level. The expressions of a work in the legal field are usually either the initially published expression, or expressions derived by content modification or translation. Manifestations of an expression are either the initially created one(s), or manifestations derived by editing activity.
Items of a manifestation are either the initially created one(s), or copies of them. \begin{note} Appropriate manifestation-level version management methods and tools already exist (CVS, SVN, Git, etc.). These are of no concern to the CEN MetaLex workshop. \end{note} Most legislative events happen at the work and expression levels. Content-related events such as markup, metadating, and digital signing happen at the manifestation level. The MetaLex OWL schema includes a number of event type definitions (cf. \cite{BoerEtAl2004}, and generally \cite{LagozeEtAl2000} on linking metadata to events) to make these horizontal relations explicit at the expression level. At the item level, such relations cannot be embedded in a MetaLex item, because an item cannot carry metadata about itself. The basic categories are: \textbf{event}, \textbf{action} and \textbf{transaction}. \paragraph{Thematic Roles} % (fold) \label{par:thematic_roles} Events, actions and transactions have participants in certain roles. MetaLex uses a classification in terms of \textbf{thematic roles}. The thematic role is the semantic relationship between a verb and an argument (a noun phrase) of a sentence. \begin{note} It is important to realize that thematic roles are based on linguistic criteria, and do not offer an ontologically sound criterion for classifying entities. The classification is therefore somewhat arbitrary, but easy to remember and use. \end{note} MetaLex uses a simple categorization of thematic roles. Each occurrent has one or more participants (properties), each of which is either: \begin{description} \item[Immanent or Determinant] a determinant participant determines direction, while an immanent participant is passively present throughout. \item[Source or Product] a source must be present at the beginning, but need not participate throughout, while a product must be present at the end but need not participate throughout.
\end{description} MetaLex defines the following participants: \begin{description} \item[agent] is \textbf{determinant} and \textbf{source} of the action; a person or some organised group of persons; only actions have an agent; \item[instrument] is \textbf{immanent} and \textbf{source} of the action, and is not changed during the action; \item[patient] is \textbf{immanent} and \textbf{product} of the action, and undergoes some structural change as a result of the action; at the level of bibliographic entities this applies to the work; \item[recipient] is \textbf{determinant} and \textbf{product} of the action: the person towards whom the action was directed; in the case of sources of law this is usually the addressee; only transactions have a recipient; \item[result] is \textbf{determinant} and \textbf{product} of the action: a thing that was created by the action; at the level of bibliographic entities this can apply only to the expression; \item[date] is \textbf{immanent} and \textbf{product} of the action: when it happened, which in the domain of legislation is always a date. \end{description} % paragraph thematic_roles (end) \paragraph{Actions} % (fold) \label{par:actions} MetaLex covers three generic actions: \begin{description} \item[Creation] A bibliographic entity (result) is created by an author (agent), at a date. It is not relevant whether the text is a verbatim copy or a modification of an earlier text by another author: the identity of bibliographic entities does not depend on their content. When a bibliographic entity is created, its parts are also created, but the parts can be independently modified, resulting in a new creation. The expression manifested by an element cannot be created before the expression that contains it. \item[Enactment] The action, by an agent with the competence (instrument) to enact, by which an expression enters into force.
The trigger may be the more or less autonomous execution of an enactment provision (instrument) created earlier by the agent. The agent responsible for the enactment provision can still be considered to be acting. \item[Repeal] The action, by an agent with the competence (instrument) to repeal, by which an expression goes out of force. This may be the more or less autonomous execution of a repeal provision (instrument) created by the agent. \end{description} % paragraph actions (end) % subsubsection events (end) \subsubsection{Additional Metadata} % (fold) \label{ssub:additional_metadata} Additional terminology is introduced to capture bibliographic phenomena relevant to law, as well as operational criteria for recognizing these in a MetaLex XML document. These classifications should be considered metadata, and their specifications are found in the MetaLex OWL schema: \begin{itemize} \item A \textbf{bibliographic source of law} is a bibliographic object that can be, is, was, or presumably will be referred to, by way of bibliographic citation, to back an argument claiming the existence of a legal rule in a certain legal system, or, alternatively, a bibliographic object published or realized by a competent legislator to communicate a legal rule to a certain group of addressees. Both the legislator and the user of the bibliographic source of law understand it as a medium used for communicating the existence of legal rules, including auxiliary declarations required for the proper understanding of legal rules, between legislator and user. \item An \textbf{initial version} of a bibliographic work is the expression that realizes the work at the time of its official release in the public domain as a bibliographic object. It is the \texttt{metalex-owl:result} of a \texttt{metalex-owl:\\InitialBibliographicCreation}. \item A \textbf{version} of a bibliographic work is either the initial version of the work, or an expression realized by modification of a version.
It is the \texttt{metalex-owl:result} of a \texttt{metalex-owl:BibliographicModification} of the previous version (the \texttt{metalex-owl:matter}). \item A \textbf{version in force} of a bibliographic work is a version that is, was, or will be \emph{in force} during a specific time interval. The in-force time intervals of versions in force of the same work do not appear to overlap in time \emph{when viewed from any specific vantage point in time}. Note however the possibility of \emph{virtual} and \emph{ex tunc} expressions, which may create a substantial difference between vantage points in time. \item A \textbf{version ex tunc} of a bibliographic work is an expression that projects the effects of a constitutive event on a specific version of the work, in terms of content, force and efficacy, back in time up to a specific date. The version ex tunc is thus, \emph{as it were}, generated by a fictional event in the past that allows rewriting institutional history, for instance through errata corrige, or on the occasion of an annulment by a constitutional court of a modification made in the past. The fictional event of such an expression is treated \emph{as if} it were a true event \emph{after} the constitutive event happened. Institutional history therefore appears different depending on which vantage point in time one takes. A version ex tunc is the \texttt{metalex-owl:result} of a \texttt{metalex-owl:\\FictionalExpressionCreation}, which is a temporal fiction \\(\texttt{metalex-owl:temporalFictionOf}) of some other event, the constitutive event, that happened \emph{after} the fictional event. \item A \textbf{consolidation} of a bibliographic source of law is a version realized by applying legal rules found in another bibliographic source of law to the previous version.
It is the \texttt{metalex-owl:result} of a \texttt{metalex-owl:\\LegislativeModification} of the previous version (the \texttt{metalex-owl:matter}) by the other bibliographic source of law stipulating the modification (the \texttt{metalex-owl:instrument}). The modification is often executed when this instrument enters into force, although it may of course be conditional upon some other event. \item A \textbf{variant} of a version in force of a bibliographic work is an expression that shares its \emph{in force} time interval. Although \emph{variant} is often used as a noun, it is a symmetric relationship (\texttt{metalex-owl:variant}) between two expressions. \item A \textbf{language variant} of an expression of a bibliographic work is an expression that shares its in-force time interval, and differs in nothing but language. For instance, the English, Dutch, Italian, and German versions of a European directive are different language variants. It is also a symmetric relationship (derived from \texttt{metalex-owl:variant}). In a MetaLex manifestation of the expression this is reflected in different values of the \texttt{xml:lang} attribute. \item A \textbf{translation} of an expression of a bibliographic work is an expression that shares its in-force time interval, differs in nothing but language, and has been realized by way of translation of one expression into another expression. Translation is an asymmetric relation between bibliographic expressions, expressed by a \texttt{metalex-owl:Translation} event, which has a\\ \texttt{metalex-owl:translator}, taking the initial expression as a \\\texttt{metalex-owl:matter}, and the translated document as \texttt{metalex-owl:\\result}.
The translation should not be confused with a language variant: language variants can be realized concurrently by the legislator, and are then equally authoritative, whereas the translation of an expression is generally less authoritative than the expression it translates, even if \emph{officially} translated. \end{itemize}
% subsubsection additional_metadata (end)
% subsection owl_schema (end)
\subsection{Identification and Resolution of Metadata} % (fold)
\label{sub:identification} \begin{textbox} Each bibliographic item encoded in conformance to this standard \emph{must} have at least one IRI as defined by \rfciri by which MetaLex conformant applications \emph{must} be able to \textbf{retrieve} (and if needed show) the identifying metadata in RDF. \end{textbox} This means that items, manifestations, expressions and works have IRIs, referred to respectively as item-level, manifestation-level, expression-level and work-level IRIs. It is in principle not possible to encode item-level information in the manifestation, so the item itself cannot provide information about its whereabouts. \begin{textbox} The mechanism by which applications retrieve relevant metadata, given an IRI, is \textbf{beyond the scope} of this specification. \end{textbox} \begin{note} For traditional hierarchic IRIs that take the form of URLs, the \textbf{dereferencing} of IRIs may follow the guidelines set out by the HTTP/1.1 protocol in \href{http://www.ietf.org/rfc/rfc2616.txt}{RFC 2616}, and the additional guidelines of the SWBP in their \href{http://www.w3.org/TR/swbp-vocab-pub/}{Best Practices for Publishing RDF Vocabularies}. For IRIs that take the form of URNs, an alternative resolution mechanism is appropriate.
\end{note} \begin{textbox} If metadata is not available as either embedded RDFa or external RDF, MetaLex conformant applications \emph{should} implement a mechanism for retrieving relevant content from other sources (e.g. PDF, Word, HTML meta tags, etc.), or refer to such an implementation. \end{textbox} For cases where bibliographic items are themselves not explicitly encoded (i.e. they do not have an IRI), but relevant metadata could be retrieved if they had one, the guidelines of \autoref{ssub:naming_convention} apply. \subsubsection{Use of Relative IRIs for Bibliographic Entities} % (fold)
\label{ssub:use_of_relative_iris_for_bibliographic_entities} In principle only full IRIs are used to identify resources. Relative references rely on the base IRI to supply the missing information. In accordance with \autoref{sub:identification_and_relative_iris}, MetaLex allows the use of \textbf{relative} IRIs to identify bibliographic entities, as long as every valid resolution of the relative IRI yields a reference to the same work, expression, manifestation, or item, respectively. This is possible if the resolution of a reference is \textbf{independent} of the base IRI used. \begin{textbox} It \emph{must} be possible to establish the base IRI of any item-, manifestation-, expression- and work-level IRI, in conformance with the \href{http://www.w3.org/TR/xmlbase/}{XML Base} specification and \href{http://www.ietf.org/rfc/rfc3987.txt}{RFC 3987}. \end{textbox}
% This is true also for fragment identifiers: the concatenation of the established IRI base and the id attribute of an element must result in a valid IRI reference for the element (at the same level as the base IRI), conformant to the addressing recommendations of W3C, which counts as a bibliographic identifier of the element as a bibliographic item.
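The base-independence condition above can be checked mechanically. The following minimal Python sketch uses only the standard library. It resolves a reference against the base IRIs of two hypothetical processing environments (the host names are invented) and tests whether the resolved paths agree. For illustration only, it assumes that environments differ only in scheme and authority. It relies on \texttt{urllib.parse.urljoin}, which resolves URI references; non-ASCII IRIs would first have to be mapped to URIs as described in RFC 3987.

\small\begin{verbatim}
from urllib.parse import urljoin, urlsplit

# Hypothetical base IRIs set by two different processing environments.
BASES = [
    "http://repository-a.example/metalex/doc.xml",
    "https://mirror-b.example/archive/2009/copy.xml",
]


def resolve(base: str, reference: str) -> str:
    """Resolve a (possibly relative) reference against a base IRI."""
    return urljoin(base, reference)


def path_and_fragment(iri: str) -> str:
    """Drop scheme and authority, keeping only path and fragment."""
    parts = urlsplit(iri)
    return parts.path + ("#" + parts.fragment if parts.fragment else "")


def is_base_independent(reference: str, bases=BASES) -> bool:
    """True if every base yields the same path (and fragment).

    Assumption: environments differ only in scheme and authority, so
    equal paths are taken to denote the same bibliographic entity.
    """
    resolved = {path_and_fragment(resolve(b, reference)) for b in bases}
    return len(resolved) == 1
\end{verbatim}\normalsize

Under these assumptions, a path-absolute reference such as \texttt{/tv/act/2004-02-13/2/tv} is base-independent. A relative-path reference such as \texttt{act/2}, or a bare fragment such as \texttt{\#art1}, resolves differently in each environment.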
\begin{note} The explicit use of an \texttt{xml:base} attribute means that all \textbf{item-level} references remain the same after every copy operation of the document, which means that it also behaves as a \textbf{manifestation-level} base. There are also legitimate uses of the \texttt{xml:base} attribute in which it is inserted as a temporary identifier for an XML subtree in an XML processing pipeline. \end{note}
% subsubsection use_of_relative_iris_for_bibliographic_entities (end)
\subsubsection{Self-identification with Metadata} % (fold)
\label{ssub:self_identification_with_metadata}\label{self} \begin{textbox} For any manifestation, a manifestation-IRI \textbf{must} be declared by means of \texttt{metalex-owl:exemplifies}. \end{textbox} \begin{textbox} The expression and the work \emph{must} be declared if no conformant \textbf{naming convention} is applicable (see \autoref{naming}). \end{textbox} \begin{note} The IRI reference \texttt{about=""} refers to the document itself. The item cannot identify itself other than with this relative reference. \end{note} The following declares a standard manifestation, expression, and work base (using the naming convention): \small\begin{verbatim} \end{verbatim}\normalsize The RDF reading of \textbf{m1} is as follows: \textbf{m1} states that (the referent of) the empty string, i.e. the document itself, stands in the \texttt{metalex:exemplifies} relation to (the referent of) \texttt{/tv/act/2004-02-13/2/tv}.
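The RDF reading of \textbf{m1} can be sketched in a few lines of Python, together with the property chains for \texttt{metalex-owl:thisEmbodies} and \texttt{metalex-owl:thisRealizes} given in this subsection. The sketch below is not an RDFa processor. It assumes a hypothetical \texttt{meta} block whose elements carry \texttt{about}, \texttt{rel} and \texttt{href} attributes, and it keeps prefixed property names as plain strings. Relative IRIs are resolved against an assumed document IRI. The expression and work IRIs are invented for the example.

\small\begin{verbatim}
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical metadata block. The manifestation IRI is taken from m1;
# the expression and work IRIs are invented for illustration.
SAMPLE = """<meta>
  <link about="" rel="metalex-owl:exemplifies"
        href="/tv/act/2004-02-13/2/tv"/>
  <link about="/tv/act/2004-02-13/2/tv" rel="metalex-owl:embodies"
        href="/tv/act/2004-02-13/2"/>
  <link about="/tv/act/2004-02-13/2" rel="metalex-owl:realizes"
        href="/tv/act/2004-02-13"/>
</meta>"""


def read_triples(xml_text, document_iri):
    """Read (subject, predicate, object) from about/rel/href attributes.

    Relative IRIs, including about="" (the document itself), are resolved
    against document_iri. A real RDFa processor would also expand CURIEs
    and honour xml:base; this sketch does neither.
    """
    triples = set()
    for el in ET.fromstring(xml_text).iter():
        if all(name in el.attrib for name in ("about", "rel", "href")):
            subject = urljoin(document_iri, el.get("about"))
            obj = urljoin(document_iri, el.get("href"))
            triples.add((subject, el.get("rel"), obj))
    return triples


def apply_chains(triples):
    """Add thisEmbodies/thisRealizes triples using the two chain rules."""
    def objects(subject, predicate):
        return {o for (s, p, o) in triples if s == subject and p == predicate}

    inferred = set(triples)
    for (item, predicate, manifestation) in triples:
        if predicate != "metalex-owl:exemplifies":
            continue
        for expression in objects(manifestation, "metalex-owl:embodies"):
            # Rule 1: p exemplifies q, q embodies r => p thisEmbodies r
            inferred.add((item, "metalex-owl:thisEmbodies", expression))
            for work in objects(expression, "metalex-owl:realizes"):
                # Rule 2: ... and r realizes s => p thisRealizes s
                inferred.add((item, "metalex-owl:thisRealizes", work))
    return inferred
\end{verbatim}\normalsize

Take a hypothetical document IRI such as \texttt{http://repo.example/docs/act.xml}. There, \texttt{about=""} resolves to the document itself, and the two chain rules add one \texttt{thisEmbodies} and one \texttt{thisRealizes} statement to the three declared ones.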
An alternative, simpler form that is also permitted is the following: \small\begin{verbatim} \end{verbatim}\normalsize The semantics of \texttt{metalex-owl:thisEmbodies} and \texttt{metalex-owl:thisRealizes} are as follows, assuming \texttt{\{p, q, r, s\}} is a set of IRI references\footnote{This is an instance of property chaining.}: \begin{enumerate} \item If \texttt{p metalex-owl:exemplifies q} and \texttt{q metalex-owl:embodies r} \\then \texttt{p metalex-owl:thisEmbodies r}. \item If \texttt{p metalex-owl:exemplifies q} and \texttt{q metalex-owl:embodies r} and \\\texttt{r metalex-owl:realizes s} then \texttt{p metalex-owl:thisRealizes s}. \end{enumerate} In the examples above, \texttt{meta} stands for any appropriate element that permits metadata attributes. The IRIs are relative, in this case conforming to a \textbf{naming convention}: the base is set by the processing environment.
% This means that the \emph{mIRI} of the naming convention describes a IRI \emph{reference} that potentially resolves to large set of IRIs for each bibliographic object: one for each processing environment that sets its own base.
Note that if one wants to conform to the naming convention but also wants to embed another identifier, this is trivially achieved with a metadata statement to that effect, for instance: \small\begin{verbatim} \end{verbatim}\normalsize
% subsubsection self_identification_with_metadata (end)
\subsubsection{Example}
% \marginpar{This doesn't quite work as ``just an example''. The use of classes and attributes refers to the ontology. Although the ontology does not come with existence constraints, there should be a requirement that you use (i.e. mostly subclass) what is in it it unless you have an argued reason to deviate from it.}
\label{idmetadata}\label{sub:identifying_metadata}\label{sub:example_metadata} As set out in \autoref{sub:identification}, for any concrete MetaLex XML document, a set of metadata \textbf{must} be available.
This section provides an example of how a minimal set of metadata can be specified as RDFa statements on the \texttt{metalex:meta} element. \paragraph{Work} % (fold)
\label{par:work} Consider \texttt{[ManifIRI]} to be the identifier of the manifestation, \texttt{[ExpIRI]} the identifier of the expression, and \texttt{[WorkIRI]} the identifier of the work, all declared as in the relevant naming convention. Typed values \texttt{}, etc. are values conforming to the XML Schema datatype \texttt{xsd:date}, etc. A relevant set of metadata declarations for a work-level identification could thus be the following: \small\begin{verbatim} \end{verbatim}\normalsize The main purpose of the example above is to create an unambiguous sequence of work creation events for each work document type. \begin{note} Since metadata is specified as RDF statements, all document types, including works, \emph{may} be identified by an additional set of resources that describe them, e.g. by means of terms from a vocabulary specification in SKOS. This description can be used to type the work, as in \texttt{:act} for act of parliament, or to describe multiple aspects of the work, such as \texttt{:municipal}, \texttt{:foobartown}, \texttt{:decree}. \end{note} It is recommended to associate work document types with specific publication channels that can be regularly monitored. Document types are specific to countries. The combination of the date of the event and the issue number (some string) uniquely positions the work in the temporal sequence of works belonging to that document type.
% paragraph work (end)
\paragraph{Expression} % (fold)
\label{par:expression} A relevant set of metadata declarations for an expression-level identification could be the following: \small\begin{verbatim} \end{verbatim}\normalsize The \texttt{[CreationEventDateIRI]} \textbf{may}, and often will, coincide with the creation of the work in non-versioned document types.
In case of an ex tunc modification, the creation event is a \\\texttt{FictionalExpressionCreation}, and it is a temporal fiction of some other event, to be filled in, which happened at another relevant date: \small\begin{verbatim} \end{verbatim}\normalsize
% paragraph expression (end)
\paragraph{Manifestation} % (fold)
\label{par:manifestation} A relevant metadata declaration for a manifestation-level identification could be the following: \small\begin{verbatim} \end{verbatim}\normalsize This metadata declaration \textbf{may} be encoded in the XML manifestation as metadata, but it is assumed that extraction of RDF metadata from a standard MetaLex manifestation already provides the information that the file type is XML. The OWL schema provides more identifying metadata that \textbf{may} be added.
% paragraph manifestation (end)
% subsection identification (end)
% \subsection{Uniform Resource Identifiers}
% \subsection{Fragment Identifiers}
%
% Every MetaLex element must have an \texttt{id} attribute, even though not necessarily the \texttt{metalex:id} attribute. The \texttt{id} value of an element \emph{may} be a fragment identifier. Concatenated to the manifestation level IRI it globally identifies the element at the manifestation level, concatenated to the expression level IRI it globally identifies the expression embodied by the content of the element, concatenated to the work level document IRI it presumably globally identifies a structural element common to various expressions of the work.
% section metadata (end)
\section{Addressing} % (fold)
\label{sec:addressing} The MetaLex standard distinguishes three forms of addressing: \begin{enumerate} \item Reference to \textbf{external objects} (agents, events, etc.): these are always encoded as metadata, and if embedded, always using the \texttt{href} or \texttt{resource} attribute.
\item Citation of \textbf{other bibliographic objects}: this can be achieved through the citations attribute group, or through metadata, always using the \textbf{href} attribute. The target \emph{must} either be identified by use of identifying metadata or, if no explicit metadata assertions are available, it \emph{should} be discoverable by means of a conforming \textbf{naming convention}. In some cases the citation can be embedded both in attribute form and as embedded metadata, because \texttt{href} is interpreted both as a citation attribute and as an RDFa attribute.\footnote{\textbf{RH}: this is unclear.} \item Manifestation \textbf{component inclusion}: this can be achieved through the \texttt{src} attribute and additional metadata, or through conformance to a naming convention. \end{enumerate} \begin{textbox} Addressing \emph{must} be done by means of an IRI, and \emph{must} be made using the \texttt{href} or \texttt{src} attribute if the addressing is stated as \textbf{embedded} metadata. \end{textbox} Because these attributes are also RDFa attributes, the referring element also encodes a metadatum if \texttt{about} and \texttt{rel} attributes are present. A \texttt{citation} is an expression that refers to something \textbf{intralinguistic}, i.e. to another XML element (directly or indirectly) containing text, or to the \textbf{bibliographic objects} directly or indirectly embodied by it. Other references refer to something extralinguistic, i.e. something other than text, recoverable from the context in which the document was produced. \begin{example} \emph{Article 1, the first article} and \emph{the previous article} are examples of citation, and \emph{the Minister, the President of the Republic, the accused}, and \emph{We, Beatrix, etc.} are examples of relevant references to other things. \end{example} The distinguishing property of references is that the value of the \texttt{rel} attribute is a subproperty of \texttt{metalex-owl:refersTo}.
Citation uses a property value that is a subproperty of \texttt{metalex-owl:cites}, which is itself a subproperty of \texttt{metalex-owl:refersTo}. \begin{textbox} The fact that some MetaLex XML element refers to another entity \emph{may} also be stored in RDF, external to the MetaLex XML document. \end{textbox} \subsection{Reference} A reference is something that refers to or designates something else, or acts as a stand-in for a relation between two things: the \emph{referrer} and the \emph{referent}. Since a relation can also be identified, the generic form of a reference is \emph{(referrer, predicate, referent)}, where the predicate is the name of the relation; references are represented as RDF or RDFa. In the sense intended here, a reference is an XML element (directly or indirectly) containing text, and the text refers or is deemed to refer to something else. The XML element is typically of the \texttt{inline} content model type. \begin{textbox} The use of references is optional. \end{textbox} \begin{textbox} The \texttt{src} attribute \textbf{may not} be used. \end{textbox}
%% Rinke Hoekstra
%% Removed this bit, as it does not correspond to anything in the MetaLex schemata
%%
% A similar distinction, is the distinction between \textbf{exophora} and \textbf{endophora} in linguistics.
%
% \begin{example}
% Take the following sentences:
%
% \begin{enumerate}
% \item \emph{Theft} is the unlawful taking of a good wholly or partially belonging to another.
% \item \emph{It} (\emph{The theft}) must have been done with the intent to appropriate.
% \end{enumerate}
%
% \emph{It} (or \emph{The theft}) obviously refers back to Theft in the previous sentence. It is an \textbf{endophoric} proform expression, as opposed to \textbf{exophoric} (That must have been done intentionally, pointing to an act of taking in progress). Exophoric proform expressions are obviously rare in legal resources.
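Whether an element encodes a reference or a citation therefore depends only on where its \texttt{rel} value sits in the property hierarchy. The minimal Python sketch below assumes a small, hypothetical vocabulary; the \texttt{ex:} properties are invented. It writes the citation property as \texttt{metalex-owl:cites` and assumes each property has at most one direct super-property. It treats the subproperty relation as reflexive and transitive, as in RDFS. Every citation property is also a reference property, so the more specific test is applied first.

\small\begin{verbatim}
# Hypothetical vocabulary: each rel value mapped to its direct
# super-property (None for properties outside the reference hierarchy).
SUPER_PROPERTY = {
    "metalex-owl:cites": "metalex-owl:refersTo",
    "ex:citesArticle": "metalex-owl:cites",      # invented citation property
    "ex:mentionsAgent": "metalex-owl:refersTo",  # invented reference property
    "dc:creator": None,
}


def ancestors(prop):
    """All super-properties of prop, nearest first (cycles are cut off)."""
    found = []
    current = SUPER_PROPERTY.get(prop)
    while current is not None and current not in found:
        found.append(current)
        current = SUPER_PROPERTY.get(current)
    return found


def is_subproperty_of(prop, target):
    """Reflexive, transitive subproperty test."""
    return prop == target or target in ancestors(prop)


def classify(rel):
    """Classify the rel value of a referring element."""
    if is_subproperty_of(rel, "metalex-owl:cites"):
        return "citation"
    if is_subproperty_of(rel, "metalex-owl:refersTo"):
        return "reference"
    return "other"
\end{verbatim}\normalsize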
%
% There is however a difference between the reference \emph{It} referring to theft as defined in the previous sentence and \emph{the previous sentence} as referring to a sentence. Theft and it are coreferents of something other than text: it is not an intra linguistic reference. Expressions like it do stand in for another expression (Theft), but only to indirectly reference meaning recoverable from context. They are used to avoid repetitive expressions and in quantification (i.e. carrying a variable from one sentence into the next one).
% \end{example}
\begin{example} The following is an example of a reference metadatum in RDFa: \begin{verbatim} \end{verbatim} \end{example} \subsection{Citation} \label{citation}\label{sec:citations} \subsubsection{Single Target} % (fold)
\label{ssub:single_target} A citation with a \textbf{single} target (e.g. \emph{article 1 of the Income Tax Law} or \emph{article 15.3}) is straightforward. The entire referring text can be marked, and is linked to the one concept that is being referred to. This can be either a \textbf{work} or an \textbf{expression}, depending on whether or not version information is specified or hinted at. \begin{textbox} A citation in legislation normally refers to a work. \end{textbox} \begin{textbox} A reference to legislation in a court verdict necessarily refers to an expression. \end{textbox} References in other documents can refer to any level.
% subsubsection single_target (end)
\subsubsection{Multiple Targets} % (fold)
\label{ssub:multiple_targets} References with \textbf{multiple} targets are more complex to mark up, as there are now several entities to link to (e.g. \emph{articles 5 and 6 of the Income Tax Law}). An important problem is where to anchor these links. In addition, there are references to \textbf{ranges}, such as \emph{articles 5--10 of the Income Tax Law}. Such a reference has \textbf{several targets}, and without studying the target document we cannot determine the exact target locations.
It might be \emph{\{article 5, article 6, article 7, article 8, article 9, article 10\}} or \emph{\{article 5, article 6, article 7, article 7a, article 7b, article 8, article 10\}} or even \emph{\{article 5, article 10\}}. The exact targets cannot be included in the document, as they cannot be derived from the information present. In addition, the \textbf{targets may change} depending on the version being referred to in case of references to \textbf{works}. \paragraph{Exceptions} % (fold)
\label{par:exceptions} Ranges get even more complex when \textbf{exceptions} are involved, such as \emph{articles 5--10 with the exception of article 9}. An interesting variation occurs when an exception is made to something other than a range: \emph{article 5 with the exception of the first member}. This implicitly turns article 5 into a range \emph{member 1, \ldots, member n}.
% paragraph exceptions (end)
\paragraph{Each Time} % (fold)
\label{par:macros} A final format for references is the \textbf{``each time''} construct: \emph{articles 10, 12, 15 and 16, each time the first member}. Here \emph{the first member} by itself cannot be a reference to anything. The links are \textbf{not} to the articles, but to the first member of each article. \begin{note} The advantage of special markup for exception and each-time constructs is that it signals to (less refined) parsers that discover and mark up references that this reference has already been discovered, interpreted, and discarded. \end{note}
% paragraph macros (end)
\paragraph{Problems} % (fold)
\label{par:problems} Each of these complications shows the same pattern: there is a difference between the entities explicitly cited in the text and the ones it refers to. In some cases an element must be explicitly cited to convey the information that it is \emph{not} being referred to!
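The gap between the entities cited and the entities referred to can be made concrete. The Python sketch below uses hypothetical function names and article labels. It expands a range with exceptions against the article list of a given version of the cited work, and it expands an ``each time'' construct into one target per article. The three article lists reproduce the possibilities discussed above: the same citing text yields different targets depending on the version consulted.

\small\begin{verbatim}
def expand_range(target_articles, start, end, exclude=()):
    """Expand 'articles <start>-<end> [with the exception of ...]'.

    target_articles lists the article labels of the cited version in
    document order. The result depends on this list, which is why the
    exact targets cannot be derived from the citing text alone.
    """
    labels = list(target_articles)
    i, j = labels.index(start), labels.index(end)  # ValueError if absent
    if i > j:
        raise ValueError(f"range start {start!r} follows end {end!r}")
    excluded = set(exclude)
    return [label for label in labels[i:j + 1] if label not in excluded]


def each_time(articles, component):
    """Expand 'articles a, b, c, each time <component>' into one target each.

    The 'article/component' label format is a hypothetical convention.
    """
    return [f"{article}/{component}" for article in articles]


# Three hypothetical versions of the cited law, as in the text above.
V1 = ["5", "6", "7", "8", "9", "10"]
V2 = ["5", "6", "7", "7a", "7b", "8", "10"]
V3 = ["5", "10"]
\end{verbatim}\normalsize

The same function covers \emph{article 5 with the exception of the first member} once the members of article 5 are known. To do so, pass the member labels as the target list and exclude the first member.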
Document authors dealing with complex references need to take two issues into account: \begin{enumerate} \item There are \textbf{multiple reasonable ways} to mark up the same statement. \item Encoding metadata about a complex reference necessarily requires \textbf{more than one metadata statement}, and therefore more than one carrier element if RDFa is used. \end{enumerate} \begin{note} These issues indicate corresponding problems for translating documents between XML standards. Some XML standards, for instance, use a \emph{start} and \emph{end} range attribute on an element enclosing both start and end of the range. Other standards have two \emph{separate} elements marking the start and end, each using the same \texttt{href} (or similar) attribute. The second issue is a problem when the aim is to declare existing XML documents as MetaLex compliant documents without adding new XML elements. \end{note}
% paragraph problems (end)
% subsubsection multiple_targets (end)
\subsubsection{Citation Syntax} % (fold)
\label{ssub:citation_syntax} MetaLex provides two methods for describing citations. \begin{example} The following is an example of a citation \emph{metadatum} in RDFa: \begin{verbatim} \end{verbatim} \end{example} Other available properties (\texttt{metalex-owl:citeFrom}, \texttt{metalex-owl:citeUpto}, \texttt{metalex-owl:excluding}, \texttt{metalex-owl:excludeFrom}, \texttt{metalex-owl:\\excludeUpto}) can be used for complex references if the information is stored in RDF, or if RDFa is strongly preferred and modifying the XML document is no problem. \begin{textbox} The \texttt{href} attribute \emph{must} be used.
\end{textbox} Alternatively, if adding elements to the original XML file is a problem but embedding the references inside the document is nevertheless strongly preferred, the \texttt{metalex:citations} attribute group can be used. Its attributes \texttt{href}, \texttt{metalex:upto}, \texttt{metalex:exclude}, and \texttt{metalex:excludeTo} specify two types of ranges: from \texttt{href} to \texttt{metalex:upTo}, and from \texttt{metalex:excludes} to \texttt{metalex:excludesTo}. \begin{textbox} The target of the citation \emph{must} be identified in accordance with the principles set down in section \ref{idmetadata}. \end{textbox} \begin{textbox} The use of citations is optional. \end{textbox}
% subsubsection citation_syntax (end)
\subsection{Component Inclusion} \label{inclusion} Some XML manifestations may include components by \emph{reference} instead of by physical inclusion as a fragment, i.e. the inclusion reference replaces a part of the MetaLex manifestation that is stored in an external object identified by an IRI.
% Section \ref{scheme} provides some additional information.
\begin{textbox} Any part, except the top-level container, of a standard MetaLex XML manifestation \emph{may} be implemented as an \textbf{inclusion reference} to an external object. \end{textbox} \begin{example} On the manifestation level one makes choices about, for instance, object names and media formats (TIFF, JPEG, PDF, etc.). In some cases a text that is (or could be) embodied as an XML manifestation (e.g., a Chinese appendix of a treaty) is alternatively embodied by a media object. \end{example} For this the \texttt{metalex:srcatt} attribute group is provided, which defines the \\\texttt{metalex:src} attribute. \begin{textbox} Presence of the \texttt{metalex:src} attribute includes a component of the MetaLex manifestation stored in an external object identified by the IRI that is the value of the \texttt{metalex:src} attribute.
\end{textbox} While not technically necessary, the existence of components can also be straightforwardly described in the form of RDFa metadata by adding the property \\\texttt{metalex-owl:component} to the \texttt{rel} attribute. \begin{example} The following is an example of a component inclusion metadatum in RDFa, superimposed on the \texttt{metalex:src} attribute: \begin{verbatim} \end{verbatim} \end{example} \begin{textbox} The target of the inclusion \emph{must} be identified in accordance with the principles set down in section \ref{idmetadata}. \end{textbox} \begin{textbox} The \texttt{metalex:src} attribute \emph{must} be used if the \texttt{metalex-owl:component} property is used. \end{textbox}
% section addressing (end)
\section{Naming Convention} % (fold)
\label{sec:naming_convention}\label{naming} Identification of bibliographic entities plays a role in: \begin{enumerate} \item Self-identification of documents; \item Citation of other documents; \item Inclusion of document components. \end{enumerate} The identification of bibliographic entities is based on a matching operation between a list of feature values characterizing the request and the list of feature values characterizing the bibliographic entities. Given the specifications in sections \ref{conformance} and \ref{features}, there are two different methods for uncovering the relationship between \textbf{manifestation}, \textbf{expression}, and \textbf{work} features, which have a direct impact on naming and addressing: \begin{enumerate} \item An \textbf{explicit encoding} of the relation between these IRIs, and/or the class of the object designated by the IRI, as metadata (specified in \autoref{ssub:self_identification_with_metadata}); \item A \textbf{naming convention} that establishes a systematic relationship between transparent IRIs, which allows one to \textbf{derive} the IRI of one from the other.
\end{enumerate} As described in \autoref{sec:metadata}, MetaLex by default uses the IRI of a bibliographic entity to retrieve relevant metadata. In this case the request simply includes the IRI of the entity. However, in some cases the IRI of that entity may not be available. \begin{textbox} In accordance with \autoref{ssub:naming_convention}, implementers of MetaLex conformant applications \emph{may} decide to adhere to a MetaLex conformant \textbf{naming convention} that allows them to use a \textbf{mechanism} for constructing the requested IRI on the basis of a predefined set of feature values. \end{textbox} In some cases, the identity and content of relevant metadata are available, but not directly as RDF statements (e.g. in PDF or Microsoft Word). \begin{note} Where the IRI is available in a form other than RDFa or RDF, the guidelines of \autoref{sub:identification} apply. \end{note} \begin{note} It is the opinion of the workshop that it would be a bonus if MetaLex compliant implementations were able to access and make proper use of both these mechanisms (i.e. explicit identification and discovery based on a naming convention), regardless of the preferred or local approach. Pending a specification of compliant \textbf{naming mechanisms}, no generic requirement holds. The workshop intends to decide this issue at a later stage. \end{note} \begin{note} In this section, the terms `name' and `identifier' are used interchangeably. \end{note}
% In the first case, a compliant CEN Metalex implementation defines a naming convention, whereby the feature values are used to compose a string that is compatible with the IRI or IRI reference syntax as specified in IETF RFC 3986.
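The Python sketch below illustrates what a \textbf{mechanism} for constructing IRIs from feature values might look like. It uses a layout invented for this illustration and not taken from any existing convention: \texttt{/country/type/date/number} for works, followed by \texttt{/language@date} for expressions and a \texttt{.format} suffix for manifestations. The sketch only shows how a transparent naming convention lets an application derive the name of one level from another. All function names are hypothetical.

\small\begin{verbatim}
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkFeatures:
    """A subset of the work-level features (hypothetical layout)."""
    country: str   # e.g. "nl"
    doc_type: str  # e.g. "act"
    date: str      # creation date of the work, ISO 8601
    number: str    # disambiguating number on that date


def work_iri(w: WorkFeatures) -> str:
    # Only work-level features are used at the work level.
    return f"/{w.country}/{w.doc_type}/{w.date}/{w.number}"


def expression_iri(w: WorkFeatures, language: str, version_date: str) -> str:
    return f"{work_iri(w)}/{language}@{version_date}"


def manifestation_iri(w: WorkFeatures, language: str, version_date: str,
                      data_format: str) -> str:
    return f"{expression_iri(w, language, version_date)}.{data_format}"


def work_iri_from(iri: str) -> str:
    """Derive the work-level name from any lower-level name."""
    segments = iri.strip("/").split("/")
    if len(segments) < 4:
        raise ValueError(f"not a name in this convention: {iri}")
    return "/" + "/".join(segments[:4])


def expression_iri_from(manifestation: str) -> str:
    """Drop the data-format suffix of a manifestation name."""
    stem, dot, _ = manifestation.rpartition(".")
    if not dot:
        raise ValueError(f"no format suffix: {manifestation}")
    return stem
\end{verbatim}\normalsize

In this sketch the work-level name is built from work-level features only, in line with the requirement that no features of a lower level be used to identify higher levels.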
\begin{textbox} A \textbf{naming convention} \emph{must} be compliant with the requirements set out in this section, and \emph{must} describe a subset of the identification features listed in \autoref{sub:identifying_metadata} sufficient to allow for the correct resolution of IRIs.\footnote{\textbf{RH:} It is odd to put a normative requirement on an example section.} \end{textbox} \begin{example} Implementers may look at the Akoma Ntoso naming convention for inspiration.\footnote{See \url{http://www.akomantoso.org/}.} At the time of writing of this workshop agreement, this standard implemented a MetaLex compliant naming convention. \end{example}
% In all other cases, a syntax-dependent mechanism based on qualified names must be used to extract the relevant feature values and used for identification and access purposes.
\marginpar{Support for pdf, txt, etc??} Regardless of the approach used to specify the feature values, MetaLex requires an implementation to be able to resolve and identify the physical URL of the appropriate bibliographic entity, as per \autoref{ssub:resolution_of_iris}. \subsection{Naming Mechanism} \label{conformance}\label{sub:naming_mechanism} Any naming convention for bibliographic entities that is based on IRIs or IRI references according to \rfciri and that uses ids to identify document fragments can be used in a CEN MetaLex environment as long as the following fundamental requirements are met: \begin{enumerate} \item The first three FRBR levels \emph{must} be explicitly supported by the naming convention: \textbf{works}, \textbf{expressions} and \textbf{manifestations} \emph{must} all have names, and these names \emph{must} be different. \item No such requirements hold for \textbf{items}, except for the existence of a \textbf{mapping mechanism} that provides the IRI of an item (any of the existing ones) given an IRI of another level and sufficient context information to disambiguate it.
\item The naming convention \emph{must} explicitly take into consideration the complex structure of a document and the interrelation between its components (e.g., between the main body of a document and its attachments, and the attachments' attachments). \end{enumerate} \subsubsection{Requirements} % (fold)
\label{ssub:requirements} Furthermore, to allow for the automatic discovery of identifiers, names \emph{must} be: \begin{description} \item[Persistent] names at all levels \emph{must} maintain the same form over time regardless of the political, archival and technical events that have occurred since they were first generated; \item[Global] all relevant documents by all relevant bodies \emph{must} be represented; \item[Memorizable] names \emph{should} be easy to write down, easy to remember, and easy to correct if they were written down wrongly; \item[Meaningful] names \emph{should} mean something: it should be possible to make assumptions about the kind, freshness and relevance of a citation by looking only at the document's name; \item[Guessable across levels] references to different levels of the same document \emph{must} be similar; e.g., given a reference to an expression, a user should be able to deduce the name of the work; \item[Guessable across document classes] references to different instances of the same document type \emph{must} be similar; \begin{example} Given a work-level reference to act 136/05, a user should be able to deduce the work-level name of act 76/06, etc. \end{example} \item[Guessable across document components] references to different components of the same document at the same level \emph{must} be similar; and \begin{example} Given an expression-level reference to attachment A of act 136/2005, a user should be able to deduce the expression-level name of attachment B of the same act.
\end{example} \item[Hierarchical] names should use IRI hierarchies whenever possible and appropriate; in particular, hierarchies should be used at least to separate the FRBR levels and document components. \end{description}
% subsubsection requirements (end)
\subsection{Identification Features} \label{features}\label{sub:identification_features} Naming conventions \emph{may} use a number of document features to create the IRI that identifies the bibliographic entity (at any appropriate level). Each of these features may contribute a fragment of the overall IRI, whose presence is instrumental in uniquely identifying the relevant bibliographic entity. Depending on the naming convention, document type and syntax used, only some of these features may be known or relevant for identification purposes. \begin{textbox} No features of a lower level \emph{should} ever be used to identify higher levels of the bibliographic entity. \end{textbox} \subsubsection{Signalling a Naming Convention} \begin{textbox} Any naming convention \emph{must} make explicit its \textbf{syntax}, its \textbf{identity}, and its \textbf{protocol} in any IRI used. \end{textbox} Identifying these features allows tools to recognize the convention as such and, where needed, to activate the appropriate conversion routes. \subsubsection{Work-level Features} Work-level features characterize the identity of the bibliographic entity at the work level, regardless of versions, dates and lifecycle events, data format, or physical location.
\begin{enumerate} \item The \textbf{country} from which the document emanates; \item The \textbf{document type}; \item Any specification of \textbf{document subtype}, if appropriate; \item The \textbf{emanating actor}; \item The \textbf{promulgating actor}, who may be implicitly deducible either from the document type or from the emanating organization; \item Any relevant \textbf{creation date} of the work; \item Any relevant \textbf{number} or \textbf{disambiguating feature} of the work (possibly including titles). \end{enumerate} \subsubsection{Expression-level Features} Expression-level features characterize a specific selection of content among the many that may exist for the corresponding work. They are relevant regardless of data format or physical location. \begin{enumerate} \item The associated \textbf{language}(s) (possibly several); \item The \textbf{validity date}(s) associated with the actual content (possibly several); \item Any content \textbf{authoring information} to determine the authoritativeness of the text content. This is separate from and independent of the authoring information relative to the metadata and markup, which is among the features of the manifestation; \item Any \textbf{content-specification date} (as opposed to validity dates). \end{enumerate} \subsubsection{Manifestation-level Features} Manifestation-level features characterize a specific choice of data format and markup. They include the decisions relevant to generating the document in this exact form, and are independent of storage choices. \begin{enumerate} \item The electronic data \textbf{format} chosen \item The \textbf{markup authoring information} to determine the authoritativeness of the \emph{markup and metadata} \item Any relevant \textbf{markup-specific date} \item Any additional \textbf{markup-related annotation} (e.g., the existence of multiple versions, of annotations, etc.)
\end{enumerate}

\subsubsection{Item-level Features}

Item-level features characterize the location and service-level data necessary to access the physical document.

\begin{enumerate}
\item The physical \textbf{location};
\item The \textbf{owner} of the physical location;
\item Any additional \textbf{service-level annotations} (e.g., authentication, costs, authoritativeness, speed, etc.).
\end{enumerate}

\subsection{Signalling Naming Convention Conformance}

\begin{textbox}
The attribute \texttt{metalex:naming} \emph{must} be used when a manifestation follows a compliant naming convention (see \autoref{ssub:naming_convention}).
\end{textbox}

As with \textbf{xml:base} and \textbf{xml:lang}, the value of \texttt{metalex:naming} applies to the content of the element (usually the root) that carries the attribute, except where it is overridden by another \texttt{metalex:naming} attribute on a descendant element. The value of \texttt{metalex:naming} is an IRI.
% The standard defines one naming scheme: \texttt{metalex-owl:MetaLexNamingScheme}, which is the proposed naming scheme of section 6.
% \marginpar{this is impossible! section 6 is gone}

The XML document metadata must declare the manifestation IRI in order to identify itself. If the manifestation IRI does not conform to the MetaLex naming convention, the metadata must declare the manifestation, expression, and work IRIs. In addition, it must declare identifying metadata from which a conforming \texttt{metalex-owl:MetaLexNamingScheme} name can be constructed, for instance the country, the document type, the date of creation of the work, an optional sequence number on that date, the language of the expression, the date of creation of the expression, and the date of creation of the manifestation. Identifying metadata for conforming documents can be automatically transformed into identifiers that conform to the naming scheme, and vice versa.

% \subsection{Resolving IRIs}
%
% The process of resolving an IRI is in general the process of determining the access path to the resource specified.
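The scoping rule for \texttt{metalex:naming} (inherited by the content of the carrying element, overridable further down, as with \textbf{xml:lang}) can be illustrated with a minimal Python sketch using the standard \texttt{xml.etree.ElementTree} module. It assumes that the \texttt{metalex} prefix is bound to \texttt{http://www.metalex.eu/metalex/1.0} (the address used here for the MetaLex XML Schema); the function name \texttt{effective\_naming} and the \texttt{example.org} naming IRIs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: the "metalex" prefix is bound to this namespace URI.
METALEX_NS = "http://www.metalex.eu/metalex/1.0"
NAMING = f"{{{METALEX_NS}}}naming"  # ElementTree's {uri}local attribute name


def effective_naming(root: ET.Element) -> dict:
    """Map each element to the metalex:naming IRI in force for it.

    As with xml:lang, a value applies to the carrying element and its
    content, unless a descendant declares its own metalex:naming.
    Elements outside any declaration map to None.
    """
    scopes = {}

    def visit(elem: ET.Element, inherited):
        # An attribute on this element overrides the inherited value.
        current = elem.get(NAMING, inherited)
        scopes[elem] = current
        for child in elem:
            visit(child, current)

    visit(root, None)
    return scopes


# Hypothetical document; the naming IRIs are placeholders, not real schemes.
doc = ET.fromstring(
    f'<root xmlns:metalex="{METALEX_NS}" '
    'metalex:naming="http://example.org/naming/a">'
    "<part><p/></part>"
    '<part metalex:naming="http://example.org/naming/b"><p/></part>'
    "</root>"
)
scopes = effective_naming(doc)
print(scopes[doc[0][0]])  # http://example.org/naming/a
print(scopes[doc[1][0]])  # http://example.org/naming/b
```

The first paragraph-like element inherits the value declared on the root, while the second one picks up the value that overrides it on its parent.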
% It is a requirement for CEN MetaLex that there exists a subset of the features listed in section 5.4 that uniquely identifies each bibliographic item. Thus each full set of feature values identifies either zero or one bibliographic item.
%
% The process of determining the physical URL associated with a full set of feature values therefore generates either one accessible URL or an error (e.g., no such bibliographic entity exists).
%
% In general, though, only a subset of the full set of feature values will be available when resolving IRIs. This may lead to a situation in which zero, one or many different physical URLs correspond to the given feature values.
%
% For the purpose of this section, it is irrelevant whether these feature values are given through the analysis of the request IRI or the identifying metadata as specified in section \ref{idmetadata}.
%
% Given an incomplete set of feature values, a resolution mechanism may apply one of two policies:
% \begin{enumerate}
% \item It may choose, through user interaction, access to default values, local policies or random choice, just one suitable physical URL compatible with the given values;
% \item It may return the list of all physical URLs that are compatible with the given values, with enough differentiating information to allow the user to access the most appropriate one.
% \end{enumerate}
%
% Which policy to adopt is therefore implementation-dependent, and the adopted policy may well vary with the kind and quantity of available values.
%
% For instance, if we have a set of values that uniquely identifies a manifestation for which many equivalent items exist, then the choice may be random, or it could be an automatic decision determined by the authoritativeness and cost of each of the items.
% By contrast, given a work-level set of values, the system may either provide a list of available expressions, or request additional expression-level feature values to disambiguate the request (e.g., the validity date of the document, or the human language used).
%
% We can therefore separate the process of resolving a feature set into a number of different operations, as follows:
%
% \begin{enumerate}
% \item \texttt{convert(IRI) $\rightarrow$ feature set}: this operation evaluates the information available in the naming convention and generates a set of feature values that corresponds to those effectively expressed in the IRI.
% \item \texttt{expand() $\rightarrow$ feature set}: this operation adds new feature values to the existing set by accessing default values or computational processes (e.g., specifying the country, the language, the data format, etc. depending on local defaults or application capabilities).
% \item \texttt{request() $\rightarrow$ feature set}: this operation asks the user to fill in a number of values required by the application.
% \item \texttt{list(feature sets) $\rightarrow$ feature set}: this operation shows the user a list of feature sets of available candidates, and allows the user to choose one (by specifying the preferred feature set).
% \item \texttt{random(feature set) $\rightarrow$ feature set}: this operation chooses randomly among a list of feature sets and returns just one of the available ones. \marginpar{Why not just select? There is no presumption that it is an actual random selection, just one that is arbitrary for the requester. Also: shouldn't the input be plural?}
% \item \texttt{resolve(feature set) $\rightarrow$ URL}: this operation generates the physical URL uniquely corresponding to the feature set. This operation is only allowed if it returns either zero or one URL.
% \end{enumerate}
%
% As mentioned, resolving IRIs may require adopting different operations at different levels, as determined by local policies.
%
% For instance, suppose we have a complete work-level IRI. The resolution process would \emph{convert} it to the feature set $FS_{W0} = \{protocol, syntax, country, doctype, date, number\}$.
%
% Application policies may first determine the remaining work-level features, e.g., the actors, which are deducible from the doctype, thereby \emph{expanding} $FS_{W0}$ to $FS_{W1} = $ $\{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors\}$. It would then use the language defaults of the user's browser to \emph{expand} $FS_{W1}$ with a first set of expression-level features, giving $FS_{E0} = \{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language\}$.
%
% Then it would determine the list of compatible expressions (e.g., $FS_{E1} = $ $\{protocol, $ $syntax, $ $country, $ $doctype, $ $date, $ $number, $ $actors, $ $language, $ $date_1\}, $ $FS_{E2} = $ $\{protocol, $ $syntax,$ $ country, $ $doctype, $ $date, $ $number,$ $ actors, $ $language, $ $date_2\},\ldots , $ $FS_{En} = $ $\{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_n\}$) and would \emph{list} $FS_{E1},$ $ FS_{E2},\ldots , $ $FS_{En}$, with the user choosing $FS_{Ek} = $ $\{protocol, $ $syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k\}$.
% It would then \emph{expand} $FS_{Ek}$ by choosing by default a specific set of manifestation features (e.g., XML as the format, the internal markup team as markup author, and the most recent markup date), generating $FS_{M0} = \{protocol,$ $ syntax,$ $ country,$ $ doctype, $ $date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date\}$, and would \emph{request} the user to decide whether to show annotations or not, generating $FS_{M1} = \{protocol,$ $ syntax,$ $ country,$ $ doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date,$ $ annotations\}$.
%
% Then it would \emph{expand} $FS_{M1}$ with default item-level values determined by the user's profile (e.g., no costs, maximum speed), generating $FS_{I0} = \{protocol,$ $ syntax, $ $country, $ $doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date,$ $ annotations,$ $ service$ $type\}$. It would then discover all the available items compatible with $FS_{I0}$ and \emph{randomly} select among them one specified by ownership and location, $FS_{Ij} = \{protocol,$ $ syntax,$ $ country, $ $doctype,$ $ date,$ $ number,$ $ actors,$ $ language,$ $ date_k,$ $ format,$ $ markup$ $authoring,$ $ markup$ $date,$ $ annotations,$ $ service$ $type,$ $ owner,$ $ location\}$. \marginpar{Btw a long example: perhaps we need a style differentiating examples from main text.}
%
% Finally it would \emph{resolve} $FS_{Ij}$, generating the URL corresponding to it and allowing the application to access the physical bibliographic item.
%
% Of course this is just one possible resolution policy, and many others could be imagined (e.g., a judge evaluating the legislation relevant to an event could set the expression-level validity date to the date of the event, always being served with the specific expression that was valid at that moment; the existing versions would never be listed).
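The identification features, the requirement that hierarchies separate the FRBR levels, and the statement that identifying metadata and conforming identifiers can be converted into each other can be illustrated with a minimal Python sketch. The IRI layout below (the base \texttt{http://example.org/id}, one path segment per feature, and the particular feature names) is hypothetical and is \emph{not} \texttt{metalex-owl:MetaLexNamingScheme}; the functions \texttt{build\_iri} and \texttt{parse\_iri} are illustrative names only. The sketch shows a work IRI built from work-level features alone, each lower-level IRI extending the IRI of the level above, and the features being recovered from the IRI.

```python
from urllib.parse import quote, unquote

# Hypothetical layout, for illustration only (not MetaLexNamingScheme):
# one path segment per feature, levels ordered from work to manifestation.
BASE = "http://example.org/id"
LEVELS = [
    ("work", ["country", "doctype", "date", "number"]),
    ("expression", ["language", "version_date"]),
    ("manifestation", ["format"]),
]


def build_iri(features: dict, level: str) -> str:
    """Build the IRI of `level` from the features of that level and above.

    Lower-level features are never read, so a work IRI depends only on
    work-level features and every expression IRI extends its work IRI.
    """
    segments = [BASE]
    for name, keys in LEVELS:
        # Percent-encode each value so it stays a single path segment.
        segments.extend(quote(str(features[k]), safe="") for k in keys)
        if name == level:
            return "/".join(segments)
    raise ValueError(f"unknown level: {level}")


def parse_iri(iri: str) -> dict:
    """Recover the feature values encoded by build_iri (reverse mapping)."""
    prefix = BASE + "/"
    if not iri.startswith(prefix):
        raise ValueError("IRI does not follow this naming layout")
    parts = [unquote(p) for p in iri[len(prefix):].split("/")]
    features = {}
    for _, keys in LEVELS:
        if not parts:
            break  # the IRI stops at a higher level
        if len(parts) < len(keys):
            raise ValueError("incomplete level in IRI")
        features.update(zip(keys, parts[: len(keys)]))
        parts = parts[len(keys):]
    if parts:
        raise ValueError("unexpected trailing segments in IRI")
    return features


# Hypothetical identifying metadata for one manifestation.
meta = {
    "country": "nl", "doctype": "act", "date": "2009-06-01", "number": "42",
    "language": "nld", "version_date": "2009-07-01", "format": "xml",
}
work_iri = build_iri(meta, "work")
manifestation_iri = build_iri(meta, "manifestation")
print(work_iri)           # http://example.org/id/nl/act/2009-06-01/42
print(manifestation_iri)  # http://example.org/id/nl/act/2009-06-01/42/nld/2009-07-01/xml
```

Using exactly one path segment per feature keeps the reverse mapping a simple split. A real naming convention would, in addition, have to signal its syntax, identity and protocol, which this sketch ignores.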
% section naming_convention (end)

\section{Global Conformance Requirements} % (fold)
\label{sec:global_conformance_requirements}
\label{sec:conformance}

Conformance of XML documents to the MetaLex standard means:
\begin{enumerate}
\item validation of XML documents against a schema that restricts the MetaLex XML schema,
\item the theoretical possibility of obtaining, by way of simple substitution, an XML document that uses solely MetaLex generic elements and validates against the MetaLex XML schema,
\item the availability of sufficient metadata as set out in this document, and
\item conformance to the written guidelines of the CEN MetaLex Workshop Agreement.
\end{enumerate}

\begin{textbox}
Any XML encoding is \textbf{transformation conformant} if its instances can be transformed automatically into conformant MetaLex XML documents.
\end{textbox}

The workshop recommends implementing the transformation as an XSL transformation (XSLT\footnote{See \texttt{http://www.w3.org/TR/xslt}}). The workshop also recommends copying any attributes from the original XML encoding into the MetaLex XML document.

No additional conformance criteria exist for the OWL Schema. The workshop strongly recommends using or subclassing MetaLex OWL Schema classes and properties wherever reasonable.

\subsection{Conformance of Types and Elements}

\begin{textbox}
A schema that restricts the MetaLex XML Schema \emph{must} declare only MetaLex conformant elements.
\end{textbox}

The process of declaring an element is subject to the following guidelines:
\begin{enumerate}
\item You \emph{must} use one of the abstract content models for the element;
\item You \emph{may} define a restriction of the corresponding concrete type;
\item You \emph{must not} define an extension to the content model of a concrete type;
\item You \emph{may} define an extension of a concrete type for the purpose of adding attributes;
\item You \emph{must} declare the element as a member of the substitution group of one of the abstract elements, and you \emph{must} assign it a type that is either one of the provided concrete types, or a type you have defined by restricting the content model, or extending the attributes, of a concrete type.
\end{enumerate}

To define an element that conforms to the standard and can be used in XML manifestations of sources of law, define a non-abstract complex type, and declare an element belonging to the substitution group of one of the abstract elements, according to the subtype specified.

\begin{example}
\footnotesize
\begin{verbatim}
\end{verbatim}
\normalsize
\end{example}

\subsection{About Schema Validation}\label{ssec:about-schema-validation}

\begin{textbox}
The schema to be validated against is made available both as an \xmlschema and as a \dtd schema.
\end{textbox}

A few technical issues cause the actual semantics captured by the \xmlschema and \dtd files to differ from the intended meaning, mostly because the schema languages lack the expressive power to capture that meaning exactly.
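The first of the issues listed below, replacing \texttt{a*} and \texttt{a+} by a bounded disjunction of sequences, is a purely mechanical rewriting. The following Python sketch (the function name \texttt{unroll} is hypothetical) generates the replacement pattern for a given bound; with the default bound of 5 it reproduces the two patterns quoted below, and a technical committee choosing a larger number would only change the \texttt{bound} argument.

```python
def unroll(symbol: str, op: str, bound: int = 5) -> str:
    """Rewrite `symbol*` or `symbol+` as a disjunction of `bound` alternatives.

    Alternative k (k = 0 .. bound-1) consists of k explicit occurrences of
    `symbol` followed by the original repeated particle, e.g. "a,a,a*" for
    k = 2 and op = "*". The explicit occurrences are what a derived content
    model is meant to be able to restrict one by one.
    """
    if op not in ("*", "+"):
        raise ValueError("op must be '*' or '+'")
    if bound < 1:
        raise ValueError("bound must be at least 1")
    alternatives = [",".join([symbol] * k + [symbol + op]) for k in range(bound)]
    return "|".join(alternatives)


print(unroll("a", "*"))  # a*|a,a*|a,a,a*|a,a,a,a*|a,a,a,a,a*
print(unroll("a", "+"))  # a+|a,a+|a,a,a+|a,a,a,a+|a,a,a,a,a+
```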
\begin{itemize}
\item Patterns of the form \texttt{a*} (zero or more \texttt{a}'s) and \texttt{a+} (one or more \texttt{a}'s) in the XML schema are replaced by the patterns \texttt{a*|a,a*|a,a,a*|a,a,a,a*|a,a,a,a,a*} and \texttt{a+|a,a+|a,a,a+|a,a,a,a+|a,a,a,a,a+} respectively. This aligns the semantics of the DTD++ schema and the XML Schema, at least up to a bound of 5 (or any other appropriately large number, at the discretion of the technical committee), so that derived conformant content models can use different restrictions of \texttt{a} for the explicitly listed occurrences. In XML Schema, if \texttt{b} and \texttt{c} are restrictions of \texttt{a}, the pattern \texttt{b,c} is not accepted as a restriction of \texttt{a+} or \texttt{a*}; the technical committee intends that it should be.
\item As specified in section 3, attributes from any other (\textbf{\#other}) namespace \textbf{may} be used on any MetaLex element. Unfortunately, this requirement as such cannot be expressed in the current version of XML Schema. The current versions of the XML Schema and DTD++ therefore allow \emph{any} attribute (even those in the metalex namespace) to be added to elements. Furthermore, this permission is \emph{not} inherited by derived complex types, and must be added manually \emph{every time} a derived type is created.
\end{itemize}
% section global_conformance_requirements (end)

\bibliographystyle{apalike}
\bibliography{cwa,biblioD3.2}

% \newpage
% \section*{Appendix: OWL Schema Axioms}
% \label{owl}
% \include{2008-05-02}

\end{document}