Metadata for Dokeos 1.6

Document version: 2005/09/20.

This document is a short technical description of the metadata (MD) implementation in Dokeos 1.6. The 1.6 implementation (DMD1.6) mainly consists of the components described in the sections below.

Background information can be found on Zephyr: VeloMetadataClaroline.doc (via Documenten, Metadata). (That document is, however, outdated where it describes the implementation.)

Metadata, XML, MD table

MD is XML-formatted information about a Dokeos object. It is stored in a course database table (not globally), and Dokeos objects are identified in that table by their type + '.' + id. For example, 'Document.12' refers to an object of type 'Document' (a file or a folder in the Dokeos Documents tool).

The design of DMD1.6 makes it possible to define, per type of object, which information is stored as MD and how that MD is represented in XML. Both can be adapted relatively easily in the PHP script that defines the object class 'mdobject' for the object type at hand.

DMD1.6 fully implements MD definition, storage and editing for 'Document'-type objects. The class 'mdobject' for this type of object is defined in the script 'md_document.php'. The class definition includes a method to generate the default MD for new entries. (The scripts 'md_link.php' and 'md_scorm.php' define the class 'mdobject' for 'Link'- and 'Scorm'-type objects. The script 'md_mix.php' defines a subset of the 'mdobject' class functionality for the experimental Search script.)

DMD1.6 works with standard IEEE LOM (Learning Object Metadata). The XML representation conforms to SCORM 1.3 (also known as SCORM 2004). The IEEE LOM elements General.Identifier.Catalog and .Entry are made to contain a globally unique object identifier of the form urn:institution:platform.coursecode.type.id. For the element Technical.Location, a URL is generated that points to the script 'openobject.php', which is also part of DMD1.6.
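
The identifier conventions described above (type + '.' + id for the table key, and the urn form for General.Identifier) can be illustrated with a small sketch. The function names make_eid, split_eid and make_urn are hypothetical and are not part of DMD1.6; the sample institution, platform and course code values are made up.

```php
<?php
// Hypothetical helpers, for illustration only (not DMD1.6 code).

// Build the key used in the 'metadata' table: type + '.' + id.
function make_eid(string $type, int $id): string
{
    return $type . '.' . $id;
}

// Split an entry id at its first dot into [type, id].
function split_eid(string $eid): array
{
    $pos = strpos($eid, '.');
    if ($pos === false) {
        return [$eid, ''];   // no dot: treat the whole string as the type
    }
    return [substr($eid, 0, $pos), substr($eid, $pos + 1)];
}

// Build an identifier of the form urn:institution:platform.coursecode.type.id.
function make_urn(string $institution, string $platform, string $courseCode, string $eid): string
{
    return 'urn:' . $institution . ':' . $platform . '.' . $courseCode . '.' . $eid;
}

$eid = make_eid('Document', 12);                        // 'Document.12'
$urn = make_urn('myschool', 'dokeos', 'COURSE1', $eid);
```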

To make changes to the type and representation of MD easier, the 'mdobject' class also defines a map for the (generally accepted) Dublin Core elements. A specific Dokeos installation can thereby adapt DMD1.6 relatively easily, e.g. to use SCORM 1.2 and/or IMS-XML instead of SCORM 2004.

MD is stored in the (new) course database table 'metadata'. Count on 2-4 KB per metadata record.

The MD records currently have six fields: eid (entry id or object identifier), mdxmltext (metadata text, XML-formatted), md5, htmlcache1, htmlcache2 and indexabletext. The latter three fields are used for cached HTML and for storing text to be indexed for search; the hash value md5 is used to validate the cache.

The script 'md_funcs.php', part of the MD toolkit and used, among others, in the index and search scripts, contains the class definition 'mdstore', which handles all database operations. Code that is shared by several other scripts is also to be found in 'md_funcs.php': common functions, code related to IEEE, and code related to the keyword tree (see below).

XMD and XHT

Two new libraries in inc/lib are essential for DMD1.6: the XML Mini-DOM 'xmd' and XML HTML Templates 'xht'. The corresponding scripts contain some comments describing their functionality. Test scripts are included in DMD1.6 to demonstrate the use of these libraries.

DOM XML functions are also available in PHP 4 itself, but they are experimental. They require an extra nonstandard XML library and, on Windows, fiddling with DLLs. To avoid these problems, DMD1.6 comes with its own XML Mini-DOM library.

Several open source template libraries exist for PHP, and yet DMD1.6 again comes with its own. The main design goal for the XML HTML Templates library is to combine HTML separation with a tight connection to an XML (mini-DOM) document. Both are essential, given the goal of flexibility concerning the kind and representation of MD and its presentation to the user. The 'xht' library is mainly used to generate HTML, but DMD1.6 also uses it to generate XML (e.g. the default XML for new MD records in 'md_document.php') and JavaScript (in 'md_funcs.php').

If it is decided for a future version of Dokeos to use a more 'standard' approach for XML and/or for templates (e.g. Smarty), then DMD will most probably be adapted.

The use of 'xht' in DMD1.6 makes it possible to define, per type of object, what part of the MD is shown to a Dokeos user or presented for editing, and how that information is rendered as HTML (between the page header and footer). For 'Document'-type objects, the HTML templates for MD viewing and editing are to be found in 'md_document.htt'. (Compare them with the templates in 'md_link.htt' and 'md_scorm.htt' (both not fully supported), and with 'mds_mix.htt', the templates used when rendering the (experimental) search screen.)

Some little notes here will come in handy for easier understanding of the templates. For more info, look into the source code of the libraries.

Mime types and Technical.Format

In the IEEE LOM standard, the metadata element Technical.Format must contain the mime type of the learning object. DMD1.6 uses DocumentManager::file_get_mime_type as the authoritative source for mime types and for determining the default mime type based on the file extension.

There is a provision for adding mime types that are not listed in DocumentManager::file_get_mime_type, for example alternative mime types for a specific file extension. This is done via the language variable $langFormats (see DLTT and the Dokeos lang-file md_document). This language variable must contain an associative list such as ":text/plain;iso-8859-1:Text;Latin-1,, application/xml;iso-8859-1:Xml;Latin-1". (The second part of a list item, e.g. "Text;Latin-1", appears in the selection box in the metadata screen and can be made language-specific.) (In associative lists, elements are separated by a double comma; value and language text are separated by the first character of the language string, here a colon.)
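
The associative list format described above can be read with a few lines of PHP. The following is a minimal sketch, not the DMD1.6 implementation: the function name parse_assoc_list is hypothetical, and two details are assumptions, namely that spaces around an item are not significant and that an item without a language text falls back to its value.

```php
<?php
// Hypothetical parser for an associative list such as the one in $langFormats.
function parse_assoc_list(string $list): array
{
    if ($list === '') {
        return [];
    }
    $separator = $list[0];   // the first character separates value and language text
    $result = [];
    foreach (explode(',,', substr($list, 1)) as $item) {
        $item = trim($item);   // assumption: surrounding spaces are not significant
        if ($item === '') {
            continue;
        }
        $parts = explode($separator, $item, 2);
        // Assumption: an item without language text is shown as its own value.
        $result[$parts[0]] = $parts[1] ?? $parts[0];
    }
    return $result;
}

$formats = parse_assoc_list(
    ':text/plain;iso-8859-1:Text;Latin-1,, application/xml;iso-8859-1:Xml;Latin-1'
);
// $formats maps each mime type to the text shown in the selection box.
```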

One specific mime type can be designated as the mime type for course keywords documents (see next section). This is done by defining parameter XML in the template file metadata/md_document.htt. In DMD1.6 it contains:
{-D XML application/xml;iso-8859-1-}

Keywords in a tree, JavaScript

MD usually includes keywords, and DMD1.6 has a special provision that makes it possible to (optionally) define a structured set of keywords for each course. The course manager defines the keywords in an XML file (an example is provided) and uploads it to the course documents area. When browsing to that document's metadata, there will be a button 'This document contains the course keywords'. The XML-structured keyword tree is then converted to the cache file 'CourseKwds.js' in the course's top-level directory. The button must be used after each change to the XML file. To remove all course keywords (and the cache file), use the button on an XML file containing only spaces or only a top element with no content.

The cache file constructs a clickable tree in HTML (restricted to W3C browsers). The toolkit script 'md_funcs' contains the server-side functions related to the keyword tree, the file 'md_script.js' contains the client-side script.
 
Whether the keyword tree is presented in a screen (index, search, ...), and if so, where and how, can again be defined relatively easily via the templates. The MD view-and-edit screen also converts comma-separated keywords (whether selected with the clickable tree or typed in) to separate XML elements (as required by SCORM 1.3).
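
The conversion of comma-separated keywords to separate XML elements can be sketched as follows. This is an illustration only: the function name keywords_to_xml is hypothetical, and the element layout (keyword containing string with a language attribute) is an assumption based on the LOM XML binding; the elements actually produced by the DMD1.6 templates may differ.

```php
<?php
// Hypothetical helper: turn "kw1, kw2" into one XML element per keyword.
function keywords_to_xml(string $keywords, string $language = 'en'): string
{
    $xml = '';
    foreach (explode(',', $keywords) as $keyword) {
        $keyword = trim($keyword);
        if ($keyword === '') {
            continue;   // skip empty entries such as a trailing comma
        }
        // Escape markup characters so the keyword is safe inside XML.
        $xml .= '<keyword><string language="' . htmlspecialchars($language, ENT_QUOTES) . '">'
              . htmlspecialchars($keyword, ENT_QUOTES)
              . '</string></keyword>' . "\n";
    }
    return $xml;
}

$xml = keywords_to_xml('fish, salt water,');
```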

The file 'md_script.js' also contains the client-side script used by the HTML templates in 'md_document.htt' for input validation and MD update preparation in screens for 'Document'-type object MD. Whereas keyword-tree clicking requires a W3C browser, input validation and MD update should also work with IE4 and NS4 browsers (not tested).

DMD1.6 contains input validation of two kinds (put the following on the HTML INPUT element):

To provide a minimum level of MD editing support when there is no scripting in the browser, the templates in 'md_editxml' allow direct editing of the XML-formatted data. (The same templates are used when an XML syntax error is detected, which makes it possible to repair the XML metadata.)

To view the XML formatted data, click the 'Store' button while holding CTRL- and ALT-keys down.

The server-side functions for the construction of the keyword tree cache file (in 'md_funcs') mimic an XSLT process which is documented in 'SelKwds.xsl'. (This file, and XSLT in general, is not used in DMD1.6.)

The experimental script 'statistics.php' gives statistics about the usage of course keywords. It is not linked to any Dokeos 1.6 screen, therefore not reachable in a standard installation.

MD toolkit and API

The script 'md_funcs' contains the main part of the toolkit and API. These allow other Dokeos scripts to define, modify and delete MD for specific objects (see class 'mdstore'). The script 'md_funcs' must be combined with a script that defines the object class 'mdobject' for the specific type of object (such as 'md_document.php' for 'Document'-type objects). The test scripts 'dcex' and 'mdApiTest' demonstrate the toolkit and the API functions.

The simplest way of working with the API is to use the functions 'mds_get_dc_elements' and 'mds_put_dc_elements'. They fetch and store the MD elements that are part of the so-called Dublin Core. The DC elements form a generally accepted core set of metadata.

The function 'mds_update_xml_and_mdt' is particularly useful for translating user interactions with a MD edit screen to MD-store operations. When using the API, it might be more handy to work with xmd and mdstore operations directly.

A word of warning: MD scanning is a relatively compute-intensive task. If used in a loop, e.g. to display some specific info about several hundreds of documents, server response might slow down.

Other files in DMD1.6

Language files 'md_document.inc.php' are available for English, French and Dutch. Language files 'md_link.inc.php' and 'md_scorm.inc.php' only exist in English.

Files 'md_link.php' and 'md_link.htt', also 'md_scorm.php' and 'md_scorm.htt', all already mentioned, are used in conjunction with the not fully supported functionality related to Link metadata and SCORM package metadata import.

File 'md_link.php', in conjunction with 'index.php', demonstrates the use of the mdo_override and mdo_storeback methods, which make it possible to implement a tighter synchronization between MD and standard Dokeos object properties than is actually implemented for document MD (see also below: Link metadata editing).

Caching

The 'xht' library provides caching functions, which speed up screen building. DMD1.6 caches information in the database fields 'htmlcache1' and 'indexabletext' ('htmlcache2' is not used in DMD1.6).

In 'md_document.htt' it can be seen that the MD view-and-edit screen (produced by index.php) is divided into four main parts: part 1, the keywords tree, part 2 and the POST form.

Instead of a normal "call" from a template to a subtemplate, which would be "{-C METADATA_PART1-}", the main template does an "escape-call" "{-E md_part1 C METADATA_PART1-}". The escape construct works as follows: the 'xht' library does a callback to the user code, in this example to the PHP function 'md_part1'. The code for that function can be found in 'index.php'. That function checks whether it has a valid cached HTML and if so, returns it, thereby avoiding the template expansion of the subtemplate METADATA_PART1. If not, 'xht' effectively does the (supposedly slow) expansion and allows the callback function 'md_part1' to store it for re-use.
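
The cache-or-expand logic of such a callback can be sketched as follows. This is not the code of 'md_part1' in 'index.php': the function cached_part1, the in-memory record and the stand-in expansion closure are all hypothetical, and the sketch assumes that the 'md5' field holds the hash of the XML text from which the cached HTML was built.

```php
<?php
// In-memory stand-in for one record of the 'metadata' table.
$record = [
    'mdxmltext'  => '<lom><general/></lom>',
    'md5'        => '',
    'htmlcache1' => '',
];

// Hypothetical stand-in for the slow template expansion; it counts its calls.
$expansions = 0;
$expand = function (string $xml) use (&$expansions): string {
    $expansions++;
    return '<div>' . htmlspecialchars($xml) . '</div>';
};

// Return the cached HTML when it is still valid, otherwise expand and store.
function cached_part1(array &$record, callable $expand): string
{
    $hash = md5($record['mdxmltext']);   // assumption: md5 is taken over the XML text
    if ($record['md5'] === $hash && $record['htmlcache1'] !== '') {
        return $record['htmlcache1'];    // valid cache: skip the expansion
    }
    $html = $expand($record['mdxmltext']);
    $record['htmlcache1'] = $html;       // store for re-use
    $record['md5'] = $hash;
    return $html;
}

$first  = cached_part1($record, $expand);   // expands and stores
$second = cached_part1($record, $expand);   // served from the cache
$record['mdxmltext'] = '<lom><general><title/></general></lom>';
$third  = cached_part1($record, $expand);   // MD changed, so it expands again
```

In this sketch a change to the XML text invalidates the cache automatically, because the stored hash no longer matches.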

In DMD1.6, "part 1" of the screen contains most template expansion work, hence the database field 'htmlcache1' is a real HTML cache. Another part of the screen is made to contain the "words" from the metadata that must be indexable and searchable. It corresponds with the database field 'indexabletext'.

Under certain circumstances, caching may cause a delay after a change. For example, when languages are made visible or invisible, they may not immediately appear in or disappear from the SELECT input fields in existing metadata. To make the change visible, edit that metadata.

Toolkit/API functions such as 'mds_append', useful e.g. for adding searchable words to 'indexabletext', must be used with care, because of possible interactions with the index script, when it allows users to modify metadata (and therefore also indexable words) interactively.

Index and Search scripts

Both scripts lean heavily on the libraries and on the API; they are therefore relatively short.

Note that all output is produced in a section at the end of the scripts.

DMD1.6 has an experimental screen for searching documents based on their MD. It is not linked to any Dokeos 1.6 screen, therefore not reachable in a standard installation.

The MD search screen described in this section does not require the installation of PhpDig 1.8.6, as opposed to the (not fully supported) PhpDig indexing/searching scripts described further down.

A general search in all metadata is not so easy, because the metadata can in theory be quite different for different types of Dokeos objects. In practice, Dokeos platforms will probably stick to identical or rather similar metadata for all objects and might therefore find the search script useful.

The DMD1.6 MD search script does an unsophisticated database query in field 'indexabletext', supposedly containing all searchable words.

DMD1.6 puts these searchable words in the field:

Note that keywords are transformed: for example, the MD keyword 'fish' becomes the searchable word 'fish-kw'. This allows a search to focus on the keyword without also finding references where the word 'fish' is merely part of some description. Because of the templates, this can be changed relatively easily, but note that the current search screen and script, and also the PhpDig connection, assume this transformation.
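
The keyword transformation can be illustrated with a short sketch. The function names are hypothetical, the sketch assumes single-word keywords, and the way the words are joined into one text is an assumption; in DMD1.6 the actual content of 'indexabletext' is defined by the templates.

```php
<?php
// Hypothetical helpers illustrating the '-kw' convention (not DMD1.6 code).

// 'fish' becomes 'fish-kw'.
function keyword_to_search_word(string $keyword): string
{
    return $keyword . '-kw';
}

// 'fish-kw' becomes 'fish' again; other words are returned unchanged.
function strip_kw_tail(string $word): string
{
    return substr($word, -3) === '-kw' ? substr($word, 0, -3) : $word;
}

// Assumption: searchable text is the tagged keywords followed by the description.
function build_indexable_text(array $keywords, string $description): string
{
    $words = array_map('keyword_to_search_word', $keywords);
    $words[] = $description;
    return implode(' ', $words);
}

$aboutFish    = build_indexable_text(['fish'], 'Freshwater species');
$mentionsFish = build_indexable_text(['tuna'], 'A dish with fish');

// Searching for 'fish-kw' finds the keyword, not the word in a description.
$hitKeyword     = strpos($aboutFish, 'fish-kw') !== false;
$hitDescription = strpos($mentionsFish, 'fish-kw') !== false;
```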

The script 'update_indexabletext.php' can be used to update MD records when the definition of the searchable words is changed. It is not linked to any Dokeos 1.6 screen, therefore not reachable in a standard installation. It uses function mdo_define_htt already mentioned above. For documents, md_document.php should then contain the same definition as the one in md_document.htt. Use the script with e.g. '?eid_type=Document'.

The SCORM package metadata import script importmanifest.php (see below), if used with SCORM 2004 packages, generates metadata records (type 'Scorm') that are very similar to the 'Document' type metadata records.

Before generating output, search combines (in memory) the XML metadata of all Dokeos objects that it has found for a particular query into a big, imsmanifest-like XML document. It is expected that this will cause problems if many hundreds or thousands of objects have metadata and can therefore be "found" in one query.

All of this shows that the search script will need to evolve in future Dokeos versions.

To make metadata search available on your Dokeos server, include a link to
.../metadata/search.php?type=Mix

DMD1.6 files with comments

Updates for standard Dokeos scripts

document/edit_document.php

The (one and only) link between Dokeos and metadata (via Documents).

lang/*/document.inc.php

Two additional language-dependent words for edit_document.

inc/lib/fileManage.lib.php

Updated to delete the metadata entry when deleting a document or a SCORM folder. (Link-MD is not automatically deleted.)

Functionality not fully supported in DMD1.6

Link metadata editing

To allow course managers to interactively store and edit metadata about a Link, provide a URL such as:
.../metadata/?eid=Link.nnn

This metadata may e.g. add keywords.

Unlike with Document-type objects, Link-type metadata object editing has an override- and storeback-functionality. When metadata is displayed for editing, DB data is overridden by new data from the Links table (but not automatically stored): category, url, title, description, keywords. When metadata is changed in the MD edit screen and stored, then new data is stored back into the Links table: url, title, description, keywords (but not category).

In the Links table, MD description and keywords are combined in the description field, as follows:
<i kw="kw1, kw2, ...">Description</i>
Thereby keywords are not visible to the user, yet editable by the course admin.
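
Packing and unpacking the combined description field can be sketched as follows. The function names are hypothetical and this is not the code used by 'md_link.php'; in particular, escaping the keywords with htmlspecialchars is an assumption of this sketch.

```php
<?php
// Hypothetical helpers for the <i kw="...">Description</i> convention.

// Combine description and keywords into the Links description field.
function pack_link_description(string $description, array $keywords): string
{
    // Assumption: keywords are escaped so that a quote cannot break the attribute.
    $kw = htmlspecialchars(implode(', ', $keywords), ENT_QUOTES);
    return '<i kw="' . $kw . '">' . $description . '</i>';
}

// Split a stored description back into its description and keywords.
function unpack_link_description(string $stored): array
{
    if (preg_match('/^<i kw="([^"]*)">(.*)<\/i>$/s', $stored, $m)) {
        $kw = html_entity_decode($m[1], ENT_QUOTES);
        $keywords = array_values(array_filter(array_map('trim', explode(',', $kw)), 'strlen'));
        return ['description' => $m[2], 'keywords' => $keywords];
    }
    // No keyword wrapper: the whole field is the description.
    return ['description' => $stored, 'keywords' => []];
}

$stored   = pack_link_description('A nice site', ['fish', 'tuna']);
$unpacked = unpack_link_description($stored);
```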

importlinks.php

This script, not reachable until you e.g. link it to a course homepage, performs the following operations related to Links:

As importlinks is meant to be used only by course admins, hide it after you have linked it to the course homepage.

SCORM metadata import and custom browsing

importmanifest.php

This script, not reachable until you e.g. link it to a course homepage, performs the following operations related to Metadata Table Entries (MTEs) and SCORM package directories (SPDs) in Learning Path (which have a SCORM Directory Id, SDI):

Note that the above-mentioned 'index.php' in the SPD is created by import.

As importmanifest is meant to be used only by course admins, hide it after you have linked it to the course homepage.

playscormmdset.inc.php

This include-script contains the main functionality of the custom browser.

Import creates an 'index.php' in the corresponding scorm folder of a course. It includes 'playscormmdset'.

(Thereby to a search engine, the custom browser will appear as if it is located in that scorm folder. This is important for search engines that allow to index/re-index by virtual directory.)

The custom browser uses a templates file to generate HTML, but unlike the standard MD screens, it looks for that templates file in the scorm folder or in its parent folders. Thereby the generated HTML can be different for different scorm folders.
An example templates file can be found in metadata/doc/mdp_scorm.htt.


PhpDig connection

DMD1.6 includes functionality allowing a specific course to work with a customized version of PhpDig 1.8.6 that has been built into the course. This provides quicker and more sophisticated search functionality.

The connection consists of the script 'md_phpdig.php', this document section, and the customized files in ...main/metadata/phpdig.

It is assumed that a system admin installs a copy of PhpDig in a subfolder 'phpdig-1.8.6' of the course webfolder, customizes it as described below and by the sample files, and initializes it by running PhpDig's install script.

The admin screen of PhpDig can best be defined as a hidden link (because course-admin only) in the course homepage. A link in a separate window is best, as the admin screen has no Dokeos header.

Script 'md_phpdig.php' contains a few lines copied from the PhpDig config script and a set of functions that can be used as API functions providing a PhpDig DB-feeder mechanism. They allow combinations of URLs and searchable words to be fed into the DB directly, bypassing the PhpDig spider script. The API code is PhpDig spider code, covered by the GNU GPL just like PhpDig is.

Scripts 'importdocs.php', 'importlinks.php' and 'importmanifest.php' make use of that API to index MD for PhpDig. None of them are reachable from standard Dokeos 1.6 screens.

The PhpDig Search screen, which can be used instead of the experimental MD search screen, is the custom 'search.php' available in the metadata/phpdig folder. It must be copied to the 'phpdig-1.8.6' subfolder of the course webfolder and then made reachable from the course homepage.

PhpDig by default combines search terms with AND and searches for words starting with the search term strings. Negation is done by putting a hyphen before the search term (implemented as ALT-click in the search screen keyword tree).
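
The matching rules just described can be illustrated with a small sketch. It only mimics the described behaviour on an array of words and is not PhpDig code: the function name matches_query is hypothetical, and case-insensitive comparison is an assumption.

```php
<?php
// Hypothetical illustration of AND, prefix matching and hyphen negation.
function matches_query(string $query, array $words): bool
{
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        // A leading hyphen negates the term.
        $negated = strlen($term) > 1 && $term[0] === '-';
        $needle = strtolower($negated ? substr($term, 1) : $term);

        // Does any word start with the search term?
        $found = false;
        foreach ($words as $word) {
            if (strncmp(strtolower($word), $needle, strlen($needle)) === 0) {
                $found = true;
                break;
            }
        }
        // Every positive term must be found, every negated term must be absent.
        if ($found === $negated) {
            return false;
        }
    }
    return true;
}

$words = ['fish-kw', 'tuna', 'recipe'];
$both     = matches_query('fi tu', $words);        // both prefixes occur
$excluded = matches_query('fish -tuna', $words);   // 'tuna' occurs, so no match
```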

Some background information can be found on Zephyr: VeloMetadataClaroline.doc (via Documenten, Leerobjectbouwstenen, Exploreerbare leerstof: document SearchableImageWebsite).

PhpDig 1.8.6 customizations overview

includes/config

libs/phpdig_functions

' \'._~@#$:&%/;,=-]+' replaced (twice) by
' \'._~@#$&%/=-]+' (no :;, in words)

search.php

This is the script that must be made accessible in the course, to provide PhpDig search. It is a newly developed script replacing PhpDig's standard one.

Course managers can adapt the search form and provide extra search criteria as explained in the SearchableImageWebsite document mentioned above.

libs/search_function

" \'.\_~@#$:&\%/;,=-]+" replaced by
" \'._~@#$&%/=-]+" (no \:;, in words)

two special "words" are used for controlling the displaying of the search results: "txt-sep" (newline) and "txt-end" (end of display)

the "-kw" tail of keywords is stripped off in the search results

thumbnail support
This is quite well explained in the above mentioned background material.

This works only with specially designed SCORM packages: item resource file[1]/@href is assumed to point to the thumbnail image, which must have a filename 'pptsl' + nnn + '_t.jpg' (see, among others, 'importmanifest.php').

In md_phpdig.php, the '&thumb=...' part of URLs is cut off for display.

Metadata search also displays the thumbs (see '.../main/metadata/search.php' and 'mds_mix.htt').