Managing Content: Planning for CORDRA™
Document Information
Status: Draft
Version: V1.00.20050105
Revision Date: 2005-01-05
Abstract
CORDRA™ repositories and registries are designed to deal with managed content. You need to have a process in place to create and manage the life cycle of your content and its associated metadata. This document describes key aspects of content management that are needed for effective integration with a CORDRA system.
What are the key things that you need to do if you want your content to be accessible from a CORDRA registry and you want your repository to be a part of a CORDRA repository federation? If you don't already have a formal content management process in place, you will need to do several key things, as described below.
While a particular CORDRA federation or content repository may establish formal requirements, policies and procedures for managing content, the following presents an informal outline of seven considerations for establishing a managed content environment.
Before describing the things that you need to do, it is important to recap why you may want your content objects to be part of a CORDRA system:
The considerations below are all aimed at enabling these aspects.
Each of your content objects needs to have an identifier.
The identifier provides a name or label for the content. All discovery, access and management are coordinated through the identifier assigned to a content object. The identifier is the proxy for the content object.
To be effective, the identifier must uniquely identify a single content object. The identifier is useless if it references multiple content objects. Maintaining the one-to-one relationship between a content object and the identifier that you assign to it is your management responsibility.
You will have to decide on the format of the identifier. If you decide to use a particular identifier scheme, you will find that it may impose requirements or constraints on the structure of the identifier (characters that are permitted, etc.). Whenever possible, use identifiers that are opaque, i.e., identifiers that convey no meaning and have no semantics embedded in them. Identifiers should look like a meaningless string of characters. Remember, identifiers are meant to be processed by the repository and registry software, not by humans.
While developing your content, and before making it public and sharable with the rest of the world, you can assign identifiers from your own identifier scheme, e.g., serially numbering each object that you create. However, before making the content public, you will need to make sure that someone else does not assign the same identifier to one of their content objects.
Global identifier uniqueness is a harder problem since you do not control who assigns the identifiers to other content, and someone else might assign the same identifier to their content. Identifiers are only unique within some managed collection of identifiers (denoted a namespace or naming authority).
Thus, you will need to select an appropriate identifier scheme or system for your content to make sure that your identifiers are part of a unique collection. Identifier systems, such as the Handle System, provide a means to obtain a global prefix or naming authority, which is combined with your locally unique identifier in a way that ensures the uniqueness of all identifiers within the identifier system.
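As a rough illustration of combining a naming-authority prefix with a locally unique identifier, the sketch below joins the two with a slash, which is the separator the Handle System places between prefix and suffix. The prefix `99999.demo` and both function names are hypothetical; a real prefix would be obtained from the identifier system. The local part is generated as a random hexadecimal string so that it stays opaque.

```python
import uuid


def new_opaque_local_id() -> str:
    """Return a random 32-character hex string with no embedded meaning."""
    return uuid.uuid4().hex


def make_global_identifier(prefix: str, local_id: str) -> str:
    """Join a naming-authority prefix and a locally unique ID as prefix/local_id."""
    if not prefix or not local_id:
        raise ValueError("both prefix and local_id are required")
    if "/" in prefix:
        raise ValueError("the prefix must not contain the '/' separator")
    return f"{prefix}/{local_id}"


# Hypothetical prefix, used for illustration only.
identifier = make_global_identifier("99999.demo", new_opaque_local_id())
```

Because the prefix is unique within the identifier system and the local part is unique within your own collection, the combined string is unique across the whole system, provided you never reuse a local identifier.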
Selecting a particular identifier scheme may require that you perform additional actions, e.g., registering the identifier and its association with a content object.
You will also need to maintain the relationship between the content object and the identifier in a persistent manner. Thus, at any time in the future, it will be possible to obtain the content object (or information about the content object) by starting only with the identifier. Identifier systems often provide the means to ensure persistence of the identifier and its relationship to the content object.
You need metadata for your content and each object needs to be classified.
Metadata is used by search engines to help find your content. Metadata provides a concise summary of the key characteristics of the content object: what it is, how it can be used. Search and discovery are driven by finding the content from its metadata (searching on the metadata will give you the identifier of the content, which in turn can be used to actually find the content).
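The discovery path described above (search the metadata, obtain an identifier, then use the identifier to reach the content) can be sketched with two in-memory dictionaries. The records, field names and identifiers are invented for illustration and do not follow any particular metadata model.

```python
# Hypothetical metadata records, keyed by content identifier.
METADATA = {
    "99999.demo/a1": {"title": "Introduction to Fractions", "keywords": ["math", "fractions"]},
    "99999.demo/b2": {"title": "Basic Map Reading", "keywords": ["geography", "maps"]},
}

# Hypothetical content store, also keyed by identifier.
CONTENT = {
    "99999.demo/a1": "<html>fractions lesson</html>",
    "99999.demo/b2": "<html>map lesson</html>",
}


def search(term: str) -> list[str]:
    """Return identifiers whose title or keywords contain the search term."""
    needle = term.lower()
    hits = []
    for identifier, record in METADATA.items():
        fields = [record["title"]] + record["keywords"]
        if any(needle in field.lower() for field in fields):
            hits.append(identifier)
    return sorted(hits)


def fetch(identifier: str) -> str:
    """Use the identifier returned by a search to obtain the content itself."""
    return CONTENT[identifier]
```

Note that the search never touches the content; it works only on the metadata, which is why the quality of that metadata determines what users can find.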
Some metadata is better than none, and more is generally better, but good metadata matters more than simply having more of it. What makes good metadata? It has to be a good characterization of the content object.
Some metadata attributes, like the identifier of the content, the file format, or the title may be straightforward. Other attributes, like keywords, are more difficult to generate.
You also need to properly classify your content. Content objects often align with a particular aspect or facet of a formal classification scheme. While properly classifying a content object within a formal structure may be difficult, if done well, it will increase the precision of search: users will be more likely to find exactly the content they want. Consistently classifying content will increase the likelihood that related content is discovered and that unrelated content is differentiated.
There are many different models of metadata, each incorporating a selection of different attributes used to characterize the content object. Similarly, there are many different classification schemes. You will need to decide which metadata models and classification schemes are most appropriate.
It is important to recognize that associating metadata with the content object is only part of the task: properly managing, maintaining and updating the metadata throughout the life cycle of the content object is also needed. In a sense, all of the management tasks that apply to content may also need to apply to its metadata.
Versioning and Version Management
You need to think about the life cycle of your content, particularly what will happen when there are changes to the content and how you will manage these changes.
It is important to recognize that content has a life cycle and that content will be updated and will change over time. What happens when the content changes: should the new version replace the old, or should both the new and old exist and be accessible? What happens with the identifier: will you create a new one for the new item, or will you change the existing identifier to reference the new object? If you change the reference to the new object, how does someone find the old version?
To meet the objective of making your content widely available, it will be important to maintain access to the old version of the content. Changing a reference to content may break someone else's systems and learning delivery. While how to select the appropriate version of content is beyond the scope of just placing the content in a repository, you need to develop a plan for how you will track the versions, history and variants of each content object and make the multiple versions available.
As a recommended starting point, treat each version of a content object as a unique item, and assign a unique identifier and collection of metadata to each version of the content object. Likewise, even if you only change the metadata, maintain all the prior versions of the metadata.
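One possible way to follow this starting point is sketched below: every deposit creates a new record with its own identifier and its own copy of the metadata, and each record points back to the version it supersedes. The class and method names are hypothetical, identifiers are random hex strings, and metadata-only revisions are not modeled.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    identifier: str
    content: bytes
    metadata: dict
    previous: Optional[str]  # identifier of the version this one supersedes


class VersionStore:
    def __init__(self) -> None:
        self._versions: dict[str, Version] = {}

    def deposit(self, content: bytes, metadata: dict, previous: Optional[str] = None) -> str:
        """Store a new version under a fresh identifier; nothing is overwritten."""
        if previous is not None and previous not in self._versions:
            raise KeyError(f"unknown previous version: {previous}")
        identifier = uuid.uuid4().hex
        self._versions[identifier] = Version(identifier, content, dict(metadata), previous)
        return identifier

    def get(self, identifier: str) -> Version:
        return self._versions[identifier]

    def history(self, identifier: str) -> list[str]:
        """Return identifiers from the given version back to the first one."""
        chain = []
        current: Optional[str] = identifier
        while current is not None:
            chain.append(current)
            current = self._versions[current].previous
        return chain
```

With this arrangement, anyone holding the identifier of an old version still gets exactly that version, while `history` lets a user walk from a newer version back to its predecessors.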
Given the importance of versioning, the overall CORDRA model will incorporate more formal approaches for how different versions of content relate and how different versions of metadata relate to a specific version of a content object. A particular CORDRA federation may impose requirements on versioning and version control, or may help with the versioning problem (e.g., automatically building a version history whenever a new instance of a content object is deposited in the registry).
You need to commit to your content being around for a long period of time, independent of your own use and needs.
Once you make your content available to others, they will expect it to be there whenever they want to access it. Remember, the content will reside in your own repository; users who discover it will come to your repository to access the content. Making your content available also involves making a commitment to provide the service to maintain and persist your content over the long term.
Use of an identifier system, like the Handle System, will allow you to transfer the ownership or management of content to someone else, and will let the content move around. That is, when the identifier system is properly used, you can avoid the dreaded HTTP 404 Not Found error. The content can move, change owners, etc., but the key is that the identifier never changes. Instead, the information that links the identifier to the content is updated as needed.
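The indirection described here can be sketched as a small lookup table: users hold the identifier, and only the table entry changes when the content moves. This is a toy model under stated assumptions, not the interface of any real identifier system; the class name, identifier and URLs are hypothetical.

```python
class Resolver:
    """Maps persistent identifiers to the current location of the content."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, identifier: str, location: str) -> None:
        """Bind a new identifier to a location; identifiers are never reused."""
        if identifier in self._locations:
            raise ValueError(f"identifier already registered: {identifier}")
        self._locations[identifier] = location

    def move(self, identifier: str, new_location: str) -> None:
        """Record a new location; the identifier itself is left unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        """Return the current location for an identifier."""
        return self._locations[identifier]


resolver = Resolver()
resolver.register("99999.demo/a1", "https://old.example.org/objects/a1")
resolver.move("99999.demo/a1", "https://new.example.org/objects/a1")
```

After the move, anyone who resolves `99999.demo/a1` is directed to the new location, even though the reference they stored was never updated.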
But this does not negate the need for someone to make the commitment to manage and hold the content for the long term. Thus, you need to consider how you will establish a sustainable model for maintaining your content and providing it as a service to others who will rely on it.
Some CORDRA federations may incorporate a repository of last resort or some model to archive or escrow copies of the content in case your individual repository "goes away". However, the federation will only do so under a well-defined set of business and operational policy rules.
You need to determine who can use and access your content and under what conditions.
All content is subject to some rules that control how the content can be reproduced, distributed or used. Unless content is put into the public domain, whoever owns the copyright to the content is entitled to control its use. When someone discovers your content, you will still expect them to abide by the necessary rules that govern use and reuse of the content.
Simply putting a copyright statement on the work or in the metadata is not sufficient: that just declares information about the copyright ownership at the time the statement was made. As part of the management of the content, you will need to decide what users have the rights to use the content and under what circumstances, and you will need to communicate this to those who discover and want to use your content.
Thus, you will need to establish a set of policies describing rights to the content. You should be doing this independent of any plans to use CORDRA. You may also need to put proper controls into place to actually manage these rights.
You might anticipate that some CORDRA federations will require that you prepare descriptions of your content in machine processable form, e.g., using some digital rights expression language to describe use and limitations.
Note, rights and rights management are not about describing how you will enforce the security, authentication or authorization of users who may want to access your repository or obtain content from it. While those issues are important, they are repository management issues, not content issues. The rights management issues described above apply to the content itself, independent of how it is managed within a particular repository infrastructure.
You need to know where and how the content, as a collection, not just as individual pieces, will reside and will be managed.
If you have a collection of content objects and metadata, you need to decide where to store it so that you can manage both the pieces and the entire collection as a whole, e.g., transferring ownership or management of the collection as a whole. The assumption is that your content will be stored in a content repository, a formal software system used to manage the storage of and access to the content.
It is unrealistic to expect that you will be able to manage each content object independently or without some formal supporting software. Maintaining all of the information, versions, metadata, etc., in an organized fashion, plus providing control, management and persistence over the entire collection as a whole is complex.
While CORDRA will not specify a particular content repository system for you to use, there are some core features that your repository infrastructure should support. Most importantly, given a content identifier, you should be able to obtain or access the content from the repository (subject to the policy rules and rights management conditions that you might impose through the repository). Furthermore, the repository should help in managing versions, rights, ownership, persistence, etc.
Since a CORDRA system functions as a federation of repositories, the federation will need to know about certain features of each repository. For example, a technical aspect is the location of the repository on the internet. A policy aspect is a point of contact, an individual who can be contacted when there are operational or technical issues related to the repository.
Thus you can expect that to be a part of a CORDRA federation you will need to provide information about your repository, some of its technical aspects and its operations, and that you might be expected to provide this information from a single consistent source. While the federation will need this information for its operations, certain federations may go further to ensure the overall reliability and level of service provided by the federation. They will validate the information that you provide; only repositories that are known to be reliable, have a commitment for sustainment, have established operational procedures, etc., will be permitted to join those federations.
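To make the idea of a single consistent source of repository information more concrete, the sketch below keeps the two example aspects mentioned above (location and point of contact) in one record and runs a few basic checks on it. The field names and the checks are assumptions made for illustration; an actual federation would define its own required information and validation rules.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class RepositoryDescription:
    location: str       # where the repository can be reached on the internet
    contact_name: str   # point of contact for operational or technical issues
    contact_email: str


def find_problems(description: RepositoryDescription) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    parsed = urlparse(description.location)
    # Assumption: the repository is reachable at an http or https URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("location must be an http(s) URL")
    if not description.contact_name.strip():
        problems.append("contact_name is required")
    if "@" not in description.contact_email:
        problems.append("contact_email does not look like an email address")
    return problems
```

Checks like these only confirm that the record is well formed; whether the repository is reliable and sustainably operated is a policy judgment that a federation would make separately.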
You need to know who
As outlined above, you will need to establish formal ownership for both the content and the repositories.
Someone must always be responsible for the repository, both technically and administratively. Whenever the repository's ownership or management changes, you will need to inform the federation. If the federation detects problems with your repository, or if there is a need for a change, there must be someone to contact. Thus, you need to establish a set of formal management controls over the repository, and keep them up to date.
Who owns a content object? The fact that a content object is in your repository does not mean that you, the repository owner, are the content owner -- you may only have the rights to manage and distribute the content through the repository. The content owner's rights for distribution and use may be managed through the repository.
As noted, the content owner controls the rights over the content. Your repository infrastructure may need to maintain information about content ownership, and about what the repository is permitted to do with the content, independent of what the users who access the content through the repository are permitted to do.
It is also reasonable to expect that the ownership of the content might change over time. It may be important to maintain the chain of ownership information (provenance) over the content objects for historic, management or legal reasons. While a complete discussion of provenance is out of scope for this document, the overall CORDRA model will incorporate provisions for handling provenance information.
| Version | ID | Date | Change Summary |
|---|---|---|---|
| 1.00 | H | 20050105 | Initial release |