paragraph tag needs formatting structure
Bergsma suggested that the paragraph tag needs structure to accommodate real-life
text such as lists and hierarchies, and Chapal seconds the need. I tend to agree,
but I am not currently sure how to accomplish this. From my perspective, the
viable proposals seem to be:
1) allow XHTML inside paragraph tags
2) allow DocBook or Simplified DocBook in paragraph tags
Other proposals that seem less viable include:
3) Decompose structured text into a series of <paragraph> elements.
4) Inject structured text, with its native markup, as a CDATA block in <paragraph>.
5) Make <paragraph> nestable.
Options 3 and 5 don't seem to solve the problem completely. Option 4 solves the
problem but isn't standardized at all: if we sanctioned it, every author could
use their own markup, and interpreting a paragraph would be essentially
impossible without knowing which markup its author chose.
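To make the trade-off between options 1, 4 and 5 concrete, the following Python sketch parses three hypothetical encodings of the same two-item list with the standard library's `xml.etree.ElementTree`. The element layouts, the `h` namespace prefix, the sample list text and the helper names (`xhtml_items`, `nested_items`, `cdata_payload`) are illustrative assumptions only, not proposed EML syntax.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Option 1 (hypothetical encoding): XHTML list elements inside <paragraph>.
OPTION_1 = (
    '<paragraph xmlns:h="http://www.w3.org/1999/xhtml">Hypotheses:'
    "<h:ol><h:li>Nitrogen limits growth</h:li>"
    "<h:li>Grazing reduces cover</h:li></h:ol></paragraph>"
)

# Option 4 (hypothetical encoding): the author's own markup hidden in CDATA.
OPTION_4 = (
    "<paragraph><![CDATA[<ol><li>Nitrogen limits growth</li>"
    "<li>Grazing reduces cover</li></ol>]]></paragraph>"
)

# Option 5 (hypothetical encoding): <paragraph> nested inside <paragraph>.
OPTION_5 = (
    "<paragraph>Hypotheses:"
    "<paragraph>Nitrogen limits growth</paragraph>"
    "<paragraph>Grazing reduces cover</paragraph></paragraph>"
)


def xhtml_items(xml_text):
    """Return list-item text when the structure uses standard XHTML elements."""
    root = ET.fromstring(xml_text)
    return [li.text for li in root.findall("h:ol/h:li", XHTML_NS)]


def nested_items(xml_text):
    """Return the text of <paragraph> elements nested one level down."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.findall("paragraph")]


def cdata_payload(xml_text):
    """A CDATA block parses as plain text: report child count and raw text."""
    root = ET.fromstring(xml_text)
    return len(root), root.text


if __name__ == "__main__":
    print(xhtml_items(OPTION_1))
    print(nested_items(OPTION_5))
    print(cdata_payload(OPTION_4))
```

Options 1 and 5 both hand the list items to any generic XML tool, although nesting alone (option 5) records only that the items are subordinate, not what kind of structure they form. Option 4 yields a paragraph with no child elements and one opaque string, so a consumer would need a second parser for whatever markup that particular author happened to use.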
Here's a copy of the email thread that led to this bug:
---- Start message from scott.chapal at jonesctr org -------
Has there been any discussion regarding Tim's questions? I thought he
pointed out an area that deserves clarification before EML 2 is
I, too, am thinking there ought to be additional structure to represent
Tim Bergsma <firstname.lastname@example.org> writes:
> The problem, as I have mentioned previously, is that prose metadata
> (text) is often highly structured. <paragraph> gives us no way of
> representing the structure of text, which is itself information. In
> many instances, of course, <paragraph> is repeatable, which allows us
> some leeway to represent sequential structure. But there is still no
> way to represent hierarchical structure. This has significant
> consequences. For example, a project-level abstract may include a short
> outline of purposes or hypotheses. A research protocol may include
> finely-grained outlines of contingencies and responses.
> Three alternative solutions have emerged from previous discussion.
> 1. Decompose structured text into a series of <paragraph>.
> 2. Inject structured text, with its native markup, as a CDATA block in
> <paragraph>.
> 3. Make <paragraph> nestable.
> I hope that the leadership of the eml development community will offer
> me some guidance on this issue. I really don't think number 1 is a
> viable option, but could make peace with either 2 or 3.
I don't particularly like any of those options.
How about deferring to some [SGML/XML] standard to represent prose?
Possibly DocBook; use the (W3C) Schema when it is complete?
Or even Simplified DocBook?
Necessarily, existing documentation would have to be deconstructed or
converted to a markup language format. Or if visual formatting is
paramount then it could point to a (quasi) neutral file format like
.pdf, but that wouldn't accomplish textual indexing, querying and
structure that EML aims to support.
This comment of Tim's really struck me:
> However, converting hierarchically-structured text to
> serially-structured text will require innovations by the data
> manager that raise him/her to the status of author, a status not
> necessarily sanctioned by those who contributed the original
I think this concern is largely unwarranted. Were it viable,
Information Scientists, 'New' librarians and editors wouldn't be able to
perform their functions either. Fidelity to an original work should
be mandatory, but the translation process should be made transparent
and trivial. If an editing review is needed, then create one.
Perhaps even a 'stamp of approval' or something. We're talking about
metadata -- documentation, after all. If it's not structured it's
really not very useful.
---- End message from scott.chapal at jonesctr org -------