Here's a proposal for this issue with EML attributeDomain:
Add a new element to "attributeDomain" called "textDomain", whose content is a
regular expression pattern that free-text values must match. If the
"textDomain" element is empty, a pattern of '.*' is implied, so any string
(including the empty string) is valid. Patterns use the regular expression
syntax defined for the pattern facet in the W3C XML Schema Datatypes
recommendation (section 4.3.4).
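To make the matching rule concrete, here is a minimal sketch in Python. The
function name and the DEFAULT_PATTERN constant are hypothetical, and Python's
re module is only an approximation of the XML Schema regex dialect (it lacks
XSD-specific features such as \i, \c and character class subtraction), so
treat this as an illustration rather than a reference implementation.

```python
import re

# Implied pattern for an empty textDomain, as described in the proposal.
DEFAULT_PATTERN = ".*"

def matches_text_domain(pattern, value):
    """Return True if the whole value matches the textDomain pattern."""
    # An empty or missing pattern falls back to the implied default.
    effective = pattern if pattern else DEFAULT_PATTERN
    # XML Schema patterns must match the entire value, so use fullmatch
    # rather than search or match.
    return re.fullmatch(effective, value) is not None
```

Using fullmatch matters here: with a plain search, a pattern such as [0-9]+
would wrongly accept "12a" because part of the value matches.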
The new content model for attributeDomain is:
<!ELEMENT attributeDomain ( (enumeratedDomain | textDomain)+ | rangeDomain+ ) >
When more than one instance of these elements is provided (e.g., a textDomain
is repeated), the domains are OR'ed together, so a value is valid if it
satisfies any one of them. Note that the whole choice group has become
repeatable, so mixtures of enumeratedDomains and textDomains are possible,
although they remain exclusive with rangeDomains (as is currently the case).
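The OR semantics for a mixture of enumerated and text domains could be
sketched as follows. The function name and the way the domains are passed in
(a plain collection of enumerated codes and a list of pattern strings) are
assumptions made for illustration; the proposal does not prescribe any
particular in-memory representation.

```python
import re

def value_allowed(value, enumerated_codes=(), text_patterns=()):
    """Return True if value satisfies at least one of the given domains."""
    # An exact match against any enumerated code is sufficient.
    if value in enumerated_codes:
        return True
    # Otherwise the value must fully match at least one textDomain pattern.
    # An empty pattern string stands for the implied '.*'.
    return any(
        re.fullmatch(pattern or ".*", value) is not None
        for pattern in text_patterns
    )
```

For example, with an enumerated code "NA" and a text pattern [0-9]+, both
"NA" and "42" are accepted while "abc" is rejected.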
Here are a couple of examples:
Allow any text value (the implied pattern '.*'):
<attributeDomain><textDomain/></attributeDomain>
Specify a sequence of one or more digits:
<attributeDomain><textDomain>[0-9]+</textDomain></attributeDomain>
Specify a 5-character alphanumeric string whose first two characters are "MP":
<attributeDomain><textDomain>MP[a-zA-Z0-9]{3}</textDomain></attributeDomain>
Many more examples are possible. The most common practice will likely be to
simply provide an empty <textDomain/> element, indicating that any text is
permissible.
We might also want to complicate it a bit more by making "textDomain" have the
following content model (rather than PCDATA):
<!ELEMENT textDomain (definition, pattern*, source?)>
This would allow us to define what is intended by the value space (e.g., values
represent a US postal code), have multiple patterns that are OR'ed (simplifying
regexp syntax), and define a source like in enumeratedDomains. I think the
definition is worthwhile.
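As a rough illustration of how the extended content model might be consumed,
the sketch below parses a made-up textDomain instance and checks values
against its patterns. The element content (including the source text), the
function names, and the rule that zero pattern children means "any text" are
all assumptions for the sake of the example; the proposal does not yet say
what an absent pattern implies.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical instance of the extended content model; the definition,
# patterns, and source text are invented for illustration.
EXAMPLE = """
<textDomain>
  <definition>US postal code</definition>
  <pattern>[0-9]{5}</pattern>
  <pattern>[0-9]{5}-[0-9]{4}</pattern>
  <source>hypothetical source</source>
</textDomain>
""".strip()

def load_text_domain(xml_text):
    """Return (definition, patterns, source) from a textDomain element."""
    elem = ET.fromstring(xml_text)
    definition = elem.findtext("definition")
    # An empty <pattern/> is treated like the implied '.*'.
    patterns = [p.text or ".*" for p in elem.findall("pattern")]
    # findtext returns None when the optional source element is absent.
    source = elem.findtext("source")
    return definition, patterns, source

def is_valid(value, patterns):
    """Return True if value fully matches any of the OR'ed patterns."""
    # Assumption: with no pattern children at all, any text is allowed.
    if not patterns:
        return True
    return any(re.fullmatch(p, value) is not None for p in patterns)
```

Splitting the two postal code forms into separate patterns keeps each
expression short, which is the simplification the multiple-pattern model is
meant to offer over a single alternation.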