The Library of Congress >> Especially for Librarians and Archivists >> Standards

MARC Standards



MARC PROPOSAL NO. 2023-03

DATE: December 21, 2022
REVISED:

NAME: Adding Subfields $0, $1, $2, and $5 to Fields 720 and 653 in the MARC 21 Bibliographic Format

SOURCE: PCC Standing Committee on Standards

SUMMARY: This paper proposes adding subfields $0 (Authority record control number or standard number), $1 (Real World Object URI), $2 (Source of standard number or URI), and $5 (Institution to which field applies) to two fields for uncontrolled data in the MARC 21 Format for Bibliographic Data: 720 (Added Entry – Uncontrolled Name) and 653 (Index Term – Uncontrolled).

KEYWORDS: Field 720 (BD); Added Entry-Uncontrolled Name (BD); Field 653 (BD); Index Term-Uncontrolled (BD); Subfield $0, in field 720 (BD); Subfield $0, in field 653 (BD); Authority record control number or standard number (BD); Subfield $1, in field 720 (BD); Subfield $1, in field 653 (BD); Real World Object URI (BD); Subfield $2, in field 720 (BD); Subfield $2, in field 653 (BD); Source of standard number or URI (BD); Subfield $5, in field 720 (BD); Subfield $5, in field 653 (BD); Institution to which field applies (BD)

RELATED: 2022-DP08; 2019-DP03

STATUS/COMMENTS:
12/21/22 – Made available to the MARC community for discussion.

01/31/23 – Results of MARC Advisory Committee discussion: Approved, with the following amendments: 1) remove subfield $2 from fields 720 and 653; 2) add subfield $7 (Data provenance) to field 720.

05/17/23 – Results of MARC Steering Group review - Agreed with the MAC decision.


Proposal No. 2023-03: Adding Subfields $0, $1, $2, and $5 to Fields 720 and 653

1. BACKGROUND

The ability to associate an authorized name or subject heading in a MARC record with an identifier or dereferenceable URI was established by the introduction of subfields $0 and $1. In a linked data environment, however, many sources do not record a preferred heading or label. It is the identifier or URI that fixes the identity of the entity being referenced, with the label supplied mainly to aid human readability. Unlike authorized forms in library authority systems, such labels are not necessarily either stable or unique. In traditional library terms these labels are not controlled. In the course of the recent PCC URIs in MARC pilot, participants encountered a number of use cases where it was desired to reference entities from such sources. Wikidata and ISNI are two examples of sources that pilot participants wished to use.

The MARC Bibliographic format makes provision for uncontrolled names and subjects in fields 720 and 653 respectively, but $0 and $1 are not currently defined for these fields. However, terms from nontraditional sources arguably fit well into these fields. Adding an identifier or URI subfield to these fields would enable them to be associated unambiguously with the relevant entities.

As noted above, Wikidata itself does not create authorized access points, only labels that can be changed, updated, or swapped and need not be unique. As we move into real identity management and incorporate more data sources into our descriptions, we will need better ways to use the data present in non-NAF sources. There has long been a desire to reduce the amount of work involved in authority control, especially for entities that exist in sources like Wikidata and ISNI but are not established in the NAF. If $0 and $1 were allowed in field 720, catalogers could use a label from the source vocabulary and the URI as an alternative to creating a NACO record for that entity. This is especially helpful for entities that do not necessarily meet the requirements for an NAR but do warrant an explicit entry in the bibliographic record.

In recent discussions a wide range of use cases that would benefit from this approach have come to light. It can, for example, reduce the workload involved for catalogers working with materials in foreign languages. For electronic serials in particular, there are often Wikidata entries for issuing bodies but not enough information for the cataloger to confidently establish an NAR.

Theses and dissertations present an opportunity to incorporate URIs in fields that are otherwise not controlled. These works often discuss specialized or emerging subjects that are not yet established in LCSH and other "traditional" library vocabularies but do have URIs available. These terms are often added to the 653 field, in addition to other author-supplied keywords. In addition, many institutions have workflows that make it impracticable to create national authority records for authors of theses and dissertations but can take advantage of existing ORCID and other identifiers. Beyond serving an immediate identity management function, adding the $0 or $1 to the 720 would facilitate future authority work if and when the same author publishes again.

High-volume archival and special collections resources would similarly benefit from an identity management approach to access. For instance, pursuing authority work for names of buildings and structures is often resource intensive. Frequently, extensive research is conducted only to conclude that the available information about the entity is too incomplete to pursue authority work. The ability to include uncontrolled names of significant structures and prominent landmarks—and include a URI to alternative sources such as the National Register of Historic Places NPGallery Database or SAH (Society of Architectural Historians) Archipedia—increases access while reducing the procedural burden of controlling forms of names.

Enhancing MARC support for external sources also facilitates collaboration with partners outside traditional libraries. In recent discussions with the Buddhist Digital Resource Center (BDRC), Harvard and Columbia catalogers learned that the BDRC, which has previously mapped its names to 720, is capturing associated VIAF IDs and is able to output them in MARC. However, the absence of $1 in 720 is obviously an impediment to doing this. At time of writing BDRC is investigating the possibility of parsing its names for output to 7XX, but being able to give VIAF URIs in $1 would considerably simplify the transformation process. BDRC also maintains its own subject vocabulary, which it maps to 653. Defining $0 and $1 for 653 would enable identifiers to be given for these subject terms.

2. DISCUSSION

2.1. Subfields $0 and $1: Use Cases and General Discussion

It should be noted that MARC already allows identifiers from nontraditional sources such as ISNI and Wikidata to be given in other 1XX/6XX/7XX access point fields. The PCC Linked Data Best Practices final report envisages the use of $1 in a traditional access point field in two types of cases: (a) where it is desired to associate a Real World Object (RWO) URI from an external source with a name or subject that is already established in the library's authority file, and (b) where a name is not established in the library's authority file but nevertheless conforms to its conventions for heading construction (e.g., in using a last name, first name form for Western names). Those remain valid uses and the present proposal does not suggest making any changes to those fields.

However, many cases are better served by fields outside the traditional 1XX/6XX/7XX access point fields. The point may be illustrated by reference to 6XX subject fields. There are two main reasons why it is problematic to use a standard 610, 611, 630, 647, 648, 650, or 651 field to encode a subject from a nontraditional source such as Wikidata. The first is that it requires the cataloger to know what the correct MARC field is based on the type of entity. For automated processes, it might not be easy to assign the correct tag. In addition, even during manual metadata creation there may be times where the entity being cited does not easily fit into one of the 6XX categories noted above and thus a less rigid 653 would be more appropriate. Since linked data entities are described in their RDF and do not rely on MARC coding to identify their type, the use of the less rigid 653 field fits well with a linked data approach. The second and more fundamental reason why using traditional subject added entry fields can be problematic is that terms drawn from such sources are often neither unique nor stable, which is to say they are not "controlled" as librarians understand the term.

In January 2019, the MARC Advisory Committee (MAC) considered Discussion Paper 2019-DP03 from the German National Library on the handling of subjects of an unspecified entity type, i.e., of a type that could not be identified as falling into one of the entity types (personal, corporate, geographical, topical, etc.) designated by the existing 6XX fields. At its meeting MAC concluded that headings of the kind that motivated the discussion paper, from the Bavarian Library Network's Gnomon thesaurus, were indeed "unspecified" but were nevertheless "controlled". Because these subjects were considered to be controlled, MAC concluded that 653 was not appropriate for headings of this type. A new field, 688 (Subject Added Entry-Type of Entity Unspecified) was defined instead. The MAC discussion also raised, but did not pursue, the possibility of a corresponding 7XX field for names of an unspecified entity type that were nevertheless controlled.

The 720 and 653 fields are explicitly designated as uncontrolled. They share the characteristic of 688 of referring to entities whose type cannot always be determined. This can particularly be an issue with data mapped from an external source rather than being inspected on a case-by-case basis by an individual cataloger. (Both fields make limited, but optional, provision to designate entity type via indicator values.) However, they differ from 688 in that the labels are uncontrolled. The terms controlled and uncontrolled are difficult to define rigorously in a MARC context, in part because there is some looseness in the way 1XX/6XX/7XX fields are defined: they accommodate not only terms that are explicitly established, but also terms that conform to accepted conventions for heading construction, which is a lower bar to clear. However, a term that is uncontrolled will tend to be characterized by the absence of a preferred label that is unique within the specified vocabulary, and the labels that are given are not necessarily stable. The current definitions of 720 and 653 reflect that traditional understanding of "control":

720: Added entry in which the name is not controlled in an authority file or list. It is also used for names that have not been formulated according to cataloging rules.

653: Index term added entry that is not constructed by standard subject heading/thesaurus-building conventions.

These definitions are still applicable to the uses under discussion, since the language pertains to the label and not its identification with an entity.

The possibility of defining identifier subfields for 720 and 653 is consistent with these definitions because, in a linked data context, a label can be uncontrolled in the library authority sense but nevertheless be associated with an entity through the provision of an identifier.

Currently neither $0 nor $1 is defined for 720 and 653. The addition of $0 does raise a minor technical issue regarding its definition. Appendix A currently states:

Subfield $0 contains the system control number of the related authority or classification record, or a standard identifier. These identifiers may be in the form of text or a Uniform Resource Identifier (URI).

If added to 720 and 653, $0 will never refer to the system control number of a related authority, since if an authorized name were available it would more properly be recorded in a standard 1XX/6XX/7XX field. An authorized subject string would similarly be recorded in a 6XX access point. But $1 accommodates only Real World Object (RWO) URIs, and some identifier schemes use only alphanumeric strings. There may also be applications that prefer to use the alphanumeric form of an identifier even where a corresponding RWO URI is available. For these reasons it may be advantageous to include $0 in 720 and 653 to carry standard alphanumeric identifiers.

Inevitably there are system implications to consider. Labels given in uncontrolled fields will usually be indexed in current discovery systems, but are unlikely to be included in browse indexes. Given that labels given here will not reliably be formulated according to authority conventions, their omission from browse indexes is arguably more an advantage than a drawback. (Indeed, in one possible implementation scenario, $0 or $1 could be given without an accompanying label in $a; a suitable label could subsequently be pulled in and populated either into the source MARC record or into the discovery layer to serve the needs of a particular audience.) Excluding uncontrolled terms from indexes for 1XX/6XX/7XX access points does have the benefit of removing clutter from headings maintenance routines. Names that can be assumed to follow authority conventions can continue to be coded 1XX/6XX/7XX at the discretion of the cataloging agency. The question remains how to integrate linked data sources such as those envisaged in the use cases discussed here into library discovery. However, that is a much broader issue that will need to be pursued outside the confines of any specific MARC proposal.

Before leaving this discussion two further MARC coding options should be noted, one of them in the traditional 6XX block and the other in the recently introduced 758 field. 6XX second indicator 4 is defined for headings with "source not specified", and it may initially appear that this indicator value could be used for uncontrolled subjects. However, the definition makes it clear that these headings differ from other 6XX access points only in that there is no code defined for the source vocabulary in the MARC Subject Heading and Term Source Codes list or implied by a value in the second indicator. In principle such a code can be requested. The definition describes these as controlled headings in contradistinction to uncontrolled terms in 653.

2.2. Consideration of Field 758

In preliminary comments from MAC on the discussion paper preceding this proposal (2022-DP08), the authors were asked to address the potential use of field 758 (Resource Identifier) to accommodate subject relationships. While it is true that 758 is defined very broadly and is not intrinsically limited to any particular kind of relationship, that is because it is designed to be flexible enough to accommodate the variety of data models that one may expect to see in linked data statements. The use cases that motivated the 758 proposal were concerned with recording work-to-instance, or what are sometimes called primary relationships. The values that are expected to appear in the $4 predicate position are relationships such as bf:instanceOf. These are different in nature from subject relationships, which are already well catered for in the 6XX fields, or 653 in the case of uncontrolled terms.

2.3. Labels and Alternatives

In MAC comments on Discussion Paper 2022-DP08, a potential concern raised was that of label maintenance. A label recorded in 653 $a or 720 $a may match the corresponding label from a source like Wikidata at the time of cataloging, but the two could quickly diverge, both because labels in Wikidata are readily changed, and because these fields are likely exempt from routine maintenance commonly performed by various bibliographic utilities and services for other 6XX and 7XX fields. In response, the authors again considered the potential for 653 or 720 $0 or $1 to be recorded without any label, as discussed above. This could allow systems to instead retrieve and display the current labels from the target URI on the fly. With a multilingual source like Wikidata, different labels could potentially be retrieved based on the language preferences of the user or institution.

Or, $a in these fields could be used to record a locally supplied label that may acceptably or even intentionally differ from a label associated with the identifier or URI for a variety of reasons. The source vocabulary may lack a label in the language of the institution. Or, institutions or communities may wish to use local labels that intentionally depart from those of the referenced URI. This could be done, for example, to supply local labels in lieu of offensive or outdated terminology found in a controlled vocabulary, to use more specialized or user-friendly labels suited for a particular community, to record personal name elements in a different order, or to provide terms in a language, script, or spelling different from the source label, while still referencing a URI for that entity in $0 or $1. Systems may be configured to display and maintain these fields in a variety of manners as desired by the institution. Additional examples illustrating these potential alternative uses are provided below.
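Separately from the MARC examples in section 4, the two display options described in this section can be sketched in code. The following is a minimal illustration only, not part of the proposal. It assumes that labels have already been retrieved from the URI in $0 or $1 into a plain mapping of language code to label, and that a locally supplied $a always takes precedence. The function name, its parameters, and that precedence rule are all hypothetical.

```python
def display_label(local_label, source_labels, preferred_languages):
    """Choose the label to display for a 653 or 720 field.

    local_label: text of $a, or None when only $0/$1 was recorded.
    source_labels: hypothetical mapping of language code -> label, assumed to
        have been retrieved already from the URI given in $0 or $1.
    preferred_languages: language codes in order of user or institution preference.
    """
    # Assumption: a locally supplied $a is an intentional choice and wins.
    if local_label:
        return local_label
    # Otherwise fall back to the first source label in a preferred language.
    for lang in preferred_languages:
        if lang in source_labels:
            return source_labels[lang]
    # No usable label; the caller might display the identifier itself.
    return None
```

With the data of Example 15, `display_label("Anishinaabeg", {"en": "Ojibwe"}, ["en"])` keeps the local label, whereas the same call with `None` in place of the local label falls back to the source label.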

2.4. Semantic Shift

A related concern raised in comments on 2022-DP08 is semantic shift. Semantic shift can occur even with labels in more traditionally controlled library vocabularies, but if the labels in a source are neither stable nor unique, it may be more difficult to determine whether the entities described by those labels are similarly stable or unique. This may especially be the case with conceptual topics expressed as subjects in 653. Similar issues may arise where sources differ as to whether name changes represent new entities, for example corporate body name changes or serial title changes. In a linked data context, however, the meaning of the term is given not by the label but by the definitions and associated data provided via the linked data URI. If the linked data source has a clear data model and good data management practices, ambiguity and semantic shift can be avoided. The authors suggest that best practice guidance should help establish which sources of URIs are relatively stable and therefore preferable for community usage.

2.5. Subfield $2

At the MAC Annual meeting in 2022, discussion and a straw poll indicated mixed support for including the additional subfield $2 in the proposal to record the source of standard numbers.

The authors have reservations about the use of subfield $2 in an uncontrolled field. If defined for 720 and 653, $2 would operate very differently from its use in traditional 1XX/6XX/7XX access points. In those fields, $2 identifies the vocabulary – typically recorded in an authority file – against which the term or name should be matched for purposes of verification and maintenance. Subfield $2 would not serve this purpose for an uncontrolled term or name, since there is by definition no preferred term or name to match against. Instead, providing a value in $2 could potentially help in managing uncontrolled data by identifying the source linked in $0 or $1. However, its use for that purpose would again diverge from existing practice for controlled fields.

Current PCC guidelines permit the use of $1 to associate a name from a controlled source with a linked data entity that identifies the same real world object but does not necessarily share the same label. (See the PCC Task Group on Linked Data Best Practices Final Report, section IIIb.) In other words, there is no intrinsic relationship between the identifier given in $1 and the vocabulary identified in $2. This is consistent with the definition of $2 as indicating the source of heading or term rather than the source of an associated identifier or URI.

The same is true of $0 in this context. The MARC format defines $0 as containing "the system control number of the related authority or classification record, or a standard identifier." But while $0 is customarily associated with a preferred label in 1XX/6XX/7XX access point fields through the provision of an authority record number, $0 has a more limited meaning in the context of an uncontrolled field. In an uncontrolled field $0 is not used for an authority record number, since none exists. Instead, it will contain either a URI for a non-RWO entity, or an alphanumeric identifier provided in lieu of a URI. Neither of these uses implies a relationship to a preferred label.

If subfield $2 is indeed approved for fields 720 and 653, a decision will need to be made about which source code list it should reference in these fields. In other traditionally controlled 6XX and 7XX fields, $2 has contained a code from the Subject Heading and Term Source Codes or Name and Title Authority Source Codes lists, indicating the source of the term or name in subject and name access points, respectively. Given the uncontrolled nature of 720 and 653, however, this usage may not be appropriate. Rather, as discussion from the MAC meeting suggests, $2 in 653 or 720 would instead specify the source of the identifier or URI in $0 or $1, in which case the Standard Identifier Source Codes list is likely more appropriate, closer to $2 of Bibliographic or Authority field 024 than to that of other Bibliographic 6XX/7XX fields.

Examples and proposed revisions in this document reflect the latter option for $2. Because the same code is typically recorded as a parenthetical prefix to the standard number or identifier according to the MARC specifications for $0, when the number or identifier comes from that source, recording the same code in $2 would be redundant in many cases. Whichever list is chosen, the MARC Field Index to Source Code Usage would need to be updated accordingly.
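The redundancy noted above can be made concrete with a short sketch. It is illustrative only. It assumes the parenthetical-prefix convention for $0 described in this section, and the function names are hypothetical. It splits a $0 value into its source code and identifier, then reports whether a given $2 merely repeats what every $0 already states.

```python
import re

# Matches the "(code)identifier" convention for $0, e.g. "(imdb)nm4888683".
PREFIXED = re.compile(r"^\((?P<source>[^)]+)\)(?P<ident>.+)$")


def split_control_number(value):
    """Return (source_code, identifier) for a $0 value.

    source_code is None when the value has no parenthetical prefix,
    for example when $0 carries a URI.
    """
    value = value.strip()
    match = PREFIXED.match(value)
    if match:
        return match.group("source"), match.group("ident")
    return None, value


def dollar2_is_redundant(dollar2, dollar0_values):
    """True when every $0 prefix already names the same source code as $2."""
    sources = [split_control_number(v)[0] for v in dollar0_values]
    # With no $0 at all (e.g. only $1 is present), $2 adds information.
    return bool(sources) and all(s == dollar2 for s in sources)
```

Applied to Example 5, `dollar2_is_redundant("imdb", ["(imdb)nm4888683"])` is true. For a field that records only $1, as in Example 22, the list of $0 values is empty and $2 is not redundant.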

2.6. Subfield $5

While not discussed at the MAC Annual meeting, subsequent comments also suggested support for inclusion of subfield $5. The addition of subfield $5 would allow an institution to designate the field as institution- or copy-specific, such as genre/form terms assigned according to local policy in 653, or names of former owners associated with a copy in 720. The authors considered this to be a comparatively straightforward addition and have included it in the proposal, with examples below.

3. PROPOSED CHANGES

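This paper proposes four new subfields in fields 720 and 653 of the MARC 21 Bibliographic Format: $0, $1, $2, and $5. In the listings that follow, these four subfields are the additions; the remaining indicators and subfields are already defined.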

3.1. Field 720

720 – Added Entry – Uncontrolled Name
Added entry in which the name is not controlled in an authority file or list. It is also used for names that have not been formulated according to cataloging rules. Names may be of any type (e.g., personal, corporate, meeting).

Indicators

First - Type of name
# - Not specified
1 - Personal
2 - Other

Second - Undefined
# - Undefined

Subfield Codes

$a - Name (NR)

$e - Relator term (R)

$0 - Authority record control number or standard number (R)
A standard number or code associated with the entity named in $a.
See description of this subfield in Appendix A: Control Subfields.

$1 - Real World Object URI (R)
See description of this subfield in Appendix A: Control Subfields.

$2 - Source of standard number or URI (NR)
MARC code that identifies the source from which the standard number or URI was assigned. Code from: Standard Identifier Source Codes.

$4 - Relationship (R)

$5 - Institution to Which Field Applies (NR)
See description of this subfield in Appendix A: Control Subfields.

$6 - Linkage (NR)

$8 - Field link and sequence number (R)

3.2. Field 653

653 – Index Term – Uncontrolled
Index term added entry that is not constructed by standard subject heading/thesaurus-building conventions.

Indicators

First - Level of index term
# - No information provided
0 - No level specified
1 - Primary
2 - Secondary

Second - Type of term or name
# - No information provided
0 - Topical term
1 - Personal name
2 - Corporate name
3 - Meeting name
4 - Chronological term
5 - Geographic name
6 - Genre/form term

Subfield Codes

$a - Uncontrolled term (R)

$0 - Authority record control number or standard number (R)
A standard number or code associated with the entity named in $a.
See description of this subfield in Appendix A: Control Subfields.

$1 - Real World Object URI (R)
See description of this subfield in Appendix A: Control Subfields.

$2 - Source of standard number or URI (NR)
MARC code that identifies the source from which the standard number or URI was assigned. Code from: Standard Identifier Source Codes.

$5 - Institution to Which Field Applies (NR)
See description of this subfield in Appendix A: Control Subfields.

$6 - Linkage (NR)

$7 - Data provenance (R)

$8 - Field link and sequence number (R)
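As an informal aid to reading the listings in 3.1 and 3.2, the repeatability markers (R) and (NR) can be expressed as a small lookup table and checked mechanically. The sketch below is an assumption-laden illustration, not a validator for the published format. It encodes the subfields exactly as listed in this proposal, so it does not reflect the later MAC amendments recorded under STATUS/COMMENTS. The names `PROPOSED_SUBFIELDS` and `check_field` are hypothetical.

```python
from collections import Counter

# Repeatability as listed in sections 3.1 and 3.2: True = (R), False = (NR).
PROPOSED_SUBFIELDS = {
    "720": {"a": False, "e": True, "0": True, "1": True, "2": False,
            "4": True, "5": False, "6": False, "8": True},
    "653": {"a": True, "0": True, "1": True, "2": False,
            "5": False, "6": False, "7": True, "8": True},
}


def check_field(tag, subfields):
    """Check a list of (code, value) pairs against the proposed definitions.

    Returns a list of human-readable problems; an empty list means none found.
    """
    allowed = PROPOSED_SUBFIELDS.get(tag)
    if allowed is None:
        return [f"field {tag} is not covered by this sketch"]
    problems = []
    counts = Counter(code for code, _ in subfields)
    for code, n in counts.items():
        if code not in allowed:
            problems.append(f"${code} is not defined in field {tag}")
        elif n > 1 and not allowed[code]:
            problems.append(f"${code} is not repeatable in field {tag}")
    return problems
```

For instance, a 653 field with two $0 subfields passes, because $0 is repeatable, while a 720 field with two $a subfields is reported.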

4. EXAMPLES


4.1. Field 720


Example 1

720 ## $a Kevin Gray $0 (discogs)a312098

Example 2

720 ## $a Tshul khrims rin chen $1 http://viaf.org/viaf/22550486
            [Note: VIAF lists multiple preferred forms; the English language form is: Tshul-khrims-rin-chen, Zhu-chen, 1697-1774]

Example 3

720 1# $a Liliana Essi $1 http://www.wikidata.org/entity/Q19760388

Example 4

720 1# $a Penrose, Mary, $e former owner. $1 http://www.wikidata.org/entity/Q76159079 $5 DLC
            [Note: The English-language label in Wikidata is: Mary Penrose]

Example 5

720 1# $a Ann Smith $2 imdb $0 (imdb)nm4888683
            [Note: IMDb has label: Ann Smith (VII)]

Example 6

720 1# $a NicChoinnich, Fionnag $1 https://www.wikidata.org/entity/Q108829849
            [Note: The English-language label in Wikidata is: Fionnag NicChoinnich]

Example 7

720 2# $a The Other Baby $4 prn $0 (imdb)co0776444

Example 8

720 2# $1 https://www.wikidata.org/entity/Q49006867
            [Note: Wikidata RWO URI for the village of Hara-mura, Japan]

4.2. Field 653


Example 9

653 ## $a Russian invasion of Ukraine $1 http://www.wikidata.org/entity/Q110999040

Example 10

653 ## $a Melbourne General Post Office $1 http://www.wikidata.org/entity/Q6811781

Example 11

653 ## $a Bagras Castle $2 pleiades $0 (pleiades)786609869

Example 12

653 ## $a AcousTech Mastering $0 (discogs)l265260

Example 13

653 ## $a Tanakh $0 https://id.loc.gov/authorities/names/n79054379
            [Note: The LC/NACO AF has authorized access point Bible. Old Testament]

Example 14

653 ## $0 (cerl)cnc00032171 $1 http://thesaurus.cerl.org/record/cnc00032171
            [Note: CERL Thesaurus has label “Convento della Santissima Annunziata, Bologna”]

Example 15

653 #0 $a Anishinaabeg $1 http://www.wikidata.org/entity/Q255872
            [Note: The English-language label in Wikidata is “Ojibwe”]

Example 16

653 #0 $a Undocumented immigration $2 local $0 (NNC)a3529197

Example 17

653 #1 $a Yalemzerf YEHUALAW $0 (iaafa)14893581

Example 18

653 #2 $a The Scottish Salmon Company $1 http://www.bbc.co.uk/things/1fa5ed44-55c0-4d90-8349-b51d28fadd14#id

Example 19

653 #3 $a Latke-Hamantash Debate $2 wikidata $0 (wikidata)Q4992592

Example 20

653 #4 $a Early Jurassic Epoch $1 http://n2t.net/ark:/99152/p09qtgw32q7
            [Note: RWO URI from the PeriodO gazetteer. $0 and $2 cannot be recorded, as there is
            no available code in Standard Identifier Source Codes]

Example 21

653 #5 $a Saco Lake $0 (gnis)872606 $0 (geonames)5092045 $1 https://sws.geonames.org/5092045/
            [Note: lake in New Hampshire that is not established in LCSH; other lakes with same
            name in Brazil]

Example 22

653 #5 $a Ess Mountain $2 wikidata $1 http://www.wikidata.org/entity/Q24031164
            [Note: Wikidata has no English-language label for this mountain in Ethiopia]

Example 23

653 #6 $a Blackface minstrel music $0 http://id.loc.gov/authorities/genreForms/gf2014026939 $1 http://www.wikidata.org/entity/Q1937548
            [Note: The label in LCGFT is “Minstrel music” and the English-language label in Wikidata is “minstrel music”]

Example 24

653 #6 $a Hand colouring $0 http://vocab.getty.edu/page/aat/300133555 $1 http://www.wikidata.org/entity/Q104341641 $5 Uk
            [Note: The English-language labels in AAT and Wikidata both use the American spelling “Hand coloring”]
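For readers who want to process examples written in the display notation used above, the following sketch shows one possible way to split such a line into its tag, indicators, and subfields. It is a rough illustration under stated assumptions: the notation is taken to be a three-character tag, two indicator characters, and subfields introduced by `$` plus a single code followed by a space. The function name `parse_field` is hypothetical, and actual MARC records use a different binary or XML serialization.

```python
import re


def parse_field(line):
    """Parse a display-notation line into (tag, ind1, ind2, subfields).

    subfields is a list of (code, value) pairs in their original order.
    """
    tag, indicators, rest = line.strip().split(" ", 2)
    if len(tag) != 3 or len(indicators) != 2:
        raise ValueError(f"unexpected tag or indicators in: {line!r}")
    # Splitting on "$" + code + whitespace, with the code captured, yields
    # [text_before_first_code, code1, value1, code2, value2, ...].
    parts = re.split(r"\$([0-9a-z])\s+", rest)
    if parts[0].strip():
        raise ValueError(f"text found before the first subfield code: {parts[0]!r}")
    codes = parts[1::2]
    values = [v.strip() for v in parts[2::2]]
    return tag, indicators[0], indicators[1], list(zip(codes, values))
```

Parsing Example 21 this way yields tag `653`, indicators `#` and `5`, and four subfields, including both repeated $0 values in their original order.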

5. BIBFRAME DISCUSSION

Adding $0 and $1 to 653 and 720 could reduce the data loss in BIBFRAME to MARC conversion. The current mapping from BF contribution agent to MARC 720 does not allow carrying over the URI since there is at present no $1 or $0 in 720.

6. SUMMARY OF PROPOSED CHANGES

In fields 720 (Added Entry–Uncontrolled Name) and 653 (Index Term–Uncontrolled) of the MARC 21 Bibliographic Format, define the following subfields:

$0 - Authority record control number or standard number (R)

$1 - Real World Object URI (R)

$2 - Source of standard number or URI (NR)

$5 - Institution to Which Field Applies (NR)


(05/17/2023)