The Library of Congress >> Especially for Librarians and Archivists >> Standards

HOME >> MARC Development >> Discussion Paper List

MARC DISCUSSION PAPER NO. 2023-DP01

DATE: December 21, 2022
REVISED:

NAME: Defining a New Subfield in Field 264 to Record an Unsubfielded Statement in the MARC 21 Bibliographic Format

SOURCE: Network Development and MARC Standards Office (NDMSO), Library of Congress

SUMMARY: This paper proposes adding a new non-repeatable subfield to field 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) in the MARC 21 Bibliographic Format to record an unsubfielded imprint or provision statement.

KEYWORDS: Field 264 (BD); Production, Publication, Distribution, Manufacture, and Copyright Notice (BD); Unsubfielded statement, in field 264 (BD); General statement (BD)

RELATED: 2022-DP10

STATUS/COMMENTS:
12/21/22 – Made available to the MARC community for discussion.

02/01/23 – Results of MARC Advisory Committee discussion: MAC expressed little support for the paper although most agreed that BIBFRAME needed to solve the issues it addressed; if a solution is found, then it should avoid using the existing 264 or 260 fields. Generally, MAC members thought that this change would adversely affect the exchange of records with no redeeming value other than a solution to the round-tripping from BIBFRAME to MARC. There were numerous rebuttals to the assertions made in the discussion paper, both to its underlying premise and to the impact on the usability of the data in question. These included the divergence in guidance between the Original RDA and Official RDA (including the ability to "break" a standard 264 $a$b$c string into a triplet of 264$a/264$b/264$c fields), the harm to faceting and retrieval (including the introduction of excess noise in searches on name or place), the relative value of transcribed data vs. controlled data, and the deployment of MARC without ISBD punctuation. The authors countered with the challenge of reliably mapping complex combinations of MARC data into BIBFRAME and then back to the corresponding complex combinations. This is particularly the case under the dynamic of bibliographic file maintenance. There ultimately appear to be two competing visions for the data – its characterization through the persistence of the subfields vs. the preservation of the source strings as coherent statements. The paper may return as a proposal or another discussion paper at an undetermined future date.

Discussion Paper No. 2023-DP01: Defining a New Subfield in Field 264 to Record an Unsubfielded Statement

1. BACKGROUND

Field 264 is currently defined in the MARC Bibliographic format as follows:

264 - Production, Publication, Distribution, Manufacture, and Copyright Notice (R)
- Indicators
  - First - Sequence of statements
    - # - Not applicable/No information provided/Earliest
    - 2 - Intervening
    - 3 - Current/Latest
  - Second - Function of entity
    - 0 - Production
    - 1 - Publication
    - 2 - Distribution
    - 3 - Manufacture
    - 4 - Copyright notice date
- Subfield Codes
  - $a - Place of production, publication, distribution, manufacture (R)
  - $b - Name of producer, publisher, distributor, manufacturer (R)
  - $c - Date of production, publication, distribution, manufacture, or copyright notice (R)
  - $3 - Materials specified (NR)
  - $6 - Linkage (NR)
  - $7 - Data provenance (R)
  - $8 - Field link and sequence number (R)

Field 264 has considerable overlap with Field 260 (Publication, Distribution, etc. (Imprint)), but provides greater ability to distinguish between the functions of the supplied statements.

Current practice has catalogers record statements for publication, production, manufacture, and distribution in three distinct parts or subfields: place of activity ($a), agent of activity ($b), and date of activity ($c).
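As an illustration only, the following Python sketch reads the display form of a 264 field, as written in the examples in this paper, into its tag, indicators, and ordered subfields. The names Field and parse_field are hypothetical, and the sketch assumes that a dollar sign never occurs inside the data itself.

```python
import re
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    tag: str
    ind1: str
    ind2: str
    subfields: List[Tuple[str, str]]  # (code, value) pairs in recorded order


def parse_field(line: str) -> Field:
    """Parse a display-form field such as '264 #1 $aLondon : $bKuperard, $c2006.'"""
    tag, indicators, rest = line.split(" ", 2)
    # Each subfield starts with "$" plus a one-character code; repeats are kept.
    pairs = re.findall(r"\$([a-z0-9])([^$]*)", rest)
    return Field(tag, indicators[0], indicators[1],
                 [(code, value.strip()) for code, value in pairs])


field = parse_field(
    "264 #1 $a[Place of publication not identified] : $bABC Publishers, $c2009."
)
for code, value in field.subfields:
    print(code, value)
```

Here the second indicator ("1") identifies the statement as a publication statement, and the three subfields carry the place, the agent, and the date, each still ending in its ISBD punctuation.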

2. DISCUSSION

A version of this paper, Discussion Paper 2022-DP10, was presented at the MAC Annual meeting in June 2022. MAC participants desired more details and information about the request from a BIBFRAME perspective. Discussion also touched on aspects around alternative ideas, principally around punctuation, that were not explored in the original. A number of questions were raised about how this might impact search and indexing. While the genesis of this discussion paper stemmed from a difficulty with converting BIBFRAME data to MARC, many of the ideas presented here are independent of those concerns.

Floating above some of the more specific items for discussion is the question of how the 264 field is currently defined and implemented. This larger, more overarching, query was also asked during the June 2022 MAC meeting. Is this field meant for uncontrolled data or controlled data? Creating subfields to distinguish a place from an agent and an agent from a date and a date from a place would suggest a desire to assert some control over the data, but that is undermined by the instruction to transcribe the information (with ISBD punctuation) as observed on the resource. Transcription all but ensures an endless variation to how information is captured. While most provision information has been entered correctly in bibliographic records, transcribing has occasionally introduced a myriad of human errors or copied a written or typographical error from the resource into the bibliographic record. The expectation that one should enter the information as found on the piece stands counter to the objective of control. The latter tracks variations and resolves them to the single entity or string. The former makes no such attempt.

A very quick perusal of the data found in the 260 and 264 fields shows that, historically and presently, the cataloger is focused on transcription, not control. A detailed investigation, during which all the agents found in $b of 260 and 264 fields were excised from the Library of Congress’s bibliographic dataset and isolated in hopes of leveraging this potentially rich source for a controlled use of the data, demonstrates the incredible – and very much uncontrolled – variation. The intent is also captured in the word "statement" in the names for the 260 and 264 fields. The word implies the content of the field should be seen as a whole, not its parts.

Added Entry fields (7XX) are the natural home for identifying the entities found in the 260 and 264 fields in a regimented and controlled fashion. That is the purpose, after all, of the Added Entry fields, unlike the 260 and 264 fields, and why relator codes have long existed for precisely these types of relationships. See Example 9 below.

The 260 and 264 fields are trying to perform two functions simultaneously: enter information in a pseudo-controlled manner by using the subfields as an entity typing mechanism while recording information as it is found on the resource. As a result, many practitioners leverage this bifurcation to their advantage. A desire to view the data as semi-controlled becomes an argument against seeing it as purely a transcribed "statement" and also an argument against explicitly adding agents or places in linked entry fields because, after all, those details are already clearly identified. Likewise, even while acknowledging the content is transcribed, and therefore uncontrolled, the potential loss of subdivisions is seen as a loss of data and granularity, and search precision, even though the very uncontrolled nature of how data are entered in this field in no way assures anything close to clear, robust granular data and search precision.

2.1. RDA/Cataloging Rules

Cataloging rules, specifically those from the original RDA Toolkit, often indicate that it is important to record a structured statement about a resource's publication, production, manufacture, and/or distribution. Those rules are: RDA 2.7.1.1 (Publication), 2.8.1.1 (Production), 2.9.1.1 (Distribution), and 2.10.1.1 (Manufacture). The instructions provide flexibility: provision details can be recorded as distinct elements or as a single statement. It has been MARC practice to split these statements into parts, which employ ISBD punctuation to connect them and which are later combined in displays. This practice is an attempt to add semantic subdivisions in the data while simultaneously constructing a human-readable string. Not only does it require additional effort on the part of the cataloger, but it also requires downstream systems to perform various machinations on the data, such as combining the subfields for display or ensuring the content of each is added to a separate keyword index.
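The display step described above can be sketched as follows. This is a minimal illustration in Python and not a description of any particular system; the helper names are hypothetical, and the input is the field shown in Example 1 of this paper. It assumes, as is current practice, that the ISBD punctuation is already present in the subfield content.

```python
import re


def subfields(field_text):
    """Return (code, value) pairs in recorded order; assumes no '$' in the data."""
    return [(code, value.strip())
            for code, value in re.findall(r"\$([a-z0-9])([^$]*)", field_text)]


def display_statement(field_text):
    """Join $a, $b and $c in recorded order, relying on the punctuation in the data."""
    return " ".join(value for code, value in subfields(field_text) if code in "abc")


subfielded = "$a[Place of publication not identified] : $bABC Publishers, $c2009."
unsubfielded = "$s[Place of publication not identified] : ABC Publishers, 2009."

joined = display_statement(subfielded)
recorded = subfields(unsubfielded)[0][1]
print(joined == recorded)
```

Under these assumptions the string a system assembles for display is the same string that would be recorded once, as a whole, in a single subfield.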

Adding a subfield to the 264 field for the complete, unsubfielded statement would provide MARC users with a simpler alternative for recording this information, and would comply with cataloging instructions, which do not stipulate that these transcribed elements need to be individually identified.

The desire to have such a subfield can stand alone from other considerations presented. In the spirit of permitting MARC implementers a variety of ways to record information for their needs, allowing a single subfield for the whole, concatenated statement is a reasonable request. As such, the current proposal would not force any user to choose between using three distinct subfields versus a single subfield, though it would be logical to use one or the other.

2.2. [ISBD] Punctuation

It is worth drawing attention to the fact that the information in the 264 field is structured in two ways, one on top of the other. Data elements are placed into distinct subfields and structured with ISBD punctuation. One could argue that subfielding is unnecessary since a system should be able to effectively parse the ISBD string into its proper elements for indexing, except that this notion breaks down under closer review as it is impossible to consistently tease apart the punctuation into semantically intelligible information. Likewise, if the information is subfielded into specific elements, then it begs the question – wherefore the punctuation?
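To make the parsing difficulty concrete, here is a deliberately naive Python sketch that splits a statement on the first " : " and the last ", ". The function naive_split is hypothetical and is not offered as a workable parser; the three input strings are taken from Examples 1, 5, and 4 of this paper.

```python
def naive_split(statement):
    """Split 'Place : Name, Date' on the first ' : ' and the last ', '.

    Returns a dict keyed by subfield code, or None when the pattern is absent.
    """
    place, colon, rest = statement.partition(" : ")
    if not colon:
        return None
    name, comma, date = rest.rpartition(", ")
    if not comma:
        return None
    return {"a": place, "b": name, "c": date}


simple = naive_split("[Place of publication not identified] : ABC Publishers, 2009.")
older = naive_split("London, Batsford; New York, Drake Publishers [1972]")
multiple = naive_split(
    "London ; New York : Applied Science Publishers ; New York, N.Y. : "
    "Sole distributor in USA and Canada, Elsevier Science Pub. Co., c1983."
)

print(simple)
print(older)
print(multiple["b"])
```

The simple statement splits cleanly. The older statement, which predates the " : " convention, is not recognized at all, and in the third case the second place and the distributor are swept into the publisher's name.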

MAC participants and Program for Cooperative Cataloging working groups have expressed a strong inclination to maintain the subfielding but cease ISBD punctuation. A few samples below are shown with the punctuation removed (See Examples 7 and 8). It is certainly an improvement in the data - as data (!) - but it very much shifts the display problem downstream. At a basic level, the ambiguity around the directive "remove punctuation" becomes clear – would punctuation within the content of a single subfield be allowed, with only the terminal punctuation removed? Is terminal punctuation allowable per subfield? Presently, the content of the 260 and 264 fields is recorded with the assistance of ISBD punctuation to lend clarity (even though also divided by subfields), meaning that display software merely concatenates the subfields and outputs the result. Downstream systems would have to determine whether punctuation was included or not. They would have to figure out a way to cleanly present the data, likely by trying to inject punctuation in some programmatic way.
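As a rough sketch of what injecting punctuation programmatically might look like, the Python below applies one fixed rule set to the unpunctuated field from Example 7. The rule set (" ; " before a further place, " : " before a name, ", " before a date) and the name inject_punctuation are assumptions made for illustration; a real system would need to decide its own rules.

```python
import re

# Assumed separators, placed before a subfield when it is not the first one.
SEPARATOR = {"a": " ; ", "b": " : ", "c": ", "}


def inject_punctuation(field_text):
    """Build a display string from unpunctuated $a/$b/$c subfields."""
    parts = re.findall(r"\$([abc])([^$]*)", field_text)
    out = ""
    for position, (code, value) in enumerate(parts):
        if position > 0:
            out += SEPARATOR[code]
        out += value.strip()
    return out


unpunctuated = "$aLondon$bBatsford$aNew York$bDrake Publishers$c1972"
as_transcribed = "London, Batsford; New York, Drake Publishers [1972]"

generated = inject_punctuation(unpunctuated)
print(generated)
print(generated == as_transcribed)
```

The generated string is readable, but it is no longer the statement as the cataloger transcribed it: the original commas and semicolon are replaced by the assumed separators, and the brackets around the date cannot be recovered.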

Removing the punctuation is a seductive option – and one that has been contemplated – but it would likely create downstream consequences far greater and more impactful than the introduction of a new subfield. ISBD punctuation is used to generate a human comprehensible statement – a recognition that it is not controlled information. It is a tall order to continue the practice of transcription absent punctuation (and a sudden disappearance at that!) and then expect downstream systems to present this transcribed information intelligibly to a human. To answer the question – Itwillallrunnethhorriblytogetherotherwise

Much punctuation in MARC is unnecessary noise – such as periods at the end of subject headings – but when the punctuation is used for a statement (i.e., a lexical, composed string of highly variable data designed for human consumption), it is essential. To cease the inclusion of punctuation, which makes the individual elements intelligible as a whole, is to challenge the very definition of this field.

2.3. Search

It is often argued that using subfields in the 260 and 264 fields is necessary for search purposes. One such use case centers on software that can either index or filter results based on an individual field and subfield combination. For example, a user searches for all the records with "Venice" in the 264$a. This is a perfectly legitimate use case (and reality) but it glosses over the limitations that exist in that same reality while suggesting such a use case is beyond the realm of possibility should the $s be introduced. One such limitation is that "Venice" may, of course, return resources published in Venice, California (LCCN: 91060088) in addition to the more likely Venice, Italy. Nor does such a search account for the more common Italian form found on books, "pubblicato a Venezia." The information in the 264 field may not be in Latin script, so it could be necessary to search a given term in two scripts. Overlooked, too, are the indexing/searching gymnastics that may (or may not) be happening under the hood: Do these applications simultaneously search the parallel 260/264 subfields? What about related 880 fields? Can the software differentiate between the second indicators in the 264 field to retrieve resources that were distributed (but not published) in a specific location? Suffice it to say, searching/filtering data in the existing 264 field, the related 260 field, and possibly the related 880 fields, is fraught, and there are limitations and imperfections.

But it also remains to be demonstrated that indexing and searching would become meaningfully worse with the introduction of the $s. If search/filter applications are capable of accounting for multiple MARC fields (260, 264, 880), and also possibly accounting for specific second indicator values in the 264, it stands to reason that including the $s would be a rather trivial enhancement to those applications. Although there are certainly searches that would produce results with additional noise, it is unlikely that the net results will be any less or any more problematic than those returned with the existing limitations in place. For example, in a world where 264 $s is used universally, a search for "Venice" would return about five additional results from the entire Library of Congress collection because "Venice" appears in the name of the provider, such as the book published by the Venice Poetry Company (LCCN: 77162831).
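The difference between a place-only search and a search over the whole statement can be sketched with a toy example in Python. The first record uses the Venice Poetry Co. statement quoted in this paper; the other two records, the record identifiers, and the function names are hypothetical and exist only to illustrate the point.

```python
import re

RECORDS = {
    "rec1": "$a[Los Angeles, $bVenice Poetry Co., $c1972]",  # statement quoted in this paper
    "rec2": "$aVenice : $bExample Press, $c2001.",           # hypothetical record
    "rec3": "$aVenezia : $bEsempio Editore, $c1998.",        # hypothetical record
}


def subfields(field_text):
    """Return (code, value) pairs in recorded order; assumes no '$' in the data."""
    return [(code, value.strip())
            for code, value in re.findall(r"\$([a-z0-9])([^$]*)", field_text)]


def search(term, codes=None):
    """Return ids of records whose selected subfields contain term (case-insensitive).

    With codes=None every subfield is searched, which approximates a keyword
    search over a single unsubfielded statement.
    """
    hits = []
    for record_id, field_text in RECORDS.items():
        values = [v for c, v in subfields(field_text) if codes is None or c in codes]
        if any(term.lower() in value.lower() for value in values):
            hits.append(record_id)
    return hits


place_only = search("Venice", codes={"a"})
whole_statement = search("Venice")
print(place_only, whole_statement)
```

In this toy data the whole-statement search adds one record in which "Venice" is part of the publisher's name, while neither search finds the record transcribed as "Venezia". The added noise and the pre-existing gap are separate problems.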

The data are more granular, and indexable in a more granular way, because of the existing subfields. That does not seem to translate to a better search experience or substantially more precise search results.

2.4. MARC Field 881

The new field 881 (Manifestation Statements) – specifically subfields $e (Manifestation production statement), $f (Manifestation publication statement), $g (Manifestation distribution statement), and $h (Manifestation manufacture statement) – was briefly considered but quickly ruled out. Information entered into field 881 is expected to be "recorded as an unstructured description" (emphasis added), whereas the information recorded in field 264 is structured, despite the individual pieces of information being recorded as found on the item.

Altering the 881 is a possible solution, but a very problematic one. Firstly, it will require a redefinition of the field. Secondly, assuming that the likely solution would be an indicator value to denote whether the content of the field is unstructured or structured, the solution will be somewhat confusing since it will either apply to all subfields – even those for which a "structured" description is ambiguous – or it will apply only to a select few subfields and require careful documentation to explain. In short, the solution would likely involve an indicator that applies selectively, or with exceptions, and a redefinition of the field.

The idea of adding $s to the 264 arises precisely because the 264 is a core field, and because the field is definitionally appropriate and is already used to record the same information. The goal is to provide a simpler way to record the data in a common MARC field. It is not to move the data from the 264 to another place in the record. If the only way to store provision information as an unsubfielded string, or statement, is to move it to the 881 field, then the inclination would be to dispense with the recording of this information in the 264 field altogether.

2.5. BIBFRAME

The initial BIBFRAME modelling for this information was as a simple literal -- bf:provisionActivityStatement. For example:

<x> bf:provisionActivityStatement “[Los Angeles, Venice Poetry Co., 1972]”

This example, taken from a 260 field, concatenates $a, $b, and $c, complete with punctuation. It was represented not in essence or in spirit, but in reality, as a statement. The design aligned with RDA instructions for this manifestation information. This represents, basically, how we desire to represent (and have catalogers record) this information in BIBFRAME.

But the statement, in BIBFRAME, could not go back to MARC without being broken up into subfields.

From this point forward, any BIBFRAME solution has not only been more complicated but done for the express purpose of returning this information to MARC in all its subfielded glory.

The next iteration generated a complex set of properties and resources:

[]
  a rdf:Resource ;
  bf:provisionActivity [
    a bf:ProvisionActivity, bf:Publication ;
    bf:place [
      a bf:Place ;
      rdfs:label "[Los Angeles"
    ] ;
    bf:agent [
      a bf:Organization ;
      rdfs:label "Venice Poetry Co.,"
    ] ;
    bf:date "1972]"
  ] .

This – and again this is a simple example – has too many issues to go into depth about. But briefly: weakly identified anonymous resources (i.e., cannot use URIs) employed for what should be basic string literals; junky labels, ones that include punctuation and only make sense in relation to other property/resource combinations; completely unnecessary wordiness in the data (which complicates readability, manipulation, and querying); ordering is not technically maintained in RDF (without an even more complicated data structure), even if it might look so in this example.
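The ordering point can be illustrated with a small Python sketch. Modelling an RDF graph as a Python set of pairs is a simplification made here for illustration, and the function name statement is hypothetical; the data is the field from Example 7 of this paper.

```python
def statement(pairs):
    """Join subfield values in the order given."""
    return " ".join(value for _, value in pairs)


# Subfields of Example 7 in their recorded order.
original = [("a", "London,"), ("b", "Batsford;"), ("a", "New York,"),
            ("b", "Drake Publishers"), ("c", "[1972]")]

# The same pairs with the two places swapped.
shuffled = [original[2], original[1], original[0], original[3], original[4]]

# A set, like a plain RDF graph, has no notion of order.
same_graph = set(original) == set(shuffled)
same_statement = statement(original) == statement(shuffled)

print(same_graph, same_statement)
print(statement(shuffled))
```

The two sequences are identical as unordered collections yet yield different statements, so a structured representation needs some additional ordering device before the original statement can be rebuilt. A single string literal carries its order with it.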

Since this information is not truly presented as a statement in MARC, the output here is a series of resources, the exact opposite of simple statements, which was the original intent in MARC. In short, the BIBFRAME representation above accurately reflects the MARC reality: it is a complex representation for what is complex data in MARC – despite being advertised as a statement.

The current BIBFRAME configuration is a solution that is somewhere between the two examples above. It leans toward the latter out of necessity given the need to return this data to individual subfields in a MARC field. Yet this is no future-looking solution; it exists to accommodate a need to go backward into MARC.

2.6. Summation

Ultimately, this discussion paper is about exploring a solution that allows for the simpler recording of required descriptive elements in a bibliographic record. It originated in a modelling difficulty in BIBFRAME, which was itself merely a design to apply the rules of RDA. The Original RDA Toolkit instructs catalogers to capture a statement about a resource's publication, production, manufacture, or distribution. It permits implementers to record that information as a single, unbroken statement. That output matches the BIBFRAME model, but there is also a desire – commonly embraced no doubt – for interoperability and crosswalking with MARC. But MARC, as currently defined, does not permit this particular RDA pattern.

A new $s in 264 is proposed as a way to accommodate this desire. Defined as a place to record a "full, unsubfielded statement", it is entirely in keeping not only with the spirit of field 264 but also the intent. It ensures that this vital information remains in a common MARC bibliographic field, and not elsewhere in the record. It is also the smallest change that could be introduced to the format. All other options – redefining fields; eliminating punctuation – are far more drastic. There is no clear evidence that the addition of this subfield will greatly impact indexing, searching, and retrieval, especially once systems are modified to index a single additional subfield.

3. PROPOSED CHANGES

In field 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) of the MARC 21 Bibliographic Format, add and define the following new subfield:

$s – General statement (NR)
Full, unsubfielded statement. This subfield may be used in addition to, or instead of, the other subfields. Although doing so is not technically incorrect, it should not be used when the second indicator value is "4" (Copyright notice date).
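The two constraints in this definition, that $s is not repeatable and that it should not accompany second indicator "4", could be checked along the following lines. This Python sketch is illustrative only; the function name check_264 and the wording of its messages are hypothetical.

```python
def check_264(ind2, codes):
    """Return a list of problems for a 264 field.

    ind2 is the second indicator; codes is the list of subfield codes in the field.
    """
    problems = []
    if codes.count("s") > 1:
        problems.append("$s is not repeatable")
    if "s" in codes and ind2 == "4":
        problems.append("$s should not be used with second indicator 4")
    return problems


print(check_264("1", ["a", "b", "c", "s"]))  # $s alongside the other subfields
print(check_264("1", ["s"]))                 # $s instead of the other subfields
print(check_264("4", ["s"]))                 # discouraged combination
```

Because the definition says the second case is "not technically incorrect", an implementation might report it as a warning rather than reject the field.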

4. EXAMPLES

Example 1:
Three versions of the same field:

264 #1 $a[Place of publication not identified] : $bABC Publishers, $c2009.

264 #1 $s[Place of publication not identified] : ABC Publishers, 2009.

264 #1 $a[Place of publication not identified] : $bABC Publishers, $c2009. $s[Place of publication not identified] : ABC Publishers, 2009.

Example 2:

264 #1 $31990-2005 :$sWeston, MA : Prime National Pub. Corp.

Example 3:

264 31 $32006- :$sThousand Oaks, Calif. : Sage Publications

Example 4:

264 #1 $sLondon ; New York : Applied Science Publishers ; New York, N.Y. : Sole distributor in USA and Canada, Elsevier Science Pub. Co., c1983.

Example 5:

264 #1 $sLondon, Batsford; New York, Drake Publishers [1972]

Example 6:

264 #1 $6880-04$sXianggang : Hai feng chu ban she, 1991.
880 #1 $6264-04$s香港 : 海峰出版社, 1991.

Example 7:
NB: The issue here is not how nicely the data may look with the punctuation removed; it is the downstream impact of this change that is the problem.

Original Punctuation
264 #1 $aLondon, $bBatsford; $aNew York, $bDrake Publishers $c[1972]

All Punctuation Removed, but maintain original subfielding.
264 #1 $aLondon$bBatsford$aNew York$bDrake Publishers$c1972

Example 8:
NB: The issue here is not how nicely the data may look with the punctuation removed; it is the downstream impact of this change that is the problem.

Original Punctuation
264 #1 $aLondon ; $aNew York : $bApplied Science Publishers ; $aNew York, N.Y. : $bSole distributor in USA and Canada, Elsevier Science Pub. Co., $cc1983.

Terminal Punctuation Removed from each subfield, but maintain original subfielding.
264 #1 $aLondon$aNew York$bApplied Science Publishers$aNew York, N.Y.$bSole distributor in USA and Canada, Elsevier Science Pub. Co.$cc1983

Example 9:
$s in 264; Added entries for 264 entities.

264 #1 $sLondon : Kuperard ; New York : Distributed in the United States and Canada by Random House, 2006.
751 ## $aLondon (England) $4pup $0http://id.loc.gov/authorities/names/n79005665
710 2# $aKuperard (Publisher) $4pbl $0http://id.loc.gov/authorities/names/n2012045773
751 ## $aNew York (N.Y.) $4dbp $0http://id.loc.gov/authorities/names/n79007751
710 2# $aRandom House (Firm) $4dst $0http://id.loc.gov/authorities/names/n80051803

5. BIBFRAME DISCUSSION

This information is collectively referred to as Provision Activity in BIBFRAME. BIBFRAME would be able to accommodate the information entered in the $s by employing a simple literal property, likely one for each type of activity.

6. QUESTIONS FOR DISCUSSION

6.1. Do you agree there is an acceptable case to employ a single subfield for imprint/provision?

6.2. Does the proposed solution meet the needs discussed?

6.3. Other than those covered in this discussion paper, what other downstream impacts should be considered with respect to adding subfield $s to the 264?

6.4. Are there any other potential consequences that this paper does not address?

6.5. Would it be worth revisiting the definition and intended use of Field 881?

6.6. Should a subfield for the General Statement be defined in field 260 as well?

6.7. Are there any alternative format solutions to what is being proposed?

(05/17/2023)
