The Library of Congress >> Especially for Librarians and Archivists >> Standards

MARC Standards

HOME >> MARC Development >> Proposals List


MARC PROPOSAL NO. 2023-01

DATE: December 21, 2022
REVISED:

NAME: Defining a New Field to Record Electronic Archive Location and Access in the MARC 21 Formats

SOURCE: ISSN International Centre, Paris, and the National Library of Finland

SUMMARY: This proposal describes defining a new field 857 (Electronic Archive Location and Access) to enable libraries to specify a persistent identifier or location of the resource in a digital archive repository or Web archive, and record the name, date ranges, and completeness of relevant archived content.

KEYWORDS: Field 857 (All formats); Electronic archive location and access (All formats); Field 856 (All formats); Electronic location and access (All formats); Access to online information resources (All formats)

RELATED: 2022-08; 2022-DP06; 2022-DP02; 2020-DP01; 2022-06; 2022-07; 2018-DP11; 93-4; 97-1; 99-06; 2019-01; DP 49; DP 54; DP 69; Guidelines for the Use of Field 856, Revised August 1999; Guidelines for the Use of Field 856, Revised March 2002

STATUS/COMMENTS:
12/21/22 – Made available to the MARC community for discussion.

01/31/23 – Results of MARC Advisory Committee discussion: Approved, with the following amendments: 1) change the name of $d to "Date range of archived material"; 2) update the date examples to follow EDTF; 3) clarify $f to indicate that completeness can be indicated by either the frequency of archiving or the number of times archived as of a date that should be specified; 4) add a monograph example.

05/17/23 – Results of MARC Steering Group review - Agreed with the MAC decision.


Proposal No. 2023-01: Defining a New Field to Record Electronic Archive Location and Access

1. BACKGROUND

1.1. Previous Developments

Discussion Paper 2022-DP02 resulted in both Proposal 2022-08 and Discussion Paper 2022-DP06, written jointly by the ISSN International Centre and the National Library of Finland. Proposal 2022-08, which passed at the Annual MAC Meeting in June 2022, focused on changes to field 856.

Discussion Paper 2022-DP06 outlined the indicators and subfields required for a new field 857 for Web archive-related data elements, including the archive URL, harvesting date range, and completeness of the archived data.

The MAC response to Discussion Paper 2022-DP06 was positive. The new data elements were supported; however, a clearer definition of the date range to be recorded in subfield $d and of the meaning of "completeness" in subfield $f was requested. This proposal clarifies these definitions and finalizes the elements needed to create field 857.

1.2. Definitions

For the purposes of this proposal, the terms electronic and digital are used interchangeably. This paper addresses both digital archive repositories and Web archives, which serve the same purpose but are built differently, using different software and communication protocols. The ISO definitions are given below.

Digital archive repository

Dedicated to the long-term preservation of the associated data

Note 1 to entry: The data in digital archives are also often available on-line. This highlights the need for reliable PIDs

[SOURCE: ISO 24622-1:2015(en), 2.1]

Web archive

Entire set of resources crawled from the Web over time, comprising one or more collections

[SOURCE: ISO/TR 14873:2013, 2.4]

2. DISCUSSION

New field 857 may be used when a copy of the resource has been identified in a digital archive repository or a Web archive and there is a need to record more information about the archive and its contents in a bibliographic record than is possible in field 856. Field 857 enables information such as the persistent identifier, location, content date range, completeness of the archive, and the provider to be recorded in a dedicated field. A structure parallel to field 856 is proposed, with the differences needed to accommodate archives. For example, provision is made for both the name of the archiving agency (subfield $b) and the name of the Web archive or digital archive repository (subfield $c) if it differs from the archiving agency's name.

Subfield $d (date range) can be used together with the original publication dates to indicate that only parts of an original manifestation have been preserved in the archive. Subfield $f (completeness) is especially useful when preservation is ongoing for a continuing resource. If the preservation data is provided by the Web archive, completeness can be expressed in $f either by recording the number of times a resource has been harvested together with the date when this number was checked, or by recording how often the resource is harvested.

Field 857 accommodates Web archives and digital archive repositories that contain original manifestations of resources, and can help inform a migration strategy for digital preservation based on file format types and versions noted in subfield $q.

The archived resource may be available in one or more publicly accessible Web archives, but access may also be restricted, such as in the case of legal deposit collections. The terms of access and reproduction subfields introduced in 2022 for Field 856 are equally applicable here.

The other subfields in this proposal that parallel field 856, including the standard control subfields, remain equally relevant for Web archives and digital archive repositories.

3. PROPOSED CHANGES

This paper proposes that a new field 857 be created and defined in the MARC 21 formats as follows:

857 - Electronic Archive Location and Access (R)

FIELD DEFINITION AND SCOPE
Information needed to locate and access an electronic resource from a Web archive or a digital archive repository. This field may be used to provide additional information about archived resources beyond what is possible in Field 856.

Field 857 is repeated when an electronic resource has been stored in more than one Web archive or digital archive repository.

Indicators

First Indicator – Access method

Access method to the electronic resource. If the resource is available by more than one access method, the field is repeated with data appropriate to each method. The methods defined are the main TCP/IP (Transmission Control Protocol/Internet Protocol) protocols. When recording a URL in subfield $u, the value corresponds to the access method (URL scheme), which is also the first element in the string.

# - No information provided

1 - FTP

Access to the electronic resource is through the File Transfer Protocol (FTP).

4 - HTTP

Access to the electronic resource is through the Hypertext Transfer Protocol (HTTP).

7 - Method specified in subfield $2

Access to the electronic resource is through a method other than the defined values and for which an identifying code is given in subfield $2 (Source of access).
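Because the first indicator corresponds to the URL scheme that begins the string in subfield $u, a cataloging tool could suggest the value automatically. The following is a minimal sketch under stated assumptions: the helper name `first_indicator` is hypothetical, and treating `https` as value 4 is an assumption that follows the HTTPS URLs recorded with first indicator 4 in the examples in Section 4. Choosing the $2 code for value 7 is left to the cataloger.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the 857 first indicator:
# 1 = FTP, 4 = HTTP, 7 = method specified in $2, '#' = no information.
SCHEME_TO_INDICATOR = {
    "ftp": "1",
    "http": "4",
    "https": "4",  # assumption: HTTPS recorded as 4, as in the Section 4 examples
}


def first_indicator(url: str) -> str:
    """Suggest a first-indicator value from the scheme of a $u URL."""
    scheme = urlparse(url).scheme.lower()
    if not scheme:
        return "#"  # no scheme, so nothing to infer about the access method
    # Any other scheme falls back to 7; the matching $2 code from the
    # Electronic Access Methods Code List must still be chosen by hand.
    return SCHEME_TO_INDICATOR.get(scheme, "7")


print(first_indicator("https://web.archive.org/web/*/hs.fi"))
print(first_indicator("ftp://example.org/pub/report.pdf"))
```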

Second indicator – Relationship

Relationship between an archived electronic resource at the location identified in field 857 and the resource described in the record as a whole.

# - No information provided

No information is provided about the relationship of the archived electronic resource identified in field 857 to the bibliographic resource described by the record as a whole.

0 - Resource

Record describes an archived electronic resource. Electronic location in field 857 is for the archived electronic resource described by the record as a whole. If the electronic location in field 857 is for one or more component parts of the resource represented by the record as a whole, use second indicator value 3 (Component part(s) of resource).

1 - Version of resource

Record describes a non-electronic resource (e.g., a printed book) or a non-networked electronic resource (e.g., a CD-ROM). The electronic location in field 857 is for an archived electronic version of the resource described by the record as a whole. In this case, the bibliographic record itself does not represent the networked electronic resource, but an archived electronic version is available.

2 - Related resource

Electronic location in field 857 is for a networked electronic resource that has a clear, specific, and direct bibliographic relationship to the bibliographic resource described by the record as a whole but is not a component part of the resource. For resources that have only a subject relationship to the resource described, use 6XX fields. When there is uncertainty whether a resource may be a component part of the whole or a related resource, prefer second indicator value 2.

3 - Component part(s) of resource

Record describes an archived electronic resource. The electronic location in field 857 is for one or more component parts of the archived electronic resource described by the record as a whole. The component parts are portions of the resource, such as a table of contents or a sample chapter. Consider each of multiple fields 857 that add up to a complete representation of the archived electronic resource to be a component part. When appropriate, use subfield $3 to specify the component part(s) to which the field applies. When there is uncertainty whether a resource may be a component part of the whole or a related resource, use second indicator value 2 (Related resource).

4 - Version of component part(s) of resource

Record describes a non-electronic or a non-networked electronic resource. The electronic location in field 857 is for an archived electronic version of one or more component parts of the resource described by the record as a whole. The component parts are versions of portions of the resource, such as a table of contents or a sample chapter. When appropriate, use subfield $3 to specify the component part(s) to which the field applies. When there is uncertainty whether a resource may be a component part of the whole or a related resource, prefer second indicator value 2 (Related resource).

8 - No display constant generated
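The second-indicator definitions combine two questions: does the record describe the archived electronic resource itself, and does the location in field 857 cover the whole resource, a component part, or a related resource? The sketch below encodes that decision; the function and parameter names are hypothetical, and value 8 is omitted because it is a display choice rather than a relationship. For example, a record for a print journal whose archived run is in field 857 (as in Example 4.5) yields value 1.

```python
from typing import Optional


def second_indicator(record_is_electronic: Optional[bool],
                     scope: Optional[str]) -> str:
    """Suggest an 857 second indicator (hypothetical helper).

    record_is_electronic: True if the record describes the archived
        electronic resource itself; False if it describes a print or
        non-networked resource (e.g., a CD-ROM); None if unknown.
    scope: "whole", "part", or "related": what the 857 location points to.
    """
    if scope == "related":
        return "2"  # also preferred when part vs. related is uncertain
    if record_is_electronic is None or scope not in ("whole", "part"):
        return "#"  # no information provided
    if record_is_electronic:
        return "0" if scope == "whole" else "3"
    return "1" if scope == "whole" else "4"
```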

Subfield Codes
(Underlined elements indicate new subfields specific to field 857. Subfields with different definitions from 856 are noted.)

$b – Name of the archiving agency (NR)

Agency responsible for the Web archive or digital archive repository.

$c – Name of the Web archive or digital archive repository (NR)

The name by which the Web archive or digital archive repository is known.

$d – Archived content date range (NR)

Date range of the content of the resource in the archive, specified according to the Extended Date/Time Format (EDTF). The start date of the archived content should always be recorded; the end date should be recorded only if the content described in the record is no longer being archived.

When there are gaps in the archived content, multiple date ranges can be recorded in a single 857 $d, separated by a semicolon (;). The reason for these gaps, if known, may be given in 857 $x or 857 $z.

Note: 857 $d is intended primarily for electronic continuing resources and other dynamic resources.
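A system reading $d needs to split the value into individual ranges and recognize whether each range is still open. The following is a minimal sketch, assuming EDTF interval syntax (`start/end`, with `..` for an open end) and the semicolon separator described above; only EDTF level 0 calendar dates are accepted, and the function name is hypothetical. Hyphen-joined ranges such as `2011-2017` are rejected, since they are not EDTF intervals.

```python
import re
from typing import List, Optional, Tuple

# EDTF level 0 calendar date: YYYY, YYYY-MM, or YYYY-MM-DD.
EDTF_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def parse_date_ranges(subfield_d: str) -> List[Tuple[str, Optional[str]]]:
    """Split an 857 $d value into (start, end) pairs.

    Assumes EDTF interval syntax ("start/end") with ";" between ranges.
    A bare date, an empty end (unknown) or ".." (open end) all give
    end=None, i.e. the content is treated as still being archived.
    """
    ranges = []
    for part in subfield_d.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ";"
        start, _, end = part.partition("/")
        if not EDTF_DATE.match(start):
            raise ValueError(f"start is not an EDTF date: {start!r}")
        if end in ("", ".."):
            ranges.append((start, None))
        elif EDTF_DATE.match(end):
            ranges.append((start, end))
        else:
            raise ValueError(f"end is not an EDTF date: {end!r}")
    return ranges
```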

$e - Data provenance (R)

See description of this subfield in Appendix J: Data Provenance Subfields.

$f – Archive completeness (NR)

Contains information about the completeness of the content made available from the Web archive or digital archive repository. Completeness may be inferred from information about how often, or how many times, a continuing resource has been harvested.

Note: 857 $f is intended primarily for electronic continuing resources or other dynamic resources.

            857 $b….$f Complete
            857 $b….$f Saved multiple times per day
            857 $b….$f Saved 14150 times since 1997-01-01
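Because $f is free text, any machine use of it depends on local phrasing. The sketch below assumes statements shaped like the examples above ("Saved N times", optionally followed by "since DATE" or "as of DATE") and extracts the count. The regular expression and function name are hypothetical. The keyword is kept because "since" marks the start of harvesting, whereas "as of" marks the date the count was checked.

```python
import re
from typing import Optional

# Assumed free-text phrasing, modelled on the $f examples in this proposal:
# "Saved <N> times", optionally followed by "since <date>" or "as of <date>".
COUNT = re.compile(
    r"saved\s+(?P<count>\d+)\s+times"
    r"(?:\s+(?P<kind>since|as of)\s+(?P<date>\d{4}(?:-\d{2}){0,2}))?",
    re.IGNORECASE,
)


def harvest_count(subfield_f: str) -> Optional[dict]:
    """Extract a harvest count from $f, or None if $f is not a count.

    "since" gives the start of harvesting; "as of" gives the date the count
    was checked. The keyword is returned so the two are not confused.
    """
    m = COUNT.search(subfield_f)
    if m is None:
        return None  # e.g. "Complete" or "Saved multiple times per day"
    kind = m.group("kind")
    return {
        "count": int(m.group("count")),
        "kind": kind.lower() if kind else None,
        "date": m.group("date"),
    }
```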

$g - Persistent identifier (R) [Definition slightly differs from 856]

Persistent identifier (PID) which enables search and retrieval of a resource from a Web archive or digital archive repository using existing Internet protocols.

Persistent identifier assigned to the resource for automated access and other resolution services by a PID resolver. PIDs should be provided as actionable hyperlinks (e.g., HTTP URI format).

If a PID resolves to more than one URI, these URIs may be provided in the same 857 field, with repeated $u.
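Since PIDs in $g should be actionable hyperlinks, a validation step could check each value before it is saved. This is a minimal sketch under stated assumptions: the helper is hypothetical, and it rewrites only two prefix forms, using the public DOI and Handle resolvers (`https://doi.org/` and `https://hdl.handle.net/`) as illustrative bases. Other schemes, such as ARKs without a resolver host, are rejected rather than guessed.

```python
from urllib.parse import urlparse

# Assumed resolver bases for two common PID prefixes; extend as needed.
RESOLVERS = {"doi:": "https://doi.org/", "hdl:": "https://hdl.handle.net/"}


def actionable_pid(pid: str) -> str:
    """Return an HTTP(S) form of a PID for $g, or raise ValueError."""
    pid = pid.strip()
    for prefix, base in RESOLVERS.items():
        if pid.lower().startswith(prefix):
            return base + pid[len(prefix):]
    parsed = urlparse(pid)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return pid  # already an actionable hyperlink
    raise ValueError(f"no known resolver for PID {pid!r}")
```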

$h – Non-functioning Uniform Resource Identifier (R) [Definition slightly differs from 856]

Uniform Resource Identifier (URI) for any archive or repository that has ceased to exist.

Subfield $h may be repeated if there is more than one non-functioning URI. A note on the status change (including the date) may be added in either subfield $x or subfield $z, depending on local policy.

$l - Standardized information governing access (R)

The subfield contains standardized information about the access status of a resource. The information may be in the form of a value from a controlled vocabulary, or a Uniform Resource Identifier (URI). If the information is a value from a controlled vocabulary, it is preceded by the appropriate value from the list of Access Restriction Term Source Codes, enclosed in parentheses. When the information is given in the form of a Web retrieval protocol, e.g., HTTP URI, no preceding parenthetical is used.

This subfield is intended to contain information equivalent to that recorded in subfields $f, $2, and $u of field 506.
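The rule for $l (and, in the same way, for $r) is that a controlled-vocabulary term carries its source code in parentheses, whereas a URI is recorded without one. The helpers below are a hypothetical sketch of building and reading such values; the source code `xyz` and the URI used in the tests are placeholders, not entries from the Access Restriction Term Source Codes list.

```python
import re
from typing import Optional, Tuple

# "(source)value" for controlled-vocabulary terms; URIs carry no parenthetical.
PREFIXED = re.compile(r"^\((?P<source>[^)]+)\)(?P<value>.+)$")


def format_term(value: str, source: Optional[str] = None) -> str:
    """Build a $l value from a vocabulary term and its source code, or a URI."""
    if value.lower().startswith(("http://", "https://")):
        return value  # Web retrieval protocol: no preceding parenthetical
    if not source:
        raise ValueError("a controlled-vocabulary term needs a source code")
    return f"({source}){value}"


def parse_term(subfield_l: str) -> Tuple[Optional[str], str]:
    """Return (source code, term), with source None for a bare URI."""
    m = PREFIXED.match(subfield_l)
    if m:
        return m.group("source"), m.group("value")
    return None, subfield_l
```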

$m - Contact for access assistance (R)

Name of a contact for assistance in accessing a resource at the host specified in subfield $b. For addresses relating to the content of the resource itself (i.e., the item represented by the title recorded in field 245) rather than access assistance, field 270 is used. If the address data is the same, use field 270.

$n - Terms governing access (R)

The subfield contains textual information about the access status of a resource.

This subfield is intended to contain information equivalent to that recorded in subfield $a of field 506.

$q - Electronic format type (R)

Identification of the electronic format type and version. Electronic format type should be specified with a code from the list of registered Internet Media Types (MIME types), taken from: IANA Media Types. If necessary (e.g., to specify a file format version to support access or digital preservation), additional information such as a PRONOM Unique Identifier (PUID) may be recorded alongside the MIME type by repeating subfield $q.
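When $q is repeated, a preservation workflow may need to tell the MIME type apart from a PUID. The sketch below is hypothetical: it treats a value as a PUID when it is, or ends with, a `fmt/N` or `x-fmt/N` identifier (covering a PRONOM URL such as the one in Example 4.1), and as a MIME type when its top-level type is one of the main registered ones. Anything else is returned unclassified.

```python
import re
from typing import Tuple

# Main registered top-level media types (a subset suitable for this sketch).
MEDIA_TOP_LEVEL = {"application", "audio", "font", "image", "message",
                   "model", "multipart", "text", "video"}
# PRONOM PUIDs look like "fmt/96" or "x-fmt/111"; a PRONOM URL ending in
# such a PUID is also accepted.
PUID = re.compile(r"(?<![\w-])((?:x-)?fmt/\d+)$")


def classify_format(value: str) -> Tuple[str, str]:
    """Label a $q value as ("puid", id), ("mime", type), or ("other", value)."""
    value = value.strip()
    m = PUID.search(value)
    if m:
        return "puid", m.group(1)
    top, sep, sub = value.partition("/")
    if sep and sub and top.lower() in MEDIA_TOP_LEVEL:
        return "mime", value
    return "other", value
```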

$r - Standardized information governing use and reproduction (R)

The subfield contains standardized information about the use and reproduction rights of a resource. The information may be in the form of a value from a controlled vocabulary, or a Uniform Resource Identifier (URI). If the information is a value from a controlled vocabulary, it is preceded by the appropriate value from the list of Access Restriction Term Source Codes, enclosed in parentheses. When the information is given in the form of a Web retrieval protocol, e.g., HTTP URI, no preceding parenthetical is used.

This subfield is intended to contain information equivalent to that recorded in subfields $f, $2, and $u of field 540.

$s - File size (R) [Definition slightly differs from 856]

Size of the file(s) as stored in the archive indicated in subfield $c. It is generally expressed in terms of 8-bit bytes (megabytes, gigabytes).

$t - Terms governing use and reproduction (R)

The subfield contains textual information about the use and reproduction rights of a resource.

This subfield is intended to contain information equivalent to that recorded in subfield $a of field 540.

$u - Uniform Resource Identifier (R) [Definition slightly differs from 856]

Uniform Resource Identifier (URI), which provides standard syntax for locating an object using existing Internet protocols or by resolution of a persistent identifier (PID).

Subfield $u may be repeated if more than one URI is recorded.

URIs which no longer function to provide access to the described resource may be transferred to 857 $h.
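Transferring a dead URI from $u to $h, with a dated note in $x or $z, is a routine maintenance step. The following is a minimal sketch, assuming a field is held as a list of `(code, value)` pairs; the function name and the wording of the $x note are hypothetical, and local policy may call for a public $z note instead.

```python
from datetime import date
from typing import Iterable, List, Tuple

Subfield = Tuple[str, str]  # (code, value), e.g. ("u", "https://...")


def retire_uris(subfields: List[Subfield], dead: Iterable[str],
                checked: date) -> List[Subfield]:
    """Move non-functioning $u values to $h and note the change in $x.

    Returns a new list; the input is not modified. New subfields are
    appended at the end of the field.
    """
    dead = set(dead)
    moved = [v for c, v in subfields if c == "u" and v in dead]
    if not moved:
        return list(subfields)
    kept = [(c, v) for c, v in subfields if not (c == "u" and v in dead)]
    note = ("x", f"URI no longer functioning as of {checked.isoformat()}")
    return kept + [("h", v) for v in moved] + [note]
```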

$x - Nonpublic note (R)

Note relating to the electronic location of the source identified in the field. The note is written in a form that is not adequate or intended for public display. It may also contain processing information about the file at the location specified.

$y - Link text (R)

Used for display in place of the URL in subfield $u. When subfield $y is present, applications should use the contents of subfield $y as the link instead of the content of subfield $u when linking to the destination in subfield $u. The use of the link text is independent of any decision concerning the second indicator value.

$z - Public note (R)

Note relating to the electronic location of the source identified in the field. The note is written in a form that is adequate or intended for public display.

$2 - Access method (NR)

Access method when the first indicator position contains value 7. Code from: Electronic Access Methods Code List.

$3 - Materials specified (NR)

Part of the described material to which the field applies.

857 42$3 Finding aid $u//aj.sunback.homes/ammem/ead/jackson.sgm

$5 - Institution to which field applies (NR)

See description of this subfield in Appendix A: Control Subfields.

$6 - Linkage (NR)

See description of this subfield in Appendix A: Control Subfields.

$7 - Access status (NR)

Code indicating the availability of access to the networked electronic resource the address of which appears in subfield $u. Subfield $7 applies to all subfields $u present in the field.

0 - Open access
The networked electronic resource is freely and openly accessible online to everyone, without restriction, login, or payment.

1 - Restricted access
The networked electronic resource is not freely and openly accessible online.

u - Unspecified

z – Other

857 40$c HathiTrust Digital Library $uhttp://catalog.hathitrust.org/api/volumes/oclc/1654047.html$70
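Because a single $7 governs every $u in the same field, a display system can label each link from one code. The sketch below uses the code values defined above; the function name and the list-of-pairs field representation are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Codes defined above for 857 $7 (Access status).
ACCESS_STATUS = {
    "0": "Open access",
    "1": "Restricted access",
    "u": "Unspecified",
    "z": "Other",
}


def access_status_by_uri(subfields: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Map every $u in one 857 field to the status named by its single $7."""
    codes = [v for c, v in subfields if c == "7"]
    if len(codes) > 1:
        raise ValueError("$7 is not repeatable")
    if codes and codes[0] not in ACCESS_STATUS:
        raise ValueError(f"undefined $7 code: {codes[0]!r}")
    label = ACCESS_STATUS[codes[0]] if codes else None
    # $7 applies to all $u in the field, so each URI gets the same label.
    return {v: label for c, v in subfields if c == "u"}


hathi = [
    ("c", "HathiTrust Digital Library"),
    ("u", "http://catalog.hathitrust.org/api/volumes/oclc/1654047.html"),
    ("7", "0"),
]
print(access_status_by_uri(hathi))
```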

$8 - Field link and sequence number (R)

See description of this subfield in Appendix A: Control Subfields.

4. EXAMPLES

4.1. Web archive URL in 857 $u, precise file format specification in 857 $q


leader 03922cas a2200625 i 4500
007 cr |||||||||||
008 190529d20122018enkqr|pso | a0eng
022 0 # $a 2162-4054 $2 _1 $l 2162-4046
210 1 # $a Worm $b (Austin Tex., Online)
222 # # $a Worm $b (Austin, Tex. Online)
245 1 # $a Worm.
264 # 1 $a Austin, TX $b Landes Bioscience $c [2012]-
264 3 1 $3 <2015-> $a Abingdon $b Taylor & Francis
362 1 # $a Began with v. 1, issue 1 (Jan./Feb./Mar. 2012); ceased with Volume 6, Issue 3/4 (2017).
588 # # $a Description based on: V. 1, issue 1 (January/February/March 2012); title from issue contents page (publisher's Web site, viewed January 15, 2013).
588 # # $a Latest issue consulted: Volume 6, issue 3-4 (2017) (Taylor & Francis Online, February 2, 2018).
776 0 8 $t Worm (Austin, Tex. Print) $x 2162-4046 $h ta
856 4 0 $h http://www.landesbioscience.com/journals/worm/ $u http://www.tandfonline.com/toc/kwrm20/current $u http://www.tandfonline.com/loi/kwrm20
857 40 $b Internet Archive $d 2011-2017 $f saved 45 times $q https://www.nationalarchives.gov.uk/PRONOM/fmt/96 $u https://web.archive.org/web/*/http://www.landesbioscience.com/journals/worm/

4.2. Three Web archives with different coverage and access rights

LDR 01574cai a2200469 i 4500
007 cr||||||||||||
008 090429c20089999fi kn w|o 0 | b0fin|
022 1# $a1798-1557$l0355-2047$2a
222 #0 $aHS.fi.
245 00 $aHS.fi.
246 13 $aHelsingin sanomat
260 ## $aHelsinki: $bSanoma Magazines, $c2008-
310 ## $acontinuously updated
338 ## $aonline resource $bcr $2rdacarrier
538 ## $aWorld Wide Web.
655 #7 $anewspapers $2slm/fin $0http://urn.fi/URN:NBN:fi:au:slm:s39
776 0# $tHelsingin sanomat $x0355-2047  
856 40 $uhttps://www.hs.fi/
857 40 $b National Library of Finland $c The Finnish Web Archive $d 2006-01-16- $f saved 9431 times as of 2022-03-14 $n Onsite access in legal deposit libraries $q text/html $u https://verkkoarkisto.kansalliskirjasto.fi/wayback/*/www.hs.fi
857 40 $b Internet Archive $d 2003-12-18- $f saved 21238 times as of 2022-03-15 $n Free access $u https://web.archive.org/web/*/hs.fi
857 #0 $b Center for scientific computing - CSC $c National Digital Archive $e National Library of Finland $n Dark archive; consult a librarian on the use $q text/xml $q https://aj.sunback.homes/standards/mets/ $z Preservation copy of the content in the Finnish Web Archive
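Since field 857 is repeated once per archive, a discovery interface might list each archive with its coverage and access terms. The sketch below parses fields written in the display convention used in these examples (tag, two indicators written either as `40` or `4 0`, with `#` for blank, then `$` plus a one-character code before each value). The parser and helper names are hypothetical; it does not read MARC transmission format (ISO 2709) or MARCXML, and it assumes no `$` occurs inside a value. The input lines are the first two 857 fields of Example 4.2.

```python
import re
from typing import List, Tuple

# Display convention assumed from the examples: tag, two indicators (written
# "40", "4 0", or "#0"), then "$" + one-character code before each value.
FIELD_LINE = re.compile(
    r"^(?P<tag>\d{3})\s+(?P<i1>[0-9#])\s*(?P<i2>[0-9#])\s*(?P<rest>\$.*)$"
)


def parse_display_field(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split a displayed field into (tag, indicators, [(code, value), ...])."""
    m = FIELD_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized field line: {line!r}")
    # re.split with a capture group yields ["", code, value, code, value, ...]
    parts = re.split(r"\$([a-z0-9])", m.group("rest"))
    subfields = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]
    return m.group("tag"), m.group("i1") + m.group("i2"), subfields


def first(subfields: List[Tuple[str, str]], code: str) -> str:
    """Value of the first occurrence of a subfield, or '' if absent."""
    return next((v for c, v in subfields if c == code), "")


FIELDS = [
    "857 40 $b National Library of Finland $c The Finnish Web Archive "
    "$d 2006-01-16- $f saved 9431 times as of 2022-03-14 "
    "$n Onsite access in legal deposit libraries $q text/html "
    "$u https://verkkoarkisto.kansalliskirjasto.fi/wayback/*/www.hs.fi",
    "857 40 $b Internet Archive $d 2003-12-18- "
    "$f saved 21238 times as of 2022-03-15 $n Free access "
    "$u https://web.archive.org/web/*/hs.fi",
]

for line in FIELDS:
    tag, indicators, subs = parse_display_field(line)
    # $c names the archive when it differs from the agency in $b.
    archive = first(subs, "c") or first(subs, "b")
    print(f"{archive}: {first(subs, 'd')} | {first(subs, 'n')}")
```

Falling back from $c to $b mirrors the field design: $c is recorded only when the archive's name differs from the archiving agency's name.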

4.3. Web URL in 856, Web archive URL and archived location in 857


LDR 02329cas a2200493 i 4500
007 cr
008 161228c20169999it s||pss|||||||||b0mul
022 0 # $2 _d $a 2531-9884 $l 2531-9884
210 1 # $a Comp. cult. stud. $b (Firenze)
222 # # $a Comparative cultural studies $b (Firenze)
245 1 # $a Comparative cultural studies.
246 3 1 $a Comparative Cultural Studies. European and latin american perspectives
260 3 # $a Firenze $b Firenze University Press
362 1 # $a N. 1 (2016)
500 # # $a Peer-review (fascicolo consultato: N. 1, 2016)
710 1 # $a Università degli Studi, Firenze.
720 # # $a Università degli Studi di Firenze
856 4 0 $h http://www.fupress.net/index.php/ccselap/index $q application/pdf $u https://oajournals.fupress.net/index.php/ccselap/index
857 4 0 $b CLOCKSS $d 2016- $n Dark archive $u http://www.clockss.org/clockss/Comparative_cultural_studies
857 4 0 $b Internet Archive $c FatCat $d 2021- $f selected articles $q application/pdf $u https://fatcat.wiki/container/pv7gnxzaj5eydnvan65xus4uiy

4.4. Restricted Access


leader 04030cas a2200577 i 4500
007 cr |||||||||||
008 100219c20139999ne f||p|s|||||||||a0eng |
022 0 # $a 2213-0624 $2 _j $l 2213-0624
222 # # $a International journal for history, culture and modernity $b (Online)
245 1 # $a International journal for history, culture and modernity.
246 3 3 $a HCM
260 # # $a Utrecht $b Utrecht University Department for History & Art History
260 3 # $a Leiden $b Brill
362 1 # $a Volume 1 - Issue 1 - 2013-
710 2 # $a Stichting International Journal for History, Culture and Modernity
776 0 # $t International journal for history, culture and modernity (Print) $x 2666-6529 $h ta
856 4 0 $h https://www.history-culture-modernity.org/ $q application/pdf $u https://brill.com/view/journals/hcm/hcm-overview.xml
857 4 0 $b Library of Congress $d 2013- $n Onsite access only $g http://hdl.loc.gov/loc.gdc/ejournal.021602 $u http://hdl.loc.gov/loc.gdc/ejournal.021602

4.5. Multiple archiving agencies for a digitized print journal


leader 07581cas a2201261 4500
007 ta
008 220330c18929999nyumn|p 0 a0eng
044 # # $c USA
022 0 # $a 0042-8000 $l 0042-8000 $2 _1
222 # # $a Vogue $b (New York)
245 1 # $a Vogue.
260 # # $a [New York] $b [Condé Nast Publications, etc.]
336 # # $a text $b txt $2 rdacontent
337 # # $a unmediated $b n $2 rdamedia
338 # # $a volume $b nc $2 rdacarrier
362 0 # $a v. 1- Dec. 17, 1892-
588 # # $a Latest issue consulted: Vol. 210, no. 6 (June/July 2020).
856 4 0 $u https://www.vogue.com/magazine
856 4 1 $u http://www.proquest.com
857 4 1 $c HathiTrust Digital Library $d 1894-1922 $f incomplete $u http://catalog.hathitrust.org/api/volumes/oclc/1769261.html
857 4 1 $b Bibliothèque nationale de France $c Gallica $d 1917-1917 $f incomplete $g https://gallicaintramuros.bnf.fr/ark:/12148/cb34471903t/date
857 4 1 $b Internet Archive $d 1892-2014 $f Complete $u https://archive.org/details/pub_vogue

5. BIBFRAME DISCUSSION

The implications of these proposed changes for BIBFRAME will need to be considered to prevent inadvertent data loss and conversion inconsistencies.

6. SUMMARY OF PROPOSED CHANGES

Create field 857 (Electronic Archive Location and Access) (identical in the MARC Bibliographic, Authority, Holdings, Classification, and Community Information formats) with the following indicators and subfields, where those subfields in bold are newly defined, and those in regular font have the same or slightly different definitions as in field 856 (see Section 3 for a full description of the field):

857 - Electronic Archive Location and Access (R)

Indicators

First Indicator – Access method 
# - No information provided
1 - FTP
4 - HTTP
7 - Method specified in subfield $2

Second indicator – Relationship
# - No information provided
0 - Resource
1 - Version of resource
2 - Related resource
3 - Component part(s) of resource
4 - Version of component part(s) of resource
8 - No display constant generated

Subfield Codes

$b - Name of the archiving agency (NR)
$c - Name of the Web archive or digital archive repository (NR)
$d - Archived content date range (NR)
$e - Data provenance (R)
$f - Archive completeness (NR)
$g - Persistent identifier (R)
$h - Non-functioning Uniform Resource Identifier (R)
$l - Standardized information governing access (R)
$m - Contact for access assistance (R)
$n - Terms governing access (R)
$q - Electronic format type (R)
$r - Standardized information governing use and reproduction (R)
$s - File size (R)
$t - Terms governing use and reproduction (R)
$u - Uniform Resource Identifier (R)
$x - Nonpublic note (R)
$y - Link text (R)
$z - Public note (R)
$2 - Access method (NR)
$3 - Materials specified (NR)
$5 - Institution to which field applies (NR)
$6 - Linkage (NR)
$7 - Access status (NR)
$8 - Field link and sequence number (R)



(05/17/2023)