Data set: Difference between revisions

From PANGAEA Wiki

Revision as of 2014-12-17T09:55:00

Fig. 1 Window of data set with ID: 837747, imported by data curator Stefanie Schumacher
Fig. 2 Largest data set with 22 million data points, doi:10.1594/PANGAEA.758918

A data set is a collection of data (often from one event) in a scientific context organized in one matrix. Data in Pangaea are organized in predefined data sets which are quite similar to the original files uploaded and exported from the archive.

The granularity of a data set depends on the type of data and the number of data points, and is primarily the decision of the data author. In principle a Pangaea data set can have an unlimited number of columns and lines (for comparison, Excel 2003: 65,536 lines x 256 columns; Excel 2008: >1 million lines x 16,384 columns) - Examples:

A data set may contain one to many data series. Two or more data sets may be grouped into one parent set. Access rights can be defined for a complete data set only. Each data set consists of the data accompanied by metadata according to ISO standard fields (ISO 19115). A data set appears on the Internet with a metaheader which contains the information described below.


Opening a data set in 4D will show the frame title and five tabs named Basic, Config, References, Details and Coverage with metadata fields as described below:

Frame title

The frame title of the data set window shows its ID and the responsible curator (see Fig. 1). Below are buttons to <Save> and <Delete> the data set or to open it via its URL in a browser window <Open URL>. The button <Parent ID:xxxxxx> is offered when the data set is part of a parent set and opens the corresponding parent set. Checking the "with cache new" box forces a complete update of the data file cache when pressing <Save>.

Basics tab (Fig. 1)

  • Author(s) of the data set; one to many authors may be added by a multiple choice list related to the table Staff
  • Year, set automatically but can be changed
  • Title of the data set as free text; equivalent to the title of a publication; if the data set is a table from a publication, the title should be the table caption, preceded by the table number, e.g. doi:10.1594/PANGAEA.693592
  • Source may contain the institution of the data origin and is relational to the table Institution; use only if data are not related to a reference.
  • Status of the data set with
    • pop-up menu with choices: questionable, not validated, validated, published, published & citable
    • Registry: gives information about the registration process:
      • not to be registered: if the status is not published
      • registration is in the lead time: for four weeks after setting the data set to status published and final editing
      • registered as the final status
  • Protection of the data set with
    • pop-up menu with choices: unrestricted, signup required, access rights needed
      • unrestricted: open access data sets (default)
      • signup required: for e.g. BSRN data sets
      • access rights needed: data sets under moratorium. Login required may be checked for sets that have status published but should still be protected (on request of the PI)
    • Access rights button to set individual access to data sets with access rights needed
  • License: pop-up menu with different Creative Commons (CC) licenses, CC-BY by default; see Creative Commons.
  • Keywords is relational to the Thesaurus and can be set individually, different types of keywords are available
  • Project(s): a multiple choice list to add one to many projects as provided by the Project table
  • Event(s) as used in the data set (cannot be changed)

Config tab

Fig. 3 Window of data set, Config tab
  • Data series window shows the parameters used in the data set. The list contains data series label, parameter name, unit and original format. The button Add/Remove adds data series to or removes them from the data set. Data series must be deleted from the configuration before they can be removed.
  • Geocodes window lists the geocodes used. A double-click adds a geocode to the data set configuration.
  • Related metainformation window contains fields from the event table which can be added to the Configuration.
  • Configuration window lists geocodes (highlighted in blue), related metainformation (highlighted in yellow) and all parameters used in the order of the available data set. The configuration list shows the data set label, parameter name, unit, format, parameter ID, PI, Method, Comment and no. items. Mark the Edit in list check-box to edit these items.
    • Format shows the number of digits before and after the decimal point of a numeric parameter when the parameter is selected in the configuration window by a mouse click. Different formats can be selected from the pop-up menu or changed by hand. If the geocode Date/Time is selected, different types of ISO formats can be selected, depending on the required precision. The exponential format is also supported, e.g. doi:10.1594/PANGAEA.705156
    • PI and Method are relational fields and can be set via the button choices from the relational lists
    • Parameter can be changed in the same way via the choices button. Be careful: never replace a text parameter with a numerical parameter or vice versa! Unit and Param.ID are linked with the parameter and cannot be changed.
    • Comment is a free text field
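
As a rough illustration of what the Format setting controls, the following Python sketch renders a value with a chosen number of digits after the decimal point, in exponential notation, and as an ISO 8601 date/time at different precisions. The function name apply_format and its parameters are hypothetical and only serve the example; this is not how the 4D client implements formats.

```python
from datetime import datetime


def apply_format(value: float, decimals: int, exponential: bool = False) -> str:
    """Render a number with a fixed count of digits after the decimal point.

    Hypothetical helper for illustration only.
    """
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    # 'f' gives fixed-point notation, 'e' gives exponential notation.
    spec = f".{decimals}{'e' if exponential else 'f'}"
    return format(value, spec)


fixed = apply_format(3.14159, 2)                           # two decimals
scientific = apply_format(0.000123, 2, exponential=True)   # exponential format

# ISO 8601 date/time at different precisions.
stamp = datetime(2014, 12, 17, 9, 55)
date_only = stamp.date().isoformat()
to_minutes = stamp.isoformat(timespec="minutes")

print(fixed, scientific, date_only, to_minutes)
```

A format with fewer decimals only changes how a value is displayed in this sketch; the underlying number is not altered.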

References

  • Reference(s) opens a multiple choice list related to the Reference table to select one to many papers relevant to the data set



Details tab

  • Citation as assembled from the fields on the Basics tab.
  • Comment to add individual comments as plain text; field size up to 32 kbyte. URIs may be included and will be resolvable in the metaheader (example: doi:10.1594/PANGAEA.552514).
  • Keywords related to the Thesaurus; may be used to group sets by a keyword
  • Spatial coverage: fields showing min/max of the three spatial dimensions of the data set
  • Temporal coverage: min/max of Date/Time or Age
  • Topologic type is used to define the extension of a data set
  • Created: date/time of import; Updated: date/time of last change
  • Size of the data set
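
The spatial coverage fields can be thought of as the minimum and maximum of each spatial dimension over all data points. The Python sketch below shows that idea under stated assumptions: the three dimensions are taken to be latitude, longitude and depth, and the names Point and spatial_coverage are hypothetical. It does not reflect how Pangaea computes coverage internally, and it does not handle data sets that cross the 180° meridian.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Point(NamedTuple):
    """Position of one data point; field names are illustrative assumptions."""
    latitude: float
    longitude: float
    depth: float


def spatial_coverage(points: Iterable[Point]) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) for each spatial dimension of the given points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    return {
        name: (
            min(getattr(p, name) for p in pts),
            max(getattr(p, name) for p in pts),
        )
        for name in Point._fields
    }


points = [
    Point(54.1, 7.9, 0.0),
    Point(53.5, 8.6, 12.5),
    Point(55.0, 6.2, 30.0),
]
coverage = spatial_coverage(points)
print(coverage)
```

Temporal coverage could be derived the same way by taking the minimum and maximum of the Date/Time or Age values.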

Web tab

  • Citation as assembled from the fields on the Basics tab.
  • URL as defined by the system for event-related data sets or as defined by the user for static links to files.
  • URL Data details to link files containing an extended description of the data set; the linked files should be *.txt for simple text or *.pdf if a text layout is needed. Field must contain a valid URI only and appears in the metaheader of a data set only if filled. (see URL comment on discussion page)
  • Export filename contains the data set's name if downloaded as a text file to the user's PC. The extension *.tab is added automatically.
    • Filenames usually start with the event, followed by a specification of the content (e.g. M24_3-5_sedimentology).
    • File names of supplements start with the first author's name followed by the year, equivalent to the citation of references in a publication text; e.g. Smith_1998, Smith-Sandwell_1987 or Smith-etal_2007.
  • URL other version is a URI field to link to data sets of newer/older versions or any original source or other formats on the Internet. The field must contain a valid URI only and appears in the metaheader of a data set only if filled.
    • If a data set is deleted and substituted by a new version, this field in the deleted set should contain the DOI of the new version. In deleted sets only title and DOI will remain, and the user is informed that the data set was substituted by another version given by its new DOI: doi:10.1594/PANGAEA.80967 (this is only a test)
    • If a data set is requested which was deleted prior to registration or which has never existed, see e.g. doi:10.1594/PANGAEA.49999
    • If a data set is requested which was deleted and has no new version, see e.g. doi:10.1594/PANGAEA.80965
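
The export filename conventions described above can be sketched in Python as follows. The function names are hypothetical, and the rule for two authors versus three or more is an assumption inferred from the examples Smith-Sandwell_1987 and Smith-etal_2007. In Pangaea the extension is added by the system; here it is appended explicitly so that the result is a complete file name.

```python
from typing import Sequence


def event_filename(event: str, content: str) -> str:
    """File name for event-related data: event label plus content description."""
    return f"{event}_{content}.tab"


def supplement_filename(authors: Sequence[str], year: int) -> str:
    """File name for a supplement: author part plus year, like a text citation.

    Assumed rule: one author -> name, two authors -> both names joined by a
    hyphen, three or more -> first author followed by 'etal'.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        stem = authors[0]
    elif len(authors) == 2:
        stem = f"{authors[0]}-{authors[1]}"
    else:
        stem = f"{authors[0]}-etal"
    return f"{stem}_{year}.tab"


print(event_filename("M24_3-5", "sedimentology"))
print(supplement_filename(["Smith"], 1998))
print(supplement_filename(["Smith", "Sandwell"], 1987))
print(supplement_filename(["Smith", "Jones", "Miller"], 2007))
```

The author names Jones and Miller in the last call are placeholders chosen only to trigger the "etal" case.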

Locked data set or other entries

When a window is opened to edit the details of a record, the record is locked and is not available to other users or to background processes. Because updated records are processed sequentially in the background queue, a locked record can keep updates from showing up on the web. Close windows when editing is done!