News Archive

WDS-SC Opinion Piece: Data Management 101 for ECRs

We would like to draw your attention to the following Opinion Piece from the WDS Scientific Committee (WDS-SC) that was recently published on the WDS Blog. It looks at the fundamental data-related skills that the WDS-SC believes are essential to Early Career Researchers (ECRs) throughout the course of their careers. It is also available for download as a PDF. What Every Early ...

First CoreTrustSeal-certified WDS Regular Members in Biomedical Field!

We are very pleased to announce that ImmPort Repository and the Worldwide Protein Data Bank (wwPDB) have become the first WDS Regular Members to be certified under the new CoreTrustSeal core certification standard. Moreover, they are the first two WDS Members in the area of biomedical data, thus helping the ICSU World Data System to advance its truly multidisciplinary mission. ...

WDS Data Stewardship Award 2018: Call for Nominations Open!

The Call for Nominations for the 2018 WDS Data Stewardship Award is now open until 21 May 2018. This annual prize celebrates the exceptional contributions of early career researchers to the improvement of scientific data stewardship through their (1) engagement with the community, (2) academic achievements, and (3) innovations. The winner will be presented with their Award and a prize in ...

WDS Blog

What Every Early Career Researcher Should Know about Research Data Management

A WDS-SC Opinion Piece by Alex de Sherbinin, Elaine Faustman, & Rorie Edmunds


The Scientific Committee of the ICSU World Data System (WDS-SC) believes that all Early Career Researchers (ECRs) require a basic set of data-related skills. The following presents essential areas of Research Data Management (RDM) that are relevant to budding scientists and that cover the range of issues they are likely to encounter as they collect, analyze, and manage data over the course of their careers. It has been formulated on the assumption that ECRs play an important role in future data sharing, and must take an interest in data stewardship and best practices in data management, including how to make data openly accessible and reusable.

The Essentials of RDM

Open Data. Almost all science funding agencies require that research results, including data, be made publicly available. Journals, too, are requesting that authors of scientific articles post their data, and even the code used to generate results. Data sharing and open data are important to the advancement of science, and data reuse has resulted in important scientific discoveries. ECRs need to be familiar with the FAIR principles—that data need to be Findable, Accessible, Interoperable, and Reusable—and work towards data sharing and research transparency in their own work.

Big Data. The term ‘Big Data’ arose to describe the Volume, Variety, and Velocity (the three Vs) of data being generated almost continuously by a range of sciences, from Biomedical to Earth Sciences. An ECR should have an understanding of what is meant by Big Data, and how they are increasingly important to a variety of scientific fields. Familiarity with tools and approaches for analyzing Big Data is also an important requisite for career advancement.

Definitions and Jargon. An ECR must know some of the terminology in the data arena, such as ‘ontologies’, ‘informatics’, ‘metadata’, and ‘knowledge networks’. A critical element for data sharing is common definitions, and particular attention should be paid to understanding ontologies, thesauri, and controlled vocabularies: what ontologies are, where to find them, and how to create them, as well as ways for integrating ontologies and using them to support metadata and data disambiguation efforts.

Funder Requirements and Writing Data Management Plans (DMPs). Funders increasingly require that, at the onset of a project, scientists articulate through a DMP how they will ensure the long-term open availability of their data. An ECR should know how to thoughtfully prepare a DMP that will also increase the odds of them obtaining funding. Awareness of the domain-specific data repositories where their data may be archived is also important (see below). A conceptually ideal DMP is extensible, interoperable, and machine readable, and an ECR must understand why these aspects are needed and how to address them.

Data Organization and Storage. Organization and long-term preservation of data is an increasingly daunting task. An ECR should know methods of sustainability to ensure the continuance of databases as they begin to generate data. Documenting versioning, choice of technology and standards, and archiving also need to be understood. The principle that data have several end uses throughout their lifecycle, each with its own requirements, is fundamental within this, and the concepts of ‘Analysis-ready’ data and ‘Publication-ready’ data (data with quality assurance, citation, and metadata) should be familiar to an ECR.
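As one small, concrete illustration of documenting versions for archiving, the sketch below records a SHA-256 checksum for every file in a dataset directory and reports which files differ between two such records. It is only a minimal sketch: the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, and a real repository would normally prescribe its own fixity and versioning procedures.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path (POSIX style) to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or modified between manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(
        name for name in names
        if old_manifest.get(name) != new_manifest.get(name)
    )
```

Storing such a manifest next to each released version of a dataset gives later users a simple way to confirm that the files they hold are the ones that were archived.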

Metadata Formats, Usage and Data Discovery. Metadata are critical for data discovery and reuse, and are the bread-and-butter of catalogue services. Metadata standards are strongly format and discipline dependent, but common elements are increasingly captured in efforts by DataCite, DCAT, and others. The International Organization for Standardization (ISO) has also developed a number of domain-specific standards, such as ISO 19115 for geospatial information. An ECR should recognize the importance of proper metadata development, and be aware of a number of the standards that are available.
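To give a rough idea of what proper metadata development can involve in practice, the sketch below checks a record for the six properties that the DataCite Metadata Schema treats as mandatory (identifier, creator, title, publisher, publication year, and resource type). The key spellings, the `missing_fields` helper, and the sample record with its DOI are hypothetical and chosen for readability; they are not DataCite's official serialization.

```python
# Properties the DataCite Metadata Schema lists as mandatory.
# The key spellings used here are an illustrative assumption.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record for a made-up dataset (the DOI is a placeholder).
example_record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Doe, Jane"],
    "title": "Example Observatory Minute Means",
    "publisher": "Example Data Repository",
    "publication_year": 2018,
    "resource_type": "Dataset",
}

if __name__ == "__main__":
    print(missing_fields(example_record))
    print(missing_fields({"title": "Untitled"}))
```

A check of this kind is no substitute for a discipline-specific standard, but it shows how a catalogue service might reject records that cannot be discovered or cited.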

Data Documentation. To be of use to other researchers, data need to be carefully documented: to describe how they were developed, their limitations, and to what use they may be put. Incomplete and cursory documentation often renders data unfit for future use. An ECR should have knowledge of the different approaches taken to data documentation in various fields of science, as well as of the increasingly important practice of properly referencing protocols, methods, and samples.

Data Formats and Interoperability. Data formats and applicable standards for data and metadata are largely dependent on the scientific discipline and the type of software used. There are data formats that are common across disciplines, but this is not the norm. An ECR should support open formats and well-entrenched standardized services (e.g., CSV files, DDI, OGC services, and OPeNDAP, to name a few), and having an overview of their scope is a useful starting point for an ECR to make appropriate choices. For a discussion of data standards and interoperability in the health domain, visit AHIMA.

Choosing a Long-term Repository. An ECR must have an understanding of not only which disciplinary repositories are best suited to the domain in which they are working, but also the ‘trustworthiness’ of these data repositories, and how this is underpinned by a hierarchy of certification standards (e.g., the CoreTrustSeal). Examining the strengths and weaknesses of different repositories in terms of data access, documentation, and so on helps an ECR to conceptualize what makes for a successful data service.

Standardization, Licences, and Intellectual Property Rights. To aid in their reuse, data should ideally be made available in standardized schema and using standardized services. Each ‘data family’ has its own set of such standards, and an ECR should know which are relevant to their discipline. Moreover, with Open Data an increasing norm in the scientific community (see above), an ECR should be aware of the different types of licensing and copyright arrangements under which data are often disseminated, in addition to the importance of machine-readable licensing arrangements.
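One way to make a licensing arrangement machine readable is to record a standard licence identifier alongside the data instead of free text. The sketch below assumes SPDX-style identifiers for a few Creative Commons licences; the `OPEN_LICENCES` set, the `licence` key, and the `permits_open_reuse` helper are hypothetical, and which licences count as 'open' is a policy choice that the code does not settle.

```python
# SPDX identifiers for some Creative Commons licences that are often
# treated as open. The selection is an illustrative assumption.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


def permits_open_reuse(metadata):
    """Return True if the record names a licence from OPEN_LICENCES."""
    licence = metadata.get("licence", "")
    return licence.strip() in OPEN_LICENCES


if __name__ == "__main__":
    print(permits_open_reuse({"licence": "CC-BY-4.0"}))
    print(permits_open_reuse({"licence": "All rights reserved"}))
```

Because the identifier is a fixed string, harvesting software can filter datasets by licence without a person having to read each licence statement.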

Data Ethics. While primarily salient for ECRs working in the Health Sciences, Social Sciences, and Humanities, ethical issues that arise throughout the data management lifecycle should be a topic of broad interest to all researchers likely to engage with disclosive data (e.g., research on rare biodiversity, where there may be commercial interests in its exploitation). Areas that an ECR should have knowledge about include, but are not limited to: data ownership and stewardship, handling sensitive data, consent, privacy and confidentiality, reconciling ethical and legal norms impacting data sharing and exportation, constructing equitable partnerships and data sharing agreements, and navigating the complexity of ethics review.

Data Publication, Citation, and Persistent Identifiers. An increasing number of data journals, such as the Nature Group’s Scientific Data, are now available for the publication of datasets. In addition, proper citation of data using persistent identifiers is becoming the norm in the scientific community. An ECR should be aware of the approaches to data publication and citation and the importance of doing these properly.
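As a simple illustration of citing data with a persistent identifier, the sketch below assembles a citation string of the form "Creators (Year): Title. Publisher. DOI link", loosely following a pattern that DataCite has recommended. The `format_data_citation` function and all of the sample values, including the DOI, are hypothetical; journals and repositories usually specify their own preferred style.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


if __name__ == "__main__":
    # All values below are made up for the example.
    print(
        format_data_citation(
            ["Doe, Jane", "Roe, Richard"],
            2018,
            "Example Observatory Minute Means",
            "Example Data Repository",
            "10.1234/example-dataset",
        )
    )
```

Expressing the DOI as a full link means that a reader of the citation can resolve it directly to the dataset's landing page.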

Research Translation and Societal Benefits. To facilitate use of data collected and stored within archives, an increasingly wide range of software has been developed for decision analytics and support. In addition, there is a great deal of work on integrating data across disciplines to support new discoveries. An ECR should understand the value that well-curated and sustained data management provides to the scientific community and larger society, and have some understanding of data indicators, decision-analysis techniques, and the graphical interfaces that can simplify exchanges. Linked ontologies and robust metadata can facilitate these possibilities.

Citizen Science and Crowdsourced Scientific Data. Citizen science and crowdsourced data have already proven to be of tremendous scientific value. However, the modest budgets of these initiatives typically mean that systems are lacking for the curation and long-term stewardship of their data. An ECR should know what citizen science is, and how to design an initiative that engages citizens in improving scientific data collection and use: addressing issues of data stewardship, validation, confidentiality, dissemination, and licensing from the beginning. SciStarter provides a good introduction to Citizen Science, and an example of pointers for the design of citizen science can be found at the Cornell Lab of Ornithology.

When the Shift from Analogue to Digital Data Occurred: A Case in Geomagnetic Data Services

A Blog post by Toshihiko Iyemori (WDS Scientific Committee Member)

The World Data Centre for Geomagnetism, Kyoto (WDS Regular Member) has been collecting worldwide geomagnetic observation data under the ICSU World Data Centre (WDC) system/World Data System since 1957, collaborating with other WDCs for Geomagnetism in the United States (World Data Service for Geophysics), Russia (WDC - Solar-Terrestrial Physics, Moscow), the United Kingdom (WDC - Geomagnetism, Edinburgh), Denmark (WDC - Geomagnetism, Copenhagen), and India (WDC - Geomagnetism, Mumbai).

Figure 1 shows the number of observatories from which we keep data in analogue and digital forms.

Figure 1.

Optical recording on photo paper was originally used for most analogue recording. The digital recording of the data observed by modern electronic magnetometers started to increase from around 1980, and in 1992, finally overtook analogue recording. In 2000, the number of analogue stations had decreased to less than 10% of the total, and now all data are provided in digital form. In the mid-1990s, the Internet and World Wide Web became popular, with WDC - Geomagnetism, Kyoto starting its web service in 1995.

The WDCs for Geomagnetism have been exchanging among themselves the data collected at each data centre for 60 years. During the analogue data era, it took money and manpower to collect data from distant observatories and copy them onto microfilms; and the big data centres such as WDC-A in Boulder (now World Data Service for Geophysics) or WDC-B in Moscow (now WDC - Solar-Terrestrial Physics, Moscow) mainly collected the data and distributed them to the other smaller data centres. After shifting to the digital data and Internet era, the situation changed. Collecting data via the Internet is much easier than collecting photo papers from distant stations, and international collaboration is also much easier than before.

Nowadays, more than half of geomagnetic data are provided through an international consortium, INTERMAGNET (WDS Network Member). The transition from analogue to digital recording thus also changed the main player in the provision of geomagnetic data services.

DOIs and Licensing for Geomagnetic Data & Products: Current Status

A Blog post by Aude Chambodut (WDS Scientific Committee Member)

The longest time series of geomagnetic data are certainly the ones acquired by magnetic observatories (Fig. 1), some of which reach a century of uninterrupted measurements.

There are currently about 200 open magnetic observatories worldwide. In each of them, absolute vector observations of the Earth's magnetic field are recorded accurately and continuously, with a time resolution of one minute or less, over a long period of time. Magnetic observatory data are 'primary data' that are extensively used in the derivation of data products ('secondary data') such as: International Geomagnetic Reference Field models, geomagnetic indices, space weather applications…

Figure 1. Paris declination series: annual means of declination corrected and adjusted to actual French National Magnetic Observatory - CLF (Mandea and LeMouël, 2016).

The whole community of geomagnetic observatories is particularly well organized and federated under the auspices of the International Association of Geomagnetism and Aeronomy (IAGA), one of the associations of the International Union of Geodesy and Geophysics [WDS Partner Member].

Since the beginning of the 1960s (the World Data Centre system, established in 1957, provided archives for the observational data resulting from the 'International Geophysical Year'), magnetic observatory data have been mostly publicly available (Fig. 2). Getting access to a network of stations is much more interesting than having access to just one isolated observatory.

Figure 2. Location of magnetic observatories (all periods) having at least one datum ingested into the Geomagnetism Data Portal of WDC – Geomagnetism, Edinburgh [WDS Regular Member].

The cooperative spirit within the geomagnetic community thus has a fairly long-standing history, one that has had to cope with successive technological revolutions in data recording (e.g., analogue to digital; Fig. 3) and in the way data are made available (from yearbooks, via isolated recording media, up to connected data repositories). In this regard, the community had practices based on fair play and goodwill recognition of data sources/providers. Such practices worked, and would have worked for many more decades were it not for new challenges in meeting the changing requirements of users and stakeholders.

Indeed, in our increasingly connected world, it is ever more important to closely follow developments in data management. Some aspects were previously not sufficiently taken into account, such as the discovery, citation, and reuse of geomagnetic data. Nowadays, it no longer appears possible to reserve sources of data for 'informed people' only, and the existing licensing conditions for distribution of geomagnetic data and data products are (in part) not sufficiently developed to address this change and need to be improved.

Figure 3. Analogue magnetogram from Vladivostok (VLA); 24 September 1934 (through ICSU grant-2003 by WDC – Solar–Terrestrial Physics, Moscow [WDS Regular Member]).

IAGA has thus agreed to set up Task Forces on the abovementioned aspects, with a consensus already found on the aims of data/data-product licensing and Digital Object Identifier (DOI) minting, which are to:

 – Provide recognition and acknowledgement.
 – Enable creation of new data products from primary data (e.g., geomagnetic indices) or in combination with other data sources (e.g., global models of geomagnetic field).
 – Prevent the change and/or appropriation of data by a third party.
 – Enable reuse of data in a reproducible way.
 – Supply metadata that enable unique identification of a dataset, as well as providing relevant information to the user.
 – Use machine-readable and widely used licenses. 
 – Enable easy online access to research data for discovery.

The work is in progress so that it meets the state of the art in applying licenses and minting DOIs for geomagnetic data and data products, with the goal of ensuring that the results of the tremendous efforts made by generations of observers in geomagnetism throughout the world remain available into the 21st century.


CESSDA Expert Tour Guide for Data Management

Over the past year, research data management experts from eleven organizations within the Consortium of European Social Science Data Archives (CESSDA), including DANS and the Swedish National Data Service (WDS Regular Members), have combined forces to create a detailed and thorough guide on data management across the research data life cycle. The Expert Tour Guide introduces the concepts ...

Global Glacier Change Bulletin No. 2 (2014–2015)

The World Glacier Monitoring Service (WGMS; WDS Regular Member) has announced the publication of the second issue of the Global Glacier Change Bulletin series, which provides an integrative assessment of worldwide and regional glacier changes at two-year intervals. It serves as an authoritative source of illustrated and commentated information on global glacier changes based on the latest ...

New SEDAC Dataset

A new dataset has been released by the NASA Socioeconomic Data and Applications Center (SEDAC; WDS Regular Member). India Annual Winter Cropped Area, 2001–2016 consists of annual winter cropped areas for most of India (excluding the northeastern states) from 2000–2001 to 2015–2016. The data can be used in land-cover and land-use change studies, agricultural applications, and to assist with ...


WDC – RSER Transfers Data Holdings to WDC – Meteorology, Obninsk

The All-Russia Research Institute of Hydrometeorological Information – World Data Centre (RIHMI-WDC) has announced to the WDS Scientific Committee (WDS-SC) that it has discontinued WDC – Rockets, Satellites and Earth Rotation (WDC – RSER) since the topics are no longer its priorities. However, the WDS-SC is extremely pleased to learn that the data holdings of WDC – RSER will ...

Integration of the Ukrainian science into the World Data System

Zgurovsky et al. in Cybernetics and Systems Analysis (Volume 46, Issue 2). Abstract: Creating the World Data Center for Geoinformatics and Sustainable Development (WDC-Ukraine), its certification and integration into the World Data System are described. The main principles of the WDC and its research priorities are considered. Main projects carried out by the WDC are reviewed. One of them is ...

Collaboration between ICSU World Data System and SCOSTEP/VarSITI

Takashi Watanabe and Rorie Edmunds in VarSITI Newsletter, Volume 3. The International Council for Science (ICSU) has a long history of collaborating internationally on the archiving and provision of scientific data. The World Data Centres (WDCs) and the Federation of Geophysical and Astrophysical Data Services were established by ICSU during the International Geophysical Year (IGY). Building ...
