By Aysa Ekanger, licensed under CC BY 4.0

Last updated 2022-12-13

Creative Commons (CC) licenses allow users to freely re-distribute the licensed material, and most of the licenses allow users to modify and build on the material. All of the CC licenses contain the Attribution condition that has to be satisfied when users share the material, including in modified form.

The Attribution (BY) icon from Creative Commons The Attribution (BY) icon from Creative Commons

The Attribution condition in CC licenses seems to be easy to understand: credit the author – and that’s it, right? Wrong

Researchers are used to acknowledging other authors, and there are many disciplinary norms and many reference styles – but it is necessary to remember that the Attribution condition in CC licenses contains a range of obligatory elements that cannot be omitted. When you write an article and refer to a published dataset, you can resort to the reference norms of your choice; note that a dataset citation needs to include elements that a paper citation doesn’t (see information on how to cite datasets in the data citation section of the course). If you re-distribute someone else’s CC licensed dataset – in whole or in part, in original or modified form – additional requirements are imposed by the Attribution condition of CC licenses.

Keep in mind that CC licenses apply to copyright and (from version 4.0) to sui generis database rights, which means that the Attribution condition needs to be observed only in the kinds of reuse restricted by these rights. If your reuse is covered by exceptions and limitations to copyright, or does not involve extraction, copying and re-distribution of (un)modified parts of the licensed material, you are not obliged to satisfy the Attribution condition. If the licensed material has entered the Public Domain, you do not have to adhere to license conditions.

We’ve heard researchers say that they give credit by providing a list of DOIs for the source materials of their dataset. This is not proper attribution as defined by CC licenses and the associated sharing of the derived dataset would constitute a violation of the CC licenses’ Attribution condition.

What must be included in the attribution statement

So, you have found material that is licensed with a CC license and want to include this material in your dataset – how do you satisfy the Attribution condition? In the 4.0 version of CC licenses, Attribution consists of the following: 

  • Appropriate credit, which consists of the following elements, if supplied:
    • the name of the creator and attribution parties
    • a copyright notice
    • a license notice
    • a disclaimer notice
    • a link to the material
  • A link to the license
  • Indication of whether changes were made

Some CC license versions prior to 4.0 have slightly different content for the Attribution condition. An attribution condition in other licenses (not from CC) may also have different content. In order to comply with the conditions of a license, you must consult the content of the appropriate license or license version.

Licensors cannot modify the elements of the Attribution condition – and still identify the license as a CC license. Any additional credit elements (such as where the licensed dataset was published, or an instruction on where exactly in the derived work to place the attribution) must be formulated by the dataset’s rights holder as a request, not as a requirement. See more at https://creativecommons.org/faq/#alterations-and-additions-to-the-license.

One more important thing to remember is that, when providing attribution, users cannot imply endorsement by the creator of the original work – unless they have received an implicit permission from the creator to do so.

Where to attribute

Where do you put all this information? The Attribution condition in the 4.0 version of the CC licenses can be satisfied in 'any reasonable manner' – and what is considered 'reasonable' will differ in accordance with how, where and in what form you share the licensed material. A txt file (Readme file or License file) that will accompany your derived dataset in a data repository will be able to include all of the information that is part of the CC Attribution condition.

Of course, if you build your dataset based on multiple sources/datasets that all have an attribution condition, it will be a lot of work to include the necessary details of all those sources into your Readme file – but it is work that must be done, otherwise (unless you can satisfy the Attribution condition in any other reasonable manner) you will be violating the license conditions of your source material.

The Attribution condition in CC licenses, community attribution norms and the moral right of attribution

The Attribution condition in CC licenses (and its equivalent in other licenses) is not the only mechanism that ensures acknowledgement of sources – there are also community norms and copyright requirements.

The importance of giving proper credit and the unacceptability of plagiarism is emphasised in numerous documents and guidelines on research integrity/ethics, such as the

European Code of Conduct for Research Integrity, Principles of Transparency and Best Practice in Scholarly Publishingand various country-specific and discipline-specific guidelines. What constitutes ‘proper credit’ is based on ‘good reference practice’ and may vary between disciplines, types of research output and countries.

The Attribution condition in CC licenses is a more rigid mechanism than community norms, as it dictates what elements must be included in the attribution (see previous section).

Attribution may also be required by copyright law: most countries of the world grant creators based in those countries the moral right of attribution – the right to be acknowledged as the author of a work in accordance with good practice. It varies from country to country what types of intellectual output are protected by the moral right of attribution, and various types of research data may or not be protected by this right. In the European Union, databases are generally not considered creative works and are thus not protected by the moral right of attribution (creatively compiled databases are), computer programs are considered creative works, and for text, images and sound, sometimes a legal assessment must be made of whether they can be considered creative works.

The Attribution condition in CC licenses does not depend on the type of research data – reusers of a CC-licensed dataset have to observe the Attribution condition regardless of whether this dataset is protected by the moral right of attribution or not. The Attribution condition in CC licenses must also be adhered to in those jurisdictions that do not recognise the moral right of attribution.

The Attribution condition in CC licenses does not depend on the type of research data – re-users of a CC-licensed dataset have to observe the Attribution condition regardless of whether this dataset is protected by the moral right of attribution or not. The Attribution condition in CC licenses must also be adhered to in those jurisdictions that do not recognize the moral right of attribution.

Attribution stacking

Stacking illustrated. “Human pyramid formed by members of the Ebenezer Gym Club”, ca. 1920–1930. Made available by State Library Victoria at http://handle.slv.vic.gov.au/10381/24949. Free of known copyright restrictions.

Photo: “Human pyramid formed by members of the Ebenezer Gym Club” by State Library Victoria is free of known copyright restrictions

“Datasets are particularly prone to attribution stacking, where a derivative work must acknowledge all contributors to each work from which it is derived, no matter how distantly. If a dataset is at the end of a long chain of derivations, or if large teams of contributors were involved, the list of credits might well be considered too unwieldy. The problem is magnified if different sets of contributors have to be credited in a different way, especially if automated methods are used to assemble the dataset – some of the benefits of automation are lost if attribution conditions have to be inspected manually.” (Ball 2014, p. 4)

The Attribution condition – and the possibility of attribution stacking – is something to think about when you choose a CC license for your own dataset.

You may consider using the CC0 Public Domain Dedication tool, which allows rights holders to share their datasets without any conditions. The CC0 tool is recommended for research data, as it does not impose any conditions and thus allows for maximum reuse of data. Reusers of material that is shared under CC0 are not legally bound by the tool to credit the 'author' (dataset rights holder), but they are expected to do so according to community norms (and possibly copyright requirements, such as the moral right of attribution). Community norms are more flexible than an attribution condition in a license, and they do not result in attribution stacking.


Lessons learned

If you reuse CC-licensed sources in your own dataset, remember to attribute properly – following the requirements of the license and retaining the information supplied by the creator of the original source.

When you share your own research data, consider using the CC0 Public Domain Dedication tool.


References


Last modified: Tuesday, 13 December 2022, 8:55 AM