Locale from the ORCID Public Data File

One year after its launch, the ORCID community has published its first Public Data File. I downloaded it and had a brief look at the data. It comes in both XML and JSON, with one file per ORCID record. There are a little over 360,000 files.

Here is what my own record looks like in XML. The orcid-activities element marks the start of the list of publications, which is cut off here:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<orcid-message xmlns="http://www.orcid.org/ns/orcid">
    <message-version>1.0.22</message-version>
    <orcid-profile type="user">
        <orcid>0000-0002-6892-9305</orcid>
        <orcid-id>http://orcid.org/0000-0002-6892-9305</orcid-id>
        <orcid-preferences>
            <locale>en</locale>
        </orcid-preferences>
        <orcid-history>
            <creation-method>website</creation-method>
            <completion-date>2013-08-26T13:58:52.798Z</completion-date>
            <submission-date>2013-08-26T13:47:05.325Z</submission-date>
            <last-modified-date>2013-09-05T11:41:03.758Z</last-modified-date>
            <claimed>true</claimed>
        </orcid-history>
        <orcid-bio>
            <personal-details>
                <given-names>Tuija</given-names>
                <family-name>Sonkkila</family-name>
            </personal-details>
            <external-identifiers>
                <external-identifier>
                    <orcid>0000-0002-6892-9305</orcid>
                    <external-id-orcid>0000-0001-7707-4137</external-id-orcid>
                    <external-id-common-name>ResearcherID</external-id-common-name>
                    <external-id-reference>I-6344-2013</external-id-reference>
                    <external-id-url>http://www.researcherid.com/rid/I-6344-2013</external-id-url>
                </external-identifier>
            </external-identifiers>
        </orcid-bio>
        <orcid-activities>
        ...

As a quick first test, here is a bar chart of the locale values found in the records. The locale says little about the demographics of ORCID users. Rather, it confirms that even among scholars, English is the predominant web browser language setting.
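To illustrate the kind of tally behind such a chart, here is a minimal sketch in Python. It is not the XSLT transformation used for this post, just one way to get the same counts. It assumes the XML records have been unpacked into a single directory as `*.xml` files (the directory name `orcid_xml` is hypothetical) and that the locale sits at orcid-profile/orcid-preferences/locale, as in the record above. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Namespace taken from the root element of the record shown above.
NS = {"o": "http://www.orcid.org/ns/orcid"}


def locale_of(xml_bytes):
    """Return the locale of one ORCID record, or None if it has none."""
    root = ET.fromstring(xml_bytes)
    node = root.find("o:orcid-profile/o:orcid-preferences/o:locale", NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


def count_locales(directory):
    """Tally locale values over all *.xml files directly in `directory`."""
    counts = Counter()
    for path in Path(directory).glob("*.xml"):
        counts[locale_of(path.read_bytes())] += 1
    return counts


if __name__ == "__main__":
    # "orcid_xml" is a hypothetical directory name; adjust to your own layout.
    for locale, n in count_locales("orcid_xml").most_common():
        print(locale, n)
```

The resulting counts can then be passed to any plotting tool to draw the bar chart. Records without a locale element are counted under `None` rather than dropped, so the totals still add up to the number of files.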

The code for the XSLT transformation and for making the plot is available here.

Posted by Tuija Sonkkila

About Tuija Sonkkila

Data Curator at Aalto University. When out of office, in the (rain)forest with binoculars and a travel zoom.