Purdue University Libraries offer a number of impressive data services and initiatives. In this entry I will discuss PURR (Purdue University Research Repository) and Data Curation Profiles. I feel that there are several topics here that neither we, Helsinki University Library nor Oulu University Library have yet considered. All of the information below is available through Purdue University Libraries’ website. I will, however, present a “Finnish digest” of it as follows.
Purdue University Research Repository (purr.purdue.edu) is a very cool way to manage, publish and archive data in research projects. It was not designed to compete with the data management and publishing solutions that some fields already have. Rather, it was designed to provide core data services for a wide range of fields, from agricultural sciences to the humanities: giving published datasets DataCite DOIs, providing collaboration spaces for data sharing and offering safe archiving of the data. In short, it is a great example of what a general data management tool provided by a library could be. Also, “PURR comes with a set of default policies and functionality that address privacy and confidentiality, intellectual property and copyright, and access and sharing of data” (source: https://purr.purdue.edu/about/usehub). One of the key people behind PURR is the brilliant and friendly Michael Witt (http://oldsite.lib.purdue.edu/research/witt/).
There are four main ways in which the PURR can be used:
- It helps researchers to create data management plans (abbr. DMP) for funders to review (https://purr.purdue.edu/dmp and https://purr.purdue.edu/resources/14/download/DMP_Self_Assessment.pdf). Very cleverly, it also provides researchers with a ready-made description of PURR as a data management tool that they can use in their DMPs (https://purr.purdue.edu/about/usehub).
- It allows researchers to upload their data and share it with collaborators. Researchers can also invite collaborators from other institutions. Once a project is initiated, the project site also includes cool features, such as (https://purr.purdue.edu/projects/features): updates and microblogging, to-do lists, project notes, file management, publishing.
- It allows researchers to publish their datasets (with a DataCite DOI, which I think is very cool and advanced). The data can be published in various formats (e.g. as Excel sheets and zip packages). Each published dataset comes with an abstract of the research, supporting documents, versions, reviews and questions. It also very wisely gives instructions on how to cite the published dataset. Here’s a sample dataset (https://purr.purdue.edu/publications/1004). What is also cool is that the published datasets are indexed in Purdue’s Primo catalog (same as our Alli): http://purdue-primo-prod.hosted.exlibrisgroup.com/primo_library/libweb/action/search.do?dscnt=1&dstmp=1369919852307&vid=PURDUE&ct=AdvancedSearch&mode=Advanced&fromLogin=true
- It allows researchers to archive their data after the completion of the research project. PURR is working through the ISO 16363 process to become a certified trusted digital repository.
The mere notion of DataCite DOIs (http://www.datacite.org) is, in my opinion, already in itself something that we have overlooked in the Finnish academic libraries. For example, publishing a dataset may be seen as a way of increasing one’s research impact, e.g. by receiving more citations. A dataset with a DOI is certainly more convenient to cite than a dataset without one. If you look at the DataCite members, you will notice that there are both institutions comparable to our CSC (e.g. the Swedish National Data Service) and individual libraries, such as ETH Zurich. And to be clear, the Finnish CSC is not listed as a member institution.
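To make the convenience point concrete, here is a minimal sketch of how a dataset citation could be assembled once a DOI exists. It loosely follows the element order that DataCite recommends (creator, publication year, title, publisher, identifier). The `DatasetRecord` class, its field names and the example record are my own assumptions for illustration, not anything PURR or DataCite provides, and the DOI uses a made-up prefix that does not resolve.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal citation metadata for a published dataset (illustrative fields)."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI without the resolver prefix


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a link through the public doi.org resolver."""
    return f"https://doi.org/{doi}"


def format_citation(record: DatasetRecord) -> str:
    """Build 'Creator (Year). Title. Publisher. Identifier' from a record."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


# Hypothetical record: names, title and DOI are placeholders.
example = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example field measurements",
    publisher="Example University Repository",
    doi="10.1234/example-dataset",
)

print(format_citation(example))
```

The point of the sketch is simply that the DOI gives the citation a stable, machine-resolvable last element; without it, the identifier part would have to be a repository URL that may change over time.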
If we wander into technical territory, PURR is based on the HUBzero open source software platform, which was developed here at Purdue. I cannot really call myself a computer wiz, but if I got it right, HUBzero requires PHP-proficient guys or girls (http://hubzero.org/documentation/1.1.0/webdevs/index), which I know we have in our library. It is also a close relative of Joomla! If I remember correctly, it was actually developed from Joomla!, or from parts of it.
As for similar services in Aalto’s context, the Linked Open Aalto Data Service mostly covers data about Aalto courses, publications and people, not research data as such. I know that CSC (Center of Scientific Computing) in Finland is preparing data management products. However, if you compare CSC’s product descriptions (http://www.csc.fi/english/research/datastorage) with the short description of PURR above, you will find that PURR offers much more flexible and multifaceted tools for managing data in research projects. Actually, I do not think that CSC currently provides any services similar to PURR’s individual features.
And what is more, the Purdue folks market PURR with a very cool video (http://www.youtube.com/watch?v=Yw0IJj7FqA8&feature=em-share_video_use). How do you top that?
Data Curation Profiles
Okay, what exactly are we talking about when referring to data curation? Purdue’s Data Curation Profiles User Guide uses the definition from the Graduate School of Library and Information Science at the University of Illinois: “the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation enables data discovery and retrieval, maintains data quality, adds value, and provides for re-use over time, through activities including authentication, archiving, management, preservation, retrieval, and representation.” (http://www.lis.illinois.edu/academics/programs/specializations/data_curation) The Data Curation Profiles, then, are not so much a library service as such, but a toolkit for gathering empirical evidence on which next generation library services can be built. The unit of analysis is a single dataset, e.g. the one that the researcher finds has the most scholarly value or is most representative of his or her work. This dataset is then approached from the researcher’s viewpoint by examining e.g.: description(s) of the dataset(s), the lifecycle of the dataset, sharing (and willingness to share) of data, providing (and willingness to provide) access to the data e.g. through an open repository, organization and description of data, how the data could be discovered by different audiences (if discovery is desired), intellectual property issues concerning the data, tools, data management and data preservation.
The procedure for conducting Data Curation Profiles is well documented and published under a Creative Commons license on the website datacurationprofiles.org. A key person behind the Data Curation Profiles is the equally brilliant and friendly Jake Carlson (http://blogs.lib.purdue.edu/jcarlson/). In some instances he is referred to as Data Research Scientist, which, in my opinion, says a lot about the nature of Data Curation Profiles. He has written many excellent articles on the matter, such as: Carlson, J. (2012) Demystifying the Data Interview: Developing a Foundation for Reference Librarians to Talk with Researchers about their Data [article] Reference Services Review 40(1). 7-23.
The basic procedure, to my current knowledge, has three main guiding templates:
- Data Curation Profiles User Guide
- The Interview Worksheet for conducting interviews with faculty, graduate students and possibly others (an interviewer’s manual is also available)
- Data Curation Profile Template to be used in conjunction with grounded theory methodology to generate the final profiles.
All of these documents are available through: http://datacurationprofiles.org/
So to conduct Data Curation Profiles is actually to conduct qualitative research on the different kinds of datasets produced at your university. The structure of the Interview Worksheet is modular, and different modules can be altered according to field-specific needs. For example, the interview worksheet used with researchers working with GIS data differed significantly from the standard interview worksheet. The target group of Data Curation Profiles is not only principal investigators; Purdue Libraries have also worked with e.g. graduate students in creating these profiles. If I got it right, there’s a notion that the interviewees have also gained from the interview, as it has helped them to articulate their vague needs concerning data curation and to consider needs beyond the data’s immediate use (more info in the Data Curation Profile User Guide: http://datacurationprofiles.org/download).
Data Curation Profiles are not to be confused with Data Management Plans, on which Purdue University Libraries also offer consultancy (see e.g. the Data Management Plan Self-assessment tool, also by Jake Carlson: https://purr.purdue.edu/resources/14/download/DMP_Self_Assessment.pdf and the Data Management Plan overview template: https://purr.purdue.edu/dmp/dmpoverview). Data Management Plans are required by many significant funders, and the Purdue University Libraries have certainly tackled this need in their context. Data Management Plans, or DMPs, were also briefly discussed in the PURR section above.
If I understood right, it is hard to say which Purdue Libraries services derive directly from Data Curation Profiles, but I understood that the profiles have informed many of them, such as PURR and the Data Management Plan consultancy. One of the spin-offs is also the Data Information Literacy project, which examines what the needs arising from the “data deluge” could mean at the level of library instruction (http://wiki.lib.purdue.edu/display/ste/Home;jsessionid=03E2BCF41EAE2A42619ECC90EA6547DA). Jake Carlson is the mastermind behind this one as well.
The Data Curation Profiles seem like an excellent example of library pioneers venturing to get a foothold in new emerging fields by providing empirical evidence on the emerging needs of their patrons. The Purdue Libraries have then used this evidence to create feasible next generation library services. Many completed profiles can be found at http://datacurationprofiles.org/ and used for the benefit of the entire library profession. Overall, I think there is actually an abundance of great data service initiatives here at Purdue. Are these services something that we can still continue overlooking in Finland? Decide for yourself.
Oh, and there’s yet one more data service from Purdue Libraries: Databib (http://databib.org/about.php), which is a great tool for locating online repositories of research data. 🙂