All posts by Ethan Gruber

First Digital Object Identifiers Minted for ANS Digital Library Items


Several weeks ago, the ANS migrated an older, circa 2002 TEI ebook on the Taranto 1911 hoard, authored by John Kroll and Sebastian Heath, into our Digital Library. The original TEI file and subsequent updates have been loaded into our TEI GitHub repository. The updates follow transcription precedents that we have set for older ANS-published printed monographs as part of the Mellon-funded Open Humanities Book Program: relevant places, objects, people, etc. have been linked to entities in LOD systems. All of the objects within this hoard (itself linked to IGCH 1864) are held in the British Museum and linked to their URIs. Upon publication in the ANS Digital Library, the document parts are now accessible from the IGCH 1864 record and (eventually) in Pelagios, connected to relevant ancient places.

Since Sebastian is an active scholar with an ORCID, this document served as a proof of concept for the next iteration of ANS digital publication: that our current and future monographs and journal articles, once issued openly online, should be connected to ORCIDs for their authors, and publication metadata should be submitted to Crossref to mint a DOI and enhance accessibility. Furthermore, since there’s a direct connection between ORCID and Crossref submissions, this new digital publication workflow would automatically populate an author’s scholarly profile with ANS publications. This is a vast improvement over services that require manual submission. The broad vision is this:

Regardless of whether an author submits works through the American Numismatic Society Digital Library, Humanities Commons, their own institutional repository, or an Open Access journal system, their ORCID profile is the central, canonical aggregation of the entirety of their intellectual output (which includes datasets, software, etc.).

This aggregation system between DOIs and ORCIDs, following Linked Open Data principles, is the future of academic publication. Ideally, it should be expanded beyond citations of modern works with DOIs and ORCIDs to include older works defined in WorldCat and linked to historical scholars with ISNI identifiers. It would take a tremendous amount of work, but in theory it would be possible to create a network graph of citations across all disciplines, going back in history to the advent of the printed book, charting the evolution of how knowledge is generated and disseminated. Crossref, ISNI, and ORCID would then play a greater role than providing simple (and superficial) citation metrics: they would enable a broader historiography and analysis of scholarship itself. We plan to mint DOIs for our historical publications eventually, if Crossref extends its XML schema to support ISNI identifiers.
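The citation network described above can be sketched as a simple directed graph keyed by persistent identifiers. This is only an illustration of the idea; all of the identifiers below are hypothetical placeholders, not real records.

```python
from collections import defaultdict

# Directed citation graph: each work (keyed by a DOI or, for historic
# works, an ISNI/WorldCat-style identifier) maps to the works it cites.
# Every identifier below is a made-up placeholder.
citations = defaultdict(set)

def add_citation(citing, cited):
    """Record that `citing` cites `cited`."""
    citations[citing].add(cited)

def cited_by(work):
    """Reverse lookup: all works in the graph that cite `work`."""
    return {w for w, cites in citations.items() if work in cites}

add_citation("doi:10.0000/new-article", "doi:10.0000/older-article")
add_citation("doi:10.0000/new-article", "isni:historic-work")   # pre-DOI work
add_citation("doi:10.0000/second-article", "isni:historic-work")
```

Traversing such a graph backward from any modern article would, in principle, chart the lineage of an idea across centuries of print.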

Under the Hood

Some extensions were implemented in ETDPub, the TEI/MODS publication framework that underlies the ANS Digital Library. First, I authored XSLT stylesheets that crosswalk TEI or MODS into the appropriate Crossref XML model, according to Crossref schema version 4.4.0. You can see an example of my MA thesis here:
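The actual crosswalk is done in XSLT against the Crossref 4.4.0 schema; purely to illustrate the title/author/ORCID mapping, here is a minimal Python sketch over a made-up TEI fragment (the ORCID and element layout are placeholders, not the real deposit structure):

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A minimal, made-up TEI header fragment standing in for a real document.
tei = ET.fromstring(f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc><titleStmt>
    <title>Taranto 1911</title>
    <author ref="https://orcid.org/0000-0000-0000-0000">Sebastian Heath</author>
  </titleStmt></fileDesc></teiHeader>
</TEI>""")

def tei_to_crossref(tei_root):
    """Crosswalk a TEI titleStmt into a bare-bones Crossref-style record.

    The production pipeline does this in XSLT; this sketch only shows
    the mapping of title, surname, and ORCID URI."""
    ns = {"tei": TEI_NS}
    title = tei_root.findtext(".//tei:titleStmt/tei:title", namespaces=ns)
    author = tei_root.find(".//tei:titleStmt/tei:author", namespaces=ns)

    record = ET.Element("journal_article")
    ET.SubElement(record, "title").text = title
    person = ET.SubElement(record, "person_name")
    ET.SubElement(person, "surname").text = author.text.split()[-1]
    ET.SubElement(person, "ORCID").text = author.get("ref")
    return record

record = tei_to_crossref(tei)
```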


If the author/editor URI in the TEI matches an ORCID URI, the Admin panel in ETDPub will enable publication of the metadata to Crossref. Similarly, within the MODS ETD editing interface (in XForms), a user can insert a mods:nameIdentifier[@type='orcid'] under the mods:name for an author/editor in order to capture the ORCID. So far, only TEI or MODS records with ORCIDs attached to people are eligible for submission to Crossref to mint a DOI.

Submission Workflow

In the admin panel, if a document is eligible for submission to Crossref, a checkbox is available. Clicking on this will fire off a series of actions in the XForms engine:

  1. The TEI/MODS-to-Crossref XML transformation is executed and loaded into an XForms instance
  2. The Crossref XML is serialized to /tmp because it must be attached via multipart/form-data
  3. Because I am still having difficulty getting multipart/form-data to execute correctly in the XForms engine, it instead hands the file off to a PHP script running under CGI
  4. After the PHP script responds with a successful HTTP code, the MODS/TEI document is loaded in the XForms engine in order to insert the DOI in the proper location within the document
  5. The TEI/MODS file is saved back to eXist, and the standard publication workflow is executed (a chain of XForms submissions), updating the Solr search index and the triplestore/SPARQL endpoint
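Step 4 above, writing the newly minted DOI back into the source document, can be sketched as follows; the TEI fragment and DOI value are placeholders, and the real insertion happens inside the XForms engine rather than in Python:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # keep TEI as the default namespace on output

def insert_doi(tei_root, doi):
    """Add an <idno type="DOI"> to the publicationStmt (step 4 of the
    workflow) so the DOI travels with the document when it is saved
    back to eXist and re-indexed."""
    ns = {"tei": TEI_NS}
    pub = tei_root.find(".//tei:publicationStmt", namespaces=ns)
    idno = ET.SubElement(pub, f"{{{TEI_NS}}}idno", attrib={"type": "DOI"})
    idno.text = doi
    return tei_root

# Made-up TEI fragment and placeholder DOI for illustration.
tei = ET.fromstring(f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc>
    <publicationStmt><publisher>American Numismatic Society</publisher></publicationStmt>
  </fileDesc></teiHeader>
</TEI>""")

insert_doi(tei, "10.0000/placeholder")
```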

So far two documents in the Digital Library have DOIs connected to ORCIDs:

Taranto 1911:
My thesis (Recent Advancements in Roman Numismatics):

NOTE: This post was originally published by Ethan Gruber in his blog, XForms for Archives. It is re-posted here with permission by the author.

Open Access, Academia.edu, and why I’m all-in on Zenodo

Note: The majority of this post and the migration framework (academia-migrate) were written about a year ago, but placed on the back burner while other projects demanded my time. Between the revelation that Academia.edu has implemented banner ads on some profiles and Sarah Bond’s article in Forbes, I was finally motivated to push this project into production.

Many scholars throughout the world use Academia.edu to broaden access to their own research, which includes not only published documents, but also unpublished manuscripts and presentation materials (such as PowerPoint slideshows) that would otherwise never be submitted to peer-reviewed journals. Academia.edu bills itself as a “disruptive” service that takes a shot at the increasing commercialization (and resulting access restrictions) of academic publication. For scholars who want their research to be made available to the widest possible audience, peer-reviewed journals are falling short. Peer review offers a certain cachet required by university administrators for consideration toward tenured professorship, but more and more journals are owned and distributed by fewer and fewer publishers. University libraries are strapped with increasing subscription costs, and unaffiliated scholars are on the outside looking in with regard to current scholarship, unless they would like to pay as much as $50 to acquire a single article. Academia.edu has changed this somewhat. With HTML microdata and pathways for search robots to crawl full-text articles, researchers are able to find relevant articles through Google, and Google’s algorithms tend to favor Academia.edu over other, harder-to-crawl sources.

On the surface, this seems great for scholars. And it was good in the beginning, but that has changed over the last year. Despite its domain name, Academia.edu is a commercial venture. It is beholden to investors, not the scholarly community it serves, nor universities, governments, or taxpayers. Recently, an Academia.edu developer approached a scholar about his willingness to participate in a pay-to-play system. I won’t go into great detail, as the initial exchange and subsequent outrage on Twitter have already been covered thoroughly. But what does paying for a recommendation mean? Aside from sacrificing a certain intellectual honesty, a paid recommendation essentially enhances the visibility of, and access to, your work. By definition, then, not paying for a recommendation reduces the visibility of, and access to, your work. If Academia.edu’s developers alter the metadata provided to robots to improve search relevance for those who pay for their publications to be promoted, they necessarily reduce relevance for non-paying users. As a result, access declines, which reduces the likelihood of citation and may even negatively impact administrative reviews of faculty output.

Furthermore, it appears that Academia.edu is now experimenting with banner advertisements. These do not yet appear to be a permanent fixture, but I believe we are seeing the beginning of overt attempts to generate income on top of research that scholars have published to the site in the good faith that it would remain free and open.

So is there a solution?

Yes: Zenodo. It is a truly open access scholarly publication framework that is capable of replacing Academia.edu. Zenodo is open to “research outputs from across all fields of science,” including the humanities and social sciences. Like Academia.edu, it lets users upload journal articles, conference papers, posters, and presentations, but they may also upload raw research data. Zenodo is developed by CERN, which has long demonstrated its devotion to open science and the open web, and it is backed by funding from the European Union. Moreover, Zenodo has a well-documented API for publishing and harvesting content via well-known open web standards. This is in stark contrast to Academia.edu, which goes to great lengths to prevent users from harvesting publication metadata and makes it impossible to download documents without registering for an account (which also inflates its user base). Academia.edu prides itself on being disruptive, but it too needs to be disrupted.
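Zenodo’s deposit API accepts JSON metadata over HTTPS. A minimal sketch of building such a payload, with field names taken from Zenodo’s documented deposit metadata but entirely placeholder values (the real call would POST this with an OAuth access token):

```python
import json

def zenodo_deposit_payload(title, creators, description, keywords=()):
    """Build the JSON body for Zenodo's POST /api/deposit/depositions.

    Field names follow Zenodo's documented deposit metadata; the values
    passed in below are illustrative placeholders, and the actual HTTP
    request (with an access token) is omitted."""
    return {
        "metadata": {
            "upload_type": "publication",
            "publication_type": "article",
            "title": title,
            "creators": [{"name": name} for name in creators],
            "description": description,
            "keywords": list(keywords),
        }
    }

payload = zenodo_deposit_payload(
    "An Example Thesis Title",
    ["Lastname, Firstname"],
    "Placeholder abstract for an uploaded publication.",
    keywords=["numismatics", "linked open data"],
)
body = json.dumps(payload)  # the request body that would be POSTed
```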

Migrating from Academia.edu to Zenodo

I fully advocate leaving Academia.edu, but what purpose does it serve simply to delete your account? You would be removing publications that are, at the very least, freely and openly available at the moment. The best course is to migrate documents to Zenodo and allow at least one week for Google to fully index the migrated content before deleting the account. My MA thesis, “Recent Advances in Roman Numismatics,” on the application of Linked Open Data methodologies to Roman numismatics with Online Coins of the Roman Empire, had been available in both the ANS Digital Library and Academia.edu as of January 28, 2016. Due to our superior use of microdata and full-text indexing, the ANS Digital Library version surpassed the Academia.edu version days after it was published. I uploaded my thesis to Zenodo on January 29, 2016, and it was on the first page of Google results three days later.

Many of us have uploaded a substantial number of documents to Academia.edu, and it might be tedious to re-upload these documents into a new system, especially with regard to re-entering publication metadata. I have sought to rectify this by building a more efficient migration system: a framework capable of parsing metadata from an Academia.edu profile (although not all publications are listed when the profile page first loads), accepting re-uploaded documents (since these cannot be extracted from Academia.edu directly), and uploading the contents into Zenodo. The framework itself is open source and available on GitHub. I will save the technical discussion for a different venue.

Screen capture of one of Terhi Nurmikko-Fuller’s papers after being parsed from Academia.edu
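The metadata-parsing side of the framework boils down to scraping publication fields out of profile HTML. A toy sketch using only the standard library; the `work-title` class name and the HTML snippet are hypothetical stand-ins, since the real profile markup differs and, as noted above, not every publication is present in the initially loaded page:

```python
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Collect the text of elements carrying a (hypothetical) work-title
    class, as a stand-in for parsing real profile markup."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if ("class", "work-title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())

scraper = TitleScraper()
scraper.feed('<div class="work-title">First Example Paper</div>'
             '<p>unrelated text</p>'
             '<div class="work-title">Second Example Paper</div>')
```

The scraped fields are then paired with re-uploaded files and pushed through the Zenodo API described earlier.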

This system took about a week to develop, but I hope that the migration process will save each user several minutes per publication, and that this work will encourage more scholars to consider migrating from Academia.edu to Zenodo. Migration ultimately enhances the value of these publications, as they can be harvested en masse by members of the general public, who might use them for statistical analyses, enhance them with named entity recognition or improved interlinking between publications (via Library of Congress Subject Headings, which are incorporated into Zenodo’s metadata entry system), or simply read them without the obstacle of registering for an account. It is time to accept that Academia.edu is merely shifting the academic publishing paradigm from one commercial provider to another.