As reported by the National Network of Libraries of Medicine (see original post):
NIH-supported scientists have made over 300,000 author manuscripts available in PMC. Now NIH is making these papers accessible to the public in a format that will allow robust text analyses.
You can download the PMC collection of NIH-supported author manuscripts as a package in either XML or plain-text format at ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/. The collection encompasses all NIH manuscripts posted to PMC that were published in July 2008 or later. While the public can access the manuscripts’ full text and accompanying figures, tables, and multimedia via the PMC website, the newly available XML and plain-text files include full text only. In addition to text mining, the files may be used in other ways consistent with the principles of fair use under copyright law.
Please note that these author manuscript files are not part of the PMC Open Access Subset.
The NIH Office of Extramural Research developed this resource to increase the impact of NIH funding. Through this collection, scientists will be able to analyze these manuscripts, further apply NIH research findings, and generate new discoveries.
For more information and instructions, please visit the PMC author manuscript collection webpage.
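As a rough illustration of the kind of text analysis the plain-text package could support, here is a minimal Python sketch that counts the most frequent words across a folder of text files. It assumes the package has already been downloaded and extracted locally. The directory name pmc_manuscripts_txt, the .txt extension, and the function count_terms are hypothetical and are not taken from the announcement, so check the actual layout of the download before relying on them.

```python
import re
from collections import Counter
from pathlib import Path


def count_terms(text_dir, top_n=10):
    """Return the top_n most frequent lowercase words in all .txt files under text_dir."""
    counts = Counter()
    # Walk the directory tree recursively; the .txt extension is an assumption.
    for path in Path(text_dir).rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Very simple tokenizer: runs of ASCII letters, case-folded.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical local folder holding the extracted plain-text manuscripts.
    for word, n in count_terms("pmc_manuscripts_txt", top_n=20):
        print(f"{word}\t{n}")
```

The tokenizer here is deliberately crude. A real analysis would likely remove stop words and handle hyphenated or non-ASCII terms, which this sketch does not attempt.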