Harvard Releases Free Large-Scale AI Training Database
Harvard, in collaboration with Google Books, has released a dataset of almost 1 million public-domain books that can be used to train AI models for free. The dataset was created by the new Institutional Data Initiative, which is backed by Microsoft and OpenAI. According to Wired, it includes publications scanned by Google Books that are no longer protected by copyright, which usually expires 70 years after the author’s death or after the work’s publication. The collection covers multiple formats and genres, from creative writing by famous authors like Charles Dickens, Shakespeare, and Dante to textbooks and dictionaries. Small organizations can benefit from this data collection to compete more fairly in the AI sphere...