Linux.com: Online library reaches million book milestone

Friday, December 21, 2007


As a freelance contributor to Linux.com:

An international venture called the Universal Library Project has made more than one million books freely available in digitized format. The joint project of researchers from China, India, Egypt, and the US aims eventually to digitize all published works of man. Its goals are to free access to information from geographic and socioeconomic boundaries, to provide a basis for technological advancement, and to preserve published works against time and tide.

One and a half million books in more than 20 languages, including Chinese, English, Arabic, and various Indian languages, are now accessible via a single Web portal. The online library includes rare and out-of-print books from private and public collections around the world.

"There are plenty of books that are no longer in copyright, and that have long been forgotten, but which would be useful to scholars, students, and just the general population," says Michael Shamos, a copyright lawyer, computer science professor, and co-director of the project at Carnegie Mellon University in the US.

"There is a tremendous amount of knowledge that we thought would be lost to mankind if we didn't start digitizing," he says.

The project believes digital books on the Internet should be free to read, instantly available, easily accessible, printable on demand, translatable into any language, and readable by both humans and machines. Additionally, with the advent of low-cost technology such as ebook readers and the One Laptop Per Child project's XO laptop, digitized books are expected to reduce the cost of learning by replacing the recurring cost of books with a one-off computer purchase and freely downloadable information.

According to the researchers' estimates, the Universal Library collection currently represents a mere one percent of the approximately 100 million books that have ever been published. Shamos estimates that only half of all published books can still be found in physical libraries around the world, so physically locating a rare book can be a tedious process.

"The only way you can obtain an out-of-print book is to find a library that has one, and either travel to that library, or obtain that book through an interlibrary loan," he says. "It's a very slow process, especially considering that without seeing the book, you might not know if there's anything interesting in it for you."

When the project began in 2002, its members expected other research and commercial projects to digitize only around 50,000 books. Google Book Search is one such project that has started since then, and it has come under fire for alleged breaches of copyright. While Shamos expresses high regard for Google's efforts and the publicity they have brought to book digitization, he says the Universal Library Project has "similar but different" goals.

"We want to digitize all published works of man; I don't think that anybody at Google would ever say that's what their goal is," he says. "Their goal is to sell advertising, and one of the ways that they find to sell advertising is to create a Web site that has such rich content that people want to visit it all the time. I don't think that Google has any interest in putting Sanskrit works up on their Web site."

Like Google, the Universal Library Project faces issues in publishing copyrighted books online. Books currently under copyright are therefore only partially available via the Web portal, while books not bound by copyright restrictions are fully and freely available online.

Citing a need for information to be freely available, Shamos expects these copyright restrictions to become less of an issue over time, as publishers adapt to the low-cost business model that digital books offer.

"Copyright is going to become less and less significant [because] through digitization, the cost of publishing is vanishingly small," he says. "As the cost of copying goes down, the value of works goes down, and the ability to make profit from them goes down.

"There is a difference in reading for pleasure and reading for information; what is going to happen, I think, is that copyright is going to end up focusing on works of entertainment and not works of information."

High numbers

The Universal Library Project is the brainchild of researchers at Carnegie Mellon University, and has received $3.5 million in seed funding from the National Science Foundation. The project has also received in-kind contributions, each valued at $10 million, from Zhejiang University in China and the Indian Institute of Science in India, and has more recently forged a partnership with the Library at Alexandria in Egypt.

With more than 1,000 workers in about 50 scanning and digitization centres around the world, the Universal Library collection is growing at an estimated 7,000 books per day. There is a fair way to go before the project reaches its lofty book digitization goals; even so, the researchers have set their sights on eventually including content like music, artwork, lectures, and newspapers in the library.

"We believe that by having a universal library with all published works of man, and having multiple sites all around the world that house the entire content, it will be impossible to destroy these works," Shamos says.

"There can never again be a destruction of the library of Alexandria. There could be a destruction of the building, but there can't be a destruction of the works, and so this makes the creation of man impervious to changes in political regime, culture, fate."