(Science and Technology) Large numbers of old images converted into searchable data
2014. 9. 2. 16:26
Source: http://www.bbc.com/news/technology-28976849
29 August 2014 Last updated at 08:18
Millions of historical images posted to Flickr
The project has resulted in even more pictures of cats being put on to the internet
An American academic is creating a searchable database of 12 million historical copyright-free images.
Kalev Leetaru has already uploaded 2.6 million pictures to Flickr, which are searchable thanks to tags that have been automatically added.
The photos and drawings are sourced from more than 600 million library book pages scanned in by the Internet Archive organisation.
The images have been difficult to access until now.
Mr Leetaru said digitisation projects had so far focused on words and ignored pictures.
"For all these years all the libraries have been digitising their books, but they have been putting them up as PDFs or text searchable works," he told the BBC.
"They have been focusing on the books as a collection of words. This inverts that.
"Stretching half a millennium, it's amazing to see the total range of images and how the portrayals of things have changed over time.
Visitors to the site are free to copy and make use of the pictures without charge
"Most of the images that are in the books are not in any of the art galleries of the world - the original copies have long ago been lost."
The pictures range from 1500 to 1922, when copyright restrictions kick in.
Piggyback program
Mr Leetaru began work on the project while researching communications technology at Georgetown University in Washington DC as part of a fellowship sponsored by Yahoo, the owner of photo-sharing service Flickr.
To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
This drawing, dating back to 1502, is one of the oldest in the collection
As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru's code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
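The extraction step described above can be sketched roughly as follows. This is only an illustration, not Mr Leetaru's actual code: the article does not say what format the OCR layout data takes, so the `Block` record, its `kind` labels and the toy pixel grid are all assumptions. A real pipeline would read the scan with an imaging library and write each crop out as a Jpeg file.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One region reported by the OCR layout analysis (hypothetical format)."""
    kind: str    # "text" or "picture" (assumed labels)
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop(page, block):
    """Return the sub-grid of `page` covered by `block`."""
    return [row[block.left:block.right] for row in page[block.top:block.bottom]]


def extract_pictures(page, blocks):
    """Crop every region that the layout data marked as a picture."""
    return [crop(page, b) for b in blocks if b.kind == "picture"]


# A toy 4x6 "scan": 0 = blank paper, 1 = ink belonging to a picture.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
blocks = [
    Block("text", 0, 0, 6, 1),
    Block("picture", 1, 1, 4, 3),
]
pictures = extract_pictures(page, blocks)
```

Here `pictures` holds one cropped region, the 2x3 block of picture pixels, while the text block is skipped.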
The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book.
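Collecting the caption and the neighbouring paragraphs might look something like the sketch below. It assumes, purely for illustration, that a page is available as a list of `(kind, text)` tuples in reading order and that a caption, when present, directly follows its picture. The sample text is made up, and the article does not describe how the real software located captions.

```python
def gather_context(blocks):
    """Pair each picture with its caption and the paragraphs around it.

    `blocks` is a list of (kind, text) tuples in reading order, where kind
    is "paragraph", "picture" or "caption" (hypothetical labels).
    """
    records = []
    for i, (kind, _) in enumerate(blocks):
        if kind != "picture":
            continue
        # Assumption: a caption, if any, is the block right after the picture.
        caption = ""
        if i + 1 < len(blocks) and blocks[i + 1][0] == "caption":
            caption = blocks[i + 1][1]
        # Nearest paragraph before the picture, searching backwards.
        before = next((t for k, t in reversed(blocks[:i]) if k == "paragraph"), "")
        # Nearest paragraph after the picture, searching forwards.
        after = next((t for k, t in blocks[i + 1:] if k == "paragraph"), "")
        records.append({"caption": caption, "before": before, "after": after})
    return records


# Invented sample page, for illustration only.
blocks = [
    ("paragraph", "The telephone exchange opened in 1880."),
    ("picture", ""),
    ("caption", "Fig. 3. Operators at the switchboard."),
    ("paragraph", "Subscribers soon numbered in the thousands."),
]
records = gather_context(blocks)
```

Each record in `records` could then be stored alongside the cropped image so that the text travels with the picture.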
Each Jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site's search tool.
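To see why attaching text makes the pictures searchable, consider a small inverted index, which maps each word to the images whose text contains it. This is a generic technique shown under stated assumptions, not a description of how Flickr's search tool works; the image ids and captions below are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the set of image ids whose text contains it."""
    index = defaultdict(set)
    for image_id, text in records.items():
        for word in tokenize(text):
            index[word].add(image_id)
    return index


def search(index, query):
    """Return ids of images whose text contains every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented ids and text, for illustration only.
records = {
    "img-001": "Operators at the telephone switchboard",
    "img-002": "A railroad bridge under construction",
    "img-003": "Telephone poles beside the railroad",
}
index = build_index(records)
hits = search(index, "telephone railroad")
```

With this toy data, `hits` contains only `img-003`, the one image whose text mentions both words.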
"I think one of the greatest things people will do is time travel through the images," Mr Leetaru said.
"Type in the telephone, for example, and you can see that all the initial pictures are of businesspeople, and mostly men.
The library of pictures allows users to explore how technologies developed over the years
"Then you see it morph into more of a tool to connect families.
"You see another progression with the railroad where in the first images it was all about innovation and progress that was going to change the world, then you see its evolution as it becomes part of everyday life."
'Hit and miss'
Archivists said they were impressed with the project.
"Finding images within texts and tagging large collections of images are notoriously difficult," said Dr Alison Pearn, a senior archivist from the University of Cambridge and associate director of the Darwin Correspondence Project.
"This is a clever way of providing both quantity and searchability, and it's great that it is freely available for anyone to use.
"The image identification has picked up things like library stamps and scribbles in the margins, and the tagging is a bit hit and miss, but research has always been at least in part about serendipity, and who knows what people will find to do with them."
The images should prove useful to amateur and professional historians
Mr Leetaru's own ambition is a tie-up with the internet's most famous encyclopaedia once his project is completed next year.
"What I want to see is... Wikipedia have a national day of going through this to illustrate Wikipedia articles," he said.
"Take a random page about a historical event and there's probably a good chance that you're going to find an image in here that bears in some way on that event or location.
"Being able to basically enrich [them] would be huge."
The many illustrations available include this sketch of Edinburgh shops published in 1846
He added that he also planned to offer his code to others.
"Any library could repeat this process," he explained.
"That's actually my hope, that libraries around the world run this same process of their digitised books to constantly expand this universe of images."