Anyone who has dared to download a large file using the BitTorrent (BT) system, in which chunks of the file are pulled from other BT users in a form of distributed file sharing, will know how slow (and sometimes how fast) the method can be. Although much of the BT system is exploited to share pirated movies and music, it has a serious, legitimate side that allows scientists, engineers and programmers to share the burden of huge database and ISO image downloads. Now, thousands of US tax dollars (in the form of an NSF CAREER grant) have been spent on improving the BitTorrent system.
David Andersen and colleagues at Carnegie Mellon University spotted a fatal flaw in torrents: the file-sharing system often grinds to a halt when the users who hold the complete, or almost complete, file are offline.
In conventional BT downloads, the files being shared must match exactly across the distributed sharing network, or else they are ignored for download purposes. Andersen realized that identifying relevant chunks of files that are not identical, but are similar to the desired file, could speed up BitTorrent downloads. He and his colleagues have designed Similarity-Enhanced Transfer (SET) to exploit this concept.
Andersen claims SET could make some transfers five times faster. “This is a technique that I would like people to steal,” Andersen said. Though he and his colleagues hope to implement SET in a service for sharing software or academic papers, they have no intention of applying it themselves to movie- or music-sharing services. “But it would make P2P transfers faster and more efficient,” he added, “and developers should just take the idea and use it in their own systems.”
SET works in a similar way to BitTorrent. Once a download is started, the source file is broken down into unique chunks. These chunks are downloaded simultaneously from accessible sources on the sharing network and then reassembled on the user’s computer. While this is underway, the SET program continues to search for similar files using a process called handprinting. In this method, sampling of non-identical files is used to find chunks that match the required chunks. Relevant chunks can then be downloaded from the similar files identified by this method, making the overall process much faster.
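To make the handprinting idea concrete, here is a minimal sketch in Python. It is not the researchers' implementation. It assumes tiny fixed-size chunks, SHA-1 chunk hashes, and a sampling rule that keeps the k smallest chunk hashes of each file. The article does not specify any of these details, and the names chunk_hashes, handprint, build_index and find_similar_sources are invented for illustration.

```python
import hashlib
from typing import Dict, List, Set

CHUNK_SIZE = 4       # assumption: tiny fixed-size chunks, for illustration only
HANDPRINT_SIZE = 3   # assumption: k, the number of chunk hashes sampled per file


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Split data into fixed-size chunks and hash each chunk with SHA-1."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def handprint(hashes: List[str], k: int = HANDPRINT_SIZE) -> Set[str]:
    """Deterministic sample: keep the k smallest distinct chunk hashes."""
    return set(sorted(set(hashes))[:k])


def build_index(files: Dict[str, bytes], k: int = HANDPRINT_SIZE) -> Dict[str, Set[str]]:
    """Map each handprint hash to the names of the files that advertise it."""
    index: Dict[str, Set[str]] = {}
    for name, data in files.items():
        for h in handprint(chunk_hashes(data), k):
            index.setdefault(h, set()).add(name)
    return index


def find_similar_sources(wanted: bytes, index: Dict[str, Set[str]],
                         k: int = HANDPRINT_SIZE) -> Set[str]:
    """Perform k lookups (one per handprint hash) and collect candidate files."""
    sources: Set[str] = set()
    for h in handprint(chunk_hashes(wanted), k):
        sources |= index.get(h, set())
    return sources


# Hypothetical files on the network: one shares four of five chunks with the
# wanted file, the other shares none.
wanted = b"AAAABBBBCCCCDDDDEEEE"
files = {
    "similar.bin": b"AAAABBBBXXXXDDDDEEEE",
    "unrelated.bin": b"1111222233334444",
}

index = build_index(files)
sources = find_similar_sources(wanted, index)

# Chunks of the wanted file that could be fetched from the similar file.
shared = set(chunk_hashes(wanted)) & set(chunk_hashes(files["similar.bin"]))

print(sources)      # {'similar.bin'}
print(len(shared))  # 4
```

Because both files sample their smallest hashes from largely the same pool of chunks, their handprints tend to overlap whenever the files share many chunks, and the number of lookups stays fixed at k regardless of file size.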
Although the researchers hope to use the SET approach for legitimate academic file sharing, they tested it on more common music and movie downloads. They saw an improvement of more than 70% in downloading an MP3 file. A larger, 55 MB movie trailer downloaded 30% faster when it could pull chunks from movie trailers that were 47% similar.
The researchers hope that such efficiency improvements will make SET part of the next generation of high-speed online multimedia delivery. “We believe that handprinting strikes an attractive balance for multi-source transfers. It efficiently locates the sources of exploitable similarity that have the most chunks to contribute to a receiver, and it does so using only a small, constant number of lookups. For these reasons, we believe that this technique is an attractive one to use in any multi-source file transfer system,” say the researchers.