A team at Microsoft Research has broken the data-sorting record on the MinuteSort benchmark, shattering the previous record set by Yahoo in 2009.
The MinuteSort benchmark was devised by the late Jim Gray and “…measures how quickly data can be sorted starting and ending on disks” within 60 seconds. The Microsoft Research team topped Yahoo’s record using a new storage architecture called Flat Datacenter Storage. With it, the team sorted 1,401GB of data using 1,033 disks spread over 250 machines, far outstripping Yahoo’s 500GB, which was sorted using 5,624 disks across 1,406 machines.
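To put those figures on a like-for-like footing, here is a rough back-of-the-envelope sketch in Python that divides each run's data volume by its disk and machine counts. The numbers are the ones quoted above; the `SortRun` class and its method names are made up for illustration, and the results are simple averages over the one-minute window, not measured hardware throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortRun:
    """One MinuteSort entry, using only the figures quoted in the article."""
    name: str
    gigabytes: float  # data sorted within the 60-second window
    disks: int
    machines: int

    def gb_per_disk(self) -> float:
        # Average data sorted per disk in one minute.
        return self.gigabytes / self.disks

    def gb_per_machine(self) -> float:
        # Average data sorted per machine in one minute.
        return self.gigabytes / self.machines


# Hypothetical variable names; values are taken from the article text.
microsoft = SortRun("Microsoft Research", 1401, 1033, 250)
yahoo = SortRun("Yahoo (2009)", 500, 5624, 1406)

for run in (microsoft, yahoo):
    print(f"{run.name}: {run.gb_per_disk():.3f} GB per disk, "
          f"{run.gb_per_machine():.3f} GB per machine")

# How many times more data each Microsoft disk handled, on average.
disk_ratio = microsoft.gb_per_disk() / yahoo.gb_per_disk()
print(f"Per-disk ratio: {disk_ratio:.1f}x")
```

On these figures, the Microsoft run sorted nearly three times as much data with fewer than a fifth of the disks, which is where the "relatively inexpensive" claim comes from.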
The record is significant because it points to new ways of crunching data quickly on relatively inexpensive setups, which is increasingly a requirement today.
“In an age when information is increasing in enormous quantities, the ability to move and deploy it is important for everything from web searches to business analytics to understanding climate change.”
“The ability to sort data rapidly also will aid machine learning—the design and development of algorithms that enable computers to create predictions based on data, such as sensor data or information from databases. Microsoft Research has a big stake in machine learning, in work ranging from language processing to security applications.”