Jay Emerson - Yale University

Starts: 4:00 pm on Thursday, September 9, 2010
Ends: 5:00 pm on Thursday, September 9, 2010
Location: MCS 149

TITLE: The Bigmemory Project ABSTRACT: Multi-gigabyte data sets challenge and frustrate R users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R’s rich statistical programming environment. The package bigmemory and sister packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be filebacked, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.