Column-oriented storage and file mapping can make some important kinds of access to large and very large tabular datasets much more efficient than more conventional methods. These techniques have been implemented in the STIL library, enabling their use in the established table analysis applications TOPCAT and STILTS. Benchmarks are presented which show certain common analysis tasks running 10--40 times faster than their MySQL equivalents. Applied to datasets ranging from hundreds of Mbyte to hundreds of Gbyte, this speedup can be put to good use both on the desktop and at the datacenter to bring new regimes of data exploration within practical reach.
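The combination of column-oriented storage and file mapping can be illustrated with a minimal Python sketch. It is not STIL's implementation or file format: the layout (each column stored as one contiguous run of little-endian 64-bit floats) and the names `write_columns` and `column_mean` are assumptions made purely for illustration. The point it shows is that a statistic over one column only needs to touch that column's bytes in the mapped file, rather than scanning every row in full.

```python
import mmap
import struct

FLOAT64 = "<d"  # assumed cell type for every column: little-endian float64


def write_columns(path, columns):
    """Write each column as one contiguous block (hypothetical layout)."""
    with open(path, "wb") as f:
        for values in columns:
            f.write(struct.pack(f"<{len(values)}d", *values))


def column_mean(path, col_index, n_rows):
    """Map the file and average a single column.

    Only the byte range belonging to the requested column is read from
    the mapping; the other columns are never accessed.
    """
    width = struct.calcsize(FLOAT64)
    start = col_index * n_rows * width
    end = start + n_rows * width
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            total = 0.0
            for (value,) in struct.iter_unpack(FLOAT64, mapped[start:end]):
                total += value
    return total / n_rows
```

With a row-oriented layout the same calculation would have to step through every row and skip over the unwanted cells, which is where a column-oriented layout can save I/O when a table has many columns.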
|Number of pages||4|
|Journal||Astronomical Society of the Pacific Conference Series|
|Publication status||Published - 1 Aug 2008|