AlphaTwirl: a python library for summarizing event data into multi-dimensional categorical data

Tai Sakuma

Research output: Contribution to conference (Conference Poster)

Abstract

AlphaTwirl is a Python library that loops over event data and summarizes them into multi-dimensional categorical (binned) data in the form of data frames. Event data, the input to AlphaTwirl, have one entry (or row) per event: for example, a ROOT TTree with one entry per collision event of an LHC experiment. Because they have as many entries as events, event data are often too large to be loaded into memory. Multi-dimensional categorical data, the output of AlphaTwirl, have one row per category. Because they have only as many rows as categories, they are usually small enough to be loaded into memory. Users can, for example, import them as data frames into R or pandas, which usually load all data into memory, and perform categorical data analyses with the rich set of data operations available there. In this presentation, I will show (a) an example workflow of data analysis using AlphaTwirl and data frames, (b) the user interface of AlphaTwirl, e.g., how to specify event selection conditions, binning and categories, and the methods used to summarize data in each category, and (c) implementation features, such as concurrency in looping over large event data. In addition, I will mention particular analyses in CMS that use AlphaTwirl and discuss possibilities for future development.
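
The idea of reducing a large event stream to a small table with one row per category can be illustrated with a minimal sketch in plain Python. This is not AlphaTwirl's API: the function names (`iter_chunks`, `bin_low_edge`, `summarize`), the event fields (`njets`, `met`), the selection cut, and the bin width are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import islice


def iter_chunks(events, chunk_size):
    """Yield lists of at most chunk_size events, so the full input never sits in memory."""
    it = iter(events)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def bin_low_edge(value, width):
    """Return the lower edge of the fixed-width bin that contains value."""
    return (value // width) * width


def summarize(events, chunk_size=2, met_bin_width=50):
    """Count selected events per (njets, met_bin) category, one chunk at a time."""
    counts = Counter()
    for chunk in iter_chunks(events, chunk_size):
        for event in chunk:
            if event["njets"] < 2:  # hypothetical event selection
                continue
            key = (event["njets"], bin_low_edge(event["met"], met_bin_width))
            counts[key] += 1
    # One row per category: small enough to pass to pandas.DataFrame(rows) or write as CSV.
    return [
        {"njets": njets, "met_bin": met_bin, "n": n}
        for (njets, met_bin), n in sorted(counts.items())
    ]


# Toy input standing in for a large event file (assumed field names).
events = [
    {"njets": 2, "met": 35.0},
    {"njets": 1, "met": 20.0},
    {"njets": 2, "met": 48.5},
    {"njets": 3, "met": 120.0},
    {"njets": 2, "met": 75.2},
    {"njets": 3, "met": 101.0},
]
rows = summarize(events)
for row in rows:
    print(row)
```

The input is consumed chunk by chunk and only the per-category counters are kept, which is why the output stays small regardless of the number of events.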
Original language: English
Publication status: In preparation - 10 Jul 2018
Event: 23rd International Conference on Computing in High Energy and Nuclear Physics - National Palace of Culture, Sofia, Bulgaria
Duration: 9 Jul 2018 to 13 Jul 2018
http://chep2018.org/

Conference

Conference: 23rd International Conference on Computing in High Energy and Nuclear Physics
Abbreviated title: CHEP 2018
Country/Territory: Bulgaria
City: Sofia
Period: 9/07/18 to 13/07/18