Table of contents

Volume 8

Number 1, January 2015

014010

PythonTeX is a LaTeX package that allows Python code in LaTeX documents to be executed and provides access to the output. This makes possible reproducible documents that combine results with the code required to generate them. Calculations and figures may be next to the code that created them. Since code is adjacent to its output in the document, editing may be more efficient. Since code output may be accessed programmatically in the document, copy-and-paste errors are avoided and output is always guaranteed to be in sync with the code that generated it. This paper provides an introduction to PythonTeX and an overview of major features, including performance optimizations, debugging tools, and dependency tracking. Several complete examples are presented. Finally, advanced features are summarized. Though PythonTeX was designed for Python, it may be extended to support additional languages; support for the Ruby and Julia languages is already included. PythonTeX contains a utility for converting documents into plain LaTeX, suitable for format conversion, sharing, and journal submission.

014009

This paper presents SunPy (version 0.5), a community-developed Python package for solar physics. Python, a free, cross-platform, general-purpose, high-level programming language, has seen widespread adoption among the scientific community, resulting in the availability of a large number of software packages, from numerical computation (NumPy, SciPy) and machine learning (scikit-learn) to visualization and plotting (matplotlib). SunPy is a data-analysis environment specializing in providing the software necessary to analyse solar and heliospheric data in Python. SunPy is open-source software (BSD licence) and has an open and transparent development workflow that anyone can contribute to. SunPy provides access to solar data through integration with the Virtual Solar Observatory (VSO), the Heliophysics Event Knowledgebase (HEK), and the HELiophysics Integrated Observatory (HELIO) webservices. It currently supports image data from major solar missions (e.g., SDO, SOHO, STEREO, and IRIS), time-series data from missions such as GOES, SDO/EVE, and PROBA2/LYRA, and radio spectra from e-Callisto and STEREO/SWAVES. We describe SunPyʼs functionality, provide examples of solar data analysis in SunPy, and show how Python-based solar data-analysis can leverage the many existing tools already available in Python. We discuss the future goals of the project and encourage interested users to become involved in the planning and development of SunPy.

014008

Sequential model-based optimization (also known as Bayesian optimization) is one of the most efficient methods (per function evaluation) of function minimization. This efficiency makes it appropriate for optimizing the hyperparameters of machine learning algorithms that are slow to train. The Hyperopt library provides algorithms and parallelization infrastructure for performing hyperparameter optimization (model selection) in Python. This paper presents an introductory tutorial on the usage of the Hyperopt library, including the description of search spaces, minimization (in serial and parallel), and the analysis of the results collected in the course of minimization. This paper also gives an overview of Hyperopt-Sklearn, a software project that provides automatic algorithm configuration of the Scikit-learn machine learning library. Following Auto-Weka, we take the view that the choice of classifier and even the choice of preprocessing module can be taken together to represent a single large hyperparameter optimization problem. We use Hyperopt to define a search space that encompasses many standard components (e.g. SVM, RF, KNN, PCA, TFIDF) and common patterns of composing them together. We demonstrate, using search algorithms in Hyperopt and standard benchmarking data sets (MNIST, 20-newsgroups, convex shapes), that searching this space is practical and effective. In particular, we improve on best-known scores for the model space for both MNIST and convex shapes. The paper closes with some discussion of ongoing and future work.

014007

Machine learning benchmark data sets come in all shapes and sizes, whereas classification algorithms assume sanitized input, such as (x, y) pairs with vector-valued input x and integer class label y. Researchers and practitioners know all too well how tedious it can be to get from the URL of a new data set to a NumPy ndarray suitable for e.g. pandas or sklearn. The SkData library handles that work for a growing number of benchmark data sets (small and large) so that one-off in-house scripts for downloading and parsing data sets can be replaced with library code that is reliable, community-tested, and documented. The SkData library also introduces an open-ended formalization of training and testing protocols that facilitates direct comparison with published research. This paper describes the usage and architecture of the SkData library.

014006

Graphics processing units (GPUs) have become increasingly powerful in recent years. Programs that exploit the advantages of this architecture can achieve large performance gains, which is the aim of new initiatives in high performance computing. The objective of this work is to develop an efficient tool to model 2D elastic wave propagation on parallel computing devices. To this end, we implement the elastodynamic finite integration technique, using the industry open standard open computing language (OpenCL) for cross-platform, parallel programming of modern processors, and an open-source toolkit called [Py]OpenCL. The code written with [Py]OpenCL can run on a wide variety of platforms; it can be used on AMD or NVIDIA GPUs as well as classical multicore CPUs, adapting to the underlying architecture. Our main contribution is its implementation with local and global memory and the performance analysis using five different computing devices (including Kepler, one of the fastest and most efficient high performance computing technologies) with various operating systems.

015003

In many communities, such as climate science or industrial design, solving complex coupled problems with high fidelity through the external coupling of legacy solvers puts a lot of pressure on the tool used for the coupling. The precision of such predictions depends largely not only on simulation resolution and the use of huge meshes but also on high performance computing to reduce restitution times. In this context, the current work aims at studying the scalability of code coupling on high performance computing architectures for a conjugate heat transfer problem. The flow solver is a Large Eddy Simulation code that has already been ported to massively parallel architectures. The conduction solver is based on the same data structure and thus shares the flow solver's scalability properties. Accurately coupling solvers on massively parallel architectures while maintaining their scalability is challenging. It requires exchanging and processing information based on two different computational grids that are partitioned differently on different numbers of cores. Such transfers have to be designed to preserve code scalability while maintaining numerical accuracy. This raises communication and high performance computing issues: transferring data from one distributed interface to another in parallel on a very large number of processors is not straightforward, and the solutions are not clear. Performance tests have been carried out on up to 12 288 cores on the CURIE supercomputer (TGCC/CEA). Results show good behavior of the coupled model as the number of cores increases, thanks to the fully distributed exchange process implemented in the coupler. Advanced analyses are carried out to identify new paths for future developments of coupled simulations, such as optimization of the data transfer protocols through asynchronous communications or coupling-aware preprocessing of the coupled models (mesh partitioning phase).

014005

DMTCP (Distributed MultiThreaded CheckPointing) is a mature checkpoint–restart package. It operates in user space without kernel privilege, and adapts to application-specific requirements through plugins. While DMTCP has been able to checkpoint Python and IPython 'from the outside' for many years, a Python module has recently been created to support DMTCP. IPython support is included through a new DMTCP plugin. A checkpoint can be requested interactively within a Python session or under the control of a specific Python program. Further, the Python program can execute specific Python code prior to checkpoint, upon resuming (within the original process) and upon restarting (from a checkpoint image). Applications of DMTCP are demonstrated for: (i) Python-based graphics using virtual network client, (ii) a fast/slow technique to use multiple hosts or cores to check one Cython (Behnel S et al 2011 Comput. Sci. Eng. 13 31–39) computation in parallel, and (iii) a reversible debugger, FReD, with a novel reverse-expression watchpoint feature for locating the cause of a bug.

014004

The Hydrodynamic and oil spill modeling system for Python (HyosPy) is presented as an example of a multi-model wrapper that ties together existing models, web access to forecast data and visualization techniques as part of an adaptable operational forecast system. The system is designed to automatically run a continual sequence of hindcast/forecast hydrodynamic models so that multiple predictions of the time-and-space-varying velocity fields are already available when a spill is reported. Once the user provides the estimated spill parameters, the system runs multiple oil spill prediction models using the output from the hydrodynamic models. As new wind and tide data become available, they are downloaded from the web, used as forcing conditions for a new instance of the hydrodynamic model and then applied to a new instance of the oil spill model. The predicted spill trajectories from multiple oil spill models are visualized through Python methods invoking Google Map™ and Google Earth™ functions. HyosPy is designed in modules that allow easy future adaptation to new models, new data sources or new visualization tools.

014003

The Python libraries NumPy and SciPy are extremely powerful tools for numerical processing and analysis well suited to a large variety of applications. We developed ObsPy (http://obspy.org), a Python library for seismology intended to facilitate the development of seismological software packages and workflows, to utilize these abilities and provide a bridge for seismology into the larger scientific Python ecosystem. Scientists in many domains who wish to convert their existing tools and applications to take advantage of a platform like the one Python provides are confronted with several hurdles such as special file formats, unknown terminology, and no suitable replacement for a non-trivial piece of software. We present an approach to implement a domain-specific time series library on top of the scientific NumPy stack. In so doing, we show a realization of an abstract internal representation of time series data permitting I/O support for a diverse collection of file formats. Then we detail the integration and repurposing of well established legacy codes, enabling them to be used in modern workflows composed in Python. Finally we present a case study on how to integrate research code into ObsPy, opening it to the broader community. While the implementations presented in this work are specific to seismology, many of the described concepts and abstractions are directly applicable to other sciences, especially to those with an emphasis on time series analysis.

014002

We have previously reported an L2-gradient flow (L2GF) method for cryo-electron tomography and single-particle reconstruction, which performs reasonably well. The aim of this paper is to further improve both the computational efficiency and the accuracy of the L2GF method. In a finite-dimensional space spanned by the radial basis functions, a minimization problem combining a fourth-order geometric flow with an energy decreasing constraint is solved by a bi-gradient method. The bi-gradient method involves a free parameter $\beta \in [0,1]$. As $\beta$ increases from 0 to 1, the structures of the reconstructed function are captured from coarse to fine. The experimental results show that the proposed method yields more desirable results.

015002

We have developed a 2.5D MHD code designed to study how the solar wind influences the evolution of transient events in the solar corona and inner heliosphere. The code includes thermal conduction, coronal heating and radiative cooling. Thermal conduction is assumed to be magnetic field-aligned in the inner corona and transitions to a collisionless formulation in the outer corona. We have developed a stable method to handle field-aligned conduction around magnetic null points. The inner boundary is placed in the upper transition region, and the mass flux across the boundary is determined from 1D field-aligned characteristics and a 'radiative energy balance' condition. The 2.5D nature of this code makes it ideal for parameter studies not yet possible with 3D codes. We have made this code publicly available as a tool for the community. To this end we have developed a graphical interface to aid in the selection of appropriate options and a graphical interface that can process and visualize the data produced by the simulation. As an example, we show a simulation of a dipole field stretched into a helmet streamer by the solar wind. Plasmoids periodically erupt from the streamer, and we perform a parameter study of how the frequency and location of these eruptions change in response to different levels of coronal heating. As a further example, we show the solar wind stretching a compact multi-polar flux system. This flux system will be used to study breakout coronal mass ejections in the presence of the solar wind.

014001

Pythran is an open source static compiler that turns modules written in a subset of the Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, NumPy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for NumPy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more so, but without the loss of backward compatibility.
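
As a hedged illustration of the interpreter-compatibility point, the sketch below shows a module that stays ordinary Python because the export signature lives in a comment. The `#pythran export` line follows the annotation style documented by the Pythran project; the function name `dprod` and its argument types are chosen here only for illustration and are not taken from the paper.

```python
# Illustrative module (hypothetical file name: dprod_module.py).
# The export line below is a plain comment, so CPython ignores it;
# a Pythran build is assumed to read it as the native entry-point signature.

#pythran export dprod(int list, int list)
def dprod(left, right):
    """Return the dot product of two equal-length lists of integers."""
    return sum(x * y for x, y in zip(left, right))


if __name__ == "__main__":
    # Runs unchanged under the standard interpreter, no compilation needed.
    print(dprod([1, 2, 3], [4, 5, 6]))
```

Compiling the same file with the `pythran` command-line tool is expected to produce a native extension exposing the same function; that step is omitted here because it depends on the local toolchain.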

015001

We consider the inversion of block tridiagonal, block Toeplitz matrices and comment on the behaviour of these inverses as one moves away from the diagonal. Using matrix Möbius transformations, we first present an ${\rm O}(1)$ representation (with respect to the number of block rows and block columns) for the inverse matrix and subsequently use this representation to characterize the inverse matrix. There are four symmetry-distinct cases where the blocks of the inverse matrix (i) decay to zero on both sides of the diagonal, (ii) oscillate on both sides, (iii) decay on one side and oscillate on the other and (iv) decay on one side and grow on the other. This characterization exposes the necessary conditions for the inverse matrix to be numerically banded and may also aid in the design of preconditioners and fast algorithms. Finally, we present numerical examples of these matrix types.
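
The decay and oscillation cases can be illustrated in the simplest scalar setting, where every block is a single number. The sketch below is not the authors' Möbius-transformation method; it inverts small tridiagonal Toeplitz matrices by plain Gauss-Jordan elimination and inspects the magnitudes along the first row of the inverse. The function names and the sample values (diagonal 4 or 1, off-diagonals 1) are assumptions chosen for illustration.

```python
def tridiagonal_toeplitz(n, diag, upper, lower):
    """Build an n x n matrix with constant diagonal, super- and subdiagonal."""
    return [[diag if i == j else upper if j == i + 1 else lower if j == i - 1 else 0.0
             for j in range(n)] for i in range(n)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is nonsingular; no singularity check is performed.
    """
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def first_row_magnitudes(n, diag, upper, lower):
    """Absolute values of the first row of the inverse, moving away from the diagonal."""
    inverse = invert(tridiagonal_toeplitz(n, diag, upper, lower))
    return [abs(v) for v in inverse[0]]


if __name__ == "__main__":
    # Diagonally dominant sample: magnitudes shrink away from the diagonal.
    print(first_row_magnitudes(8, 4.0, 1.0, 1.0))
    # Non-dominant sample: magnitudes do not die out away from the diagonal.
    print(first_row_magnitudes(7, 1.0, 1.0, 1.0))
```

The size 7 in the second call is deliberate: with diagonal 1 and off-diagonals 1, some sizes (for example 8) give a singular matrix, which this sketch does not guard against.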