The Potsdam open source radio interferometry tool (Port)

source


Introduction and Motivation
As one component of the four space geodetic techniques, very long baseline interferometry (VLBI) significantly contributes to our understanding of system Earth (Sovers et al. 1998;Schuh & Behrend 2012;Schuh & Böhm 2013).The retrieval of VLBI data products consists of multiple analysis steps 12 beginning with the raw digitized signal from an extragalactic radio source recorded by the VLBI antennas and leading to the consolidated geodetic and astrometric parameters produced by the International VLBI Service for Geodesy and Astrometry (IVS, Nothnagel et al. 2017) combination center; a sketch of these analysis steps is shown in Figure 1.Here, we focus on the estimation of geodetic and astrometric parameters based on multi-band group delay observables.

Short History of the Software
The development of source code for scientific applications is very often among the tasks of scientists and students but it does not get credited very well unless an article is written about the software itself.The Potsdam Open-Source Radio Interferometry Tool, or short PORT, is the software used and developed for geodetic and astrometric VLBI analyses, scheduling of observations, and simulation of observations by scientists and students that either currently work at GeoForschungszentrum Potsdam (GFZ) or used to work there.Its historical roots go back to the Bonn VLBI Software System (BVSS) developed among others by James Campbell (Campbell 1979) and Harald Schuh (HS) at the Geodetic Institute of the University of Bonn since 1979 (Schuh 1987).This code was then used and further developed Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence.Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. by other authors, e.g., Nestor Zarraoa (Zarraoa et al. 1990(Zarraoa et al. , 1992;;Zarraoa 1992) as the basis for the later OCCAM software.Since about 1993, Oleg Titov (Titov & Zarraoa 1997;Titov 2000) was the chairman of the OCCAM VLBI software that was soon used by several groups e.g., in St. Petersburg, Russia and in Munich, Germany.In Munich, OCCAM was further developed by Volker Tesmer and Robert Heinkelmann at Deutsches Geodätisches Forschungsinstitut (DGFI), now DGFI-TUM.Tesmer and Heinkelmann extended in particular the stochastic model in terms of outlier detection and treatment as well as variance component analysis for groups of parameters (Heinkelmann 2002;Kutterer et al. 2003;Tesmer 2004).Additionally, an interface was created between the least squares module of OCCAM and DOGS-CS (DGFI Orbit and Geodetic parameter estimation Software-Combination and Solution) in order to enable the solution of global parameters.At the same time, Titov further developed the Kalman Filter module of OCCAM (Titov et al. 2004).With HS, the VLBI analysis software OCCAM came to Vienna where it was further improved, modernized and finally re-designed in MATLAB ® (MathWorks ® ) under HS renamed to VieVS (Böhm et al. 2012(Böhm et al. , 2018)).In 2012 when HS left from Vienna University of Technology to GFZ, the VLBI analysis knowledge and software was brought to its current place: Potsdam.In the last several years since 2012 the code was extensively used and edited by a variety of scientists, mainly by Tobias Nilsson (TN), within the VLBI Working Group at GFZ under Harald Schuh and Robert Heinkelmann (RH).With more and more VLBI developers and analysts at different institutes around the world cooperating with the GFZ VLBI Working Group and its software, it became obvious that a higher degree of code organization would be mandatory.That was when PORT was initialized.
A key element of the level 2 VLBI data analysis (Figure 1) is the detection and flagging of outlier observations (see Section 3.4).Since outlier identification by manual inspection contains by nature subjective elements, the VLBI community addresses this issue by employing a collection of analysis tools, which implement diverse algorithms and methods.Thus, any newcomer to this toolset is highly appreciated by the IVS, as it tends to reduce the weights of solutions derived by existing analysis softwares.Key objectives of PORT development focus on 1. the timely processing of VLBI sessions, 2. post-processing activities in support of the realization of terrestrial and celestial reference systems without relying on legacy data analysis packages, 3. supporting the data analyst in interactive tasks through visualization tools and suitable user interfaces, 4. providing tools for observation scheduling and simulation, 5. providing a framework for VLBI research and development projects, 6. applying advanced models, and 7. facilitating the education of young researchers in VLBI data analysis.
Two documents describe in detail the PORT installation procedure and provide recipes on how to analyze IVS data using PORT. 13In terms of processing speed, a VLBI analysis software implemented in a fourth generation programming language such as MATLAB ® is clearly outperformed by any software package written in Fortran or C. Still, with pre-determined outliers, PORT manages to completely process the current total set of about 6500 VLBI sessions in about 40 hr on a current hardware (Intel ® Xeon ® Gold 5222 CPU with 16 cores, 264 GByte memory); the processing time increases however by at least a factor of two if outlier detection without preexisting information is activated.

Discussion of Processing Stages
The PORT software addresses the level 2 data analysis step of the VLBI data processing pipeline (see Figure 1).It reads in ambiguity-resolved group delay time series, that are calibrated to be approximately free of ionospheric delay effects.PORT estimates station and radio source coordinates, as well as Earth orientation parameters (EOP) and their rates.In addition, zenith delays and horizontal atmospheric gradients at the antenna locations, and instrumental effects such as clock offsets and drifts and baseline clock offsets are determined.

Input Data
The fundamental input data sources feeding the PORT processing chain (level 2 data analysis) are time series of group delays stored in the vgosDB14 or legacy NGS15 data formats.In addition to the delay information, meteorological data, such as temperature, pressure, and relative humidity, provided by sensors at the antenna sites, as well as hardware-related calibration data are included in the vgosDB/NGS file(s).Furthermore, PORT collects a priori EOP time series from the International Earth Rotation and Reference Systems Service (IERS) data archives and calculates reference point displacements at a station site.Atmospheric refraction causes delay and bending, which are considered in PORT as well.Mapping functions and atmospheric parameter models consider both effects at the same time.The a priori zenith delays are calculated using the Saastamoinen refraction model (Petit & Luzum 2010) inserting meteorological data from either in situ observations or from meteorological analysis fields of ECMWF.The a priori horizontal atmospheric gradients are calculated from the APG model (Böhm et al. 2013).Finally, general and sessionspecific processing options and runtime control variables are initialized from ASCII text files using a PORT-specific data format.
If information on outlier data points, stored in a separate ASCII text file, is available, the corresponding observations are flagged and excluded from the subsequent analysis tasks.
Besides the functional code for VLBI data analysis, PORT includes code for operational, administrative and format conversion tasks.Monitoring of the IVS archives for the publication of novel or revised vgosDB/NGS data files is handled by a set of Python scripts.The scripts are intended to run regularly at predefined intervals.If new observations are found, the operator on duty is alerted by e-mail and optionally in terms of SMS messages to analyze the session.The operational processing toolkit of PORT includes a database which holds information on the analysis status of each individual VLBI session.(Charlot et al. 2020) source coordinates (other reference frames may be chosen as well), loading models, atmospheric mapping functions (Dill & Dobslaw 2013;Balidakis et al. 2018), and relativistic corrections according to the consensus model (Eubanks 1991;Titov et al. 2011) following the Conventions by the IERS (Petit & Luzum 2010).The JPL solar system ephemerides listed in Table 1 are supported out of the box; ephemerides format-compatible with the JPL ephemerides may be used as well.The observables are assumed to be uncorrelated and semi-empirically determined weights are applied.

Functional Model
In addition, the design matrix entries, consisting of the partial derivatives of the observables with respect to each parameter, are calculated.The derivatives are taken at the parameters' a priori values, since in geodetic VLBI the adjustments are assumed to be less than about 4 mas for radio source coordinates and less than about 2 m for station coordinates and thus the linear approximation of the functional model suffices for the desired accuracy (Eubanks 1991).

Interactive Analysis Mode I: Review of Auxiliary Information
Before starting with the parameter estimation the user is advised to check the correlator report and auxiliary stationspecific information.PORT supports these preprocessing tasks, first, by downloading the correlator report (if it exists) and by storing it in the PORT data directory and, second, by providing plots of meteorological data from sensors at the individual stations as well as phase calibration information recorded during the 24 hr measurement period.
Figures 2 and 3 show examples of these auxiliary data recorded at station ONSALA60 during session 14MAY10XA.Together with the correlator report, these plots are supposed to guide the analyst in his/her decisions during the subsequent analysis.

Parameter Estimation
The parameter estimation process starts by determining the difference between the observed and the modeled delay values -the reduced observation vector.All parameters are realized as linear splines (piecewise linear functions).The reference epoch is the 0:00 UTC within the session (24 hr sessions) or closest to the observation epochs (CONT sessions, intensive sessions) in order to be compatible with other space geodetic techniques parameter epochs, such as the ones from GNSS.The length of the spline segments is constant inside a session for each parameter and can be chosen by the analyst.Depending on the chosen segment length several parameters will be outputted by PORT until the first and the last epoch of the observations are covered within a parameter segment.Occasionally, stations do not observe for some time during a session owing to other commitments, for example, the participation in an Intensive session, unforgiving weather, or technical issues.To avoid the noise increase associated with   setting up unknowns that are not supported by observations, PORT is capable of refraining from doing so.
For each EOP two parameters are estimated as part of the normal equation system, one at the epoch of the first scan, the other at the epoch of the last scan of the session.Thus, PORT allows to output EOP offsets as well as their rates.Optionally, EOP estimation at midnight epochs may be set up to facilitate the comparison with other space geodetic techniques.For clock parameters, 1 hourly segments are chosen.Accordingly, PORT outputs 25 parameters for (n clk − 1) clocks in the network.One value at 0:00 between day 1 and day 2 and then forward and backward in time every integer hour until no more observation epochs are outside.The associated constraints are added in terms of pseudo-observations either by constraining a parameter to its a priori value with a certain weight (absolute constraint) or by constraining the rate of change of a parameter with a certain weight (relative constraint).If just one adjusted parameter for the entire session is targeted, tight absolute constraints can be applied.Depending on the type of the parameter and its anticipated fluctuation per time, the segment lengths listed in Table 2 are typically applied.In addition, the table lists the corresponding absolute and relative constraints, which have been obtained empirically.For ITRF2020, a set of specific piecewise linear functions intervals for clock and atmospheric delay estimation was used for each session, depending on the observation geometry.For stations with poor observation geometry, fewer knots were set up.
At present, efforts are being made to incorporate VieVS@GFZ's Kalman Filter code (Nilsson et al. 2015;Soja et al. 2015Soja et al. , 2016;;Soja 2016) into PORT as well.In the current PORT version the parameters are estimated with least squares adjustment (Koch 1999;Niemeier 2001;Teke et al. 2012).The adjustment is performed in two iterations.The parameters estimated in the first solution are clock polynomials of arbitrary degree (usually not higher than second) for each station-except for one station.The coefficients of this reference station are defined to be zero and, hence, this clock serves as the reference for the other clocks in the network.The clock adjustments may exceed hundreds of microseconds, in extreme cases even one second.In addition, harmonic clock variations, baseline clock offsets and one zenith delay parameter per station are estimated, their contributions are removed and the final adjustment estimates the remaining parameters including again clock offsets, drifts and drift rates.
In Table 2 the parameters estimated by PORT during single session analysis are listed.Apart from EOP, station positions, radio source coordinates, and tropospheric delay parameters (Balidakis et al. 2018), station clock offsets and drifts may be determined.In addition, PORT estimates baseline clock offsets, constant delays on specific baselines, which absorb instrumental effects (e.g., due to failure of individual frequency channels because of radio-frequency interference) on the corresponding baseline.The complete mathematical model of VLBI data analysis is for example described in Heinkelmann (2008).

Interactive Analysis Mode II: Monitoring of Residuals and Their Statistics
The estimation results are presented to the analyst in several ways.First, following the approach familiar from the VieVS software package (Böhm et al. 2012(Böhm et al. , 2018)), the residuals v as well as their uncertainties are displayed as time series on screen for visual inspection.Here, A denotes the design matrix, x the adjustments for the unknown parameters, and l the reduced observation vector.The user may choose to restrict the plotted data by selecting individual baselines, stations, or radio sources.
Figures 4 and 5 illustrate the interactive procedure.Initially, the analyst can check the residuals from the first estimation stage (see Section 3.4) for large clock breaks.Representing residuals stationwise allows the user to identify systematic errors related to hardware issues at the selected station.For example, the time series of residuals obtained from the analysis of session 18OCT01XA at station WETTZELL, shown in Figure 4, exhibits a discontinuity of about 1.7 × 10 9 cm, corresponding to a clock jump of about 55 ms at 58393.6294MJD (2018 October 2 15:06:20 UTC).Using the mouse pointer a marker (red line) is placed at that time to insert a clock break.Every time a clock break is added at a given station, the coefficients of the affected clock polynomial are estimated anew.After repeating the estimation cycle, the residuals from the final estimation stage are inspected and outlier events are flagged, see Figure 5.
Second, to support the analyst in optimizing station-, baseline-, and/or radio source-based weights or even to removing a particular station altogether, a table detailing the goodness of the fit (χ 2 ), the weighted root mean square (rms) residual deviations, the weighted mean values, as well as the (effective) number of observations per station, baseline, and radio source is produced and printed on screen.Figure 6 displays an example for this type of table obtained as a result of processing session 06MAR23XE using PORT's current runtime option and processing control settings, denoted as data analysis version "2020b."These settings include information on, among others, selected models (see Table 1), outlier removal methods, elevation cut-off angles, vgosDB quality control flags, threshold values of the minimum number of stations, sources and/ or baselines required, parameterizations of the station clock errors, application of constraints (pseudo-observations), and additional uncertainties to be added to the a priori measurement uncertainties.In the first part the station-and baseline-specific statistics are listed, including the goodness of the fit, in the following denoted "chisquared" (and also "chi∧2" in Figure 6), T 2 par the weighted mean value of the residuals T T the effective numbers of observations denoted "n_eff" in Figure 6, and the Student t-distribution (Student 1908), in Figure 6 listed as "t_cdf."Here, n denotes the actual number of observations (listed as "N" in Figure 6), n par the number of parameters, P the weight matrix for the observations, e ≡ (1, 1,K,1) T , and ( ) P max is the largest element of P.These statistics guide the analyst in the decision process which station to down-or up-weight and/or whether baseline clock offsets need to be admitted.In the second part of the table the resulting adjustments for station positions, radio source positions, and Earth orientation parameters are listed.
On the basis of the information described above, the analyst can enter or modify corresponding weights, activate baseline offsets, and remove/deselect stations and/or radio sources.A weighted Student t-test triggers a warning message if mean observable residuals at individual stations, baselines, and/or radio sources significantly differ from the value expected on the basis of the null hypothesis and exhibit statistical anomalies (see, e.g., Bevington 1992;Press et al. 1992).Based on this information the estimation/editing/review procedure is iterated until acceptable overall goodness of fit (typically, 0.1 < χ 2 < 10) and weighted residual rms values are achieved.

Automatic Analysis Mode
Besides the interactive mode discussed in Sections 3.3 and 3.5, PORT supports a non-interactive batch mode.Sessions without (or with already known) clock breaks may be processed in this unsupervised automatic mode where outliers are identified using Baarda's data snooping method, an outlier detection based on statistical tests (Baarda 1967(Baarda , 1968;;Rofatto et al. 2020).For clarity it is emphasized that in automatic mode-just as in interactive mode-PORT estimates parameters and assembles normal equations as well.The speedy analysis of large number of sessions enabled by automatic mode allows for efficient tests and intercomparisons of diverse models (Table 1).

Output Data
Once the analyst has reviewed the residuals and verified their statistics, the estimated parameters, the a priori information employed during the estimation procedure, as well as the normal equation systems prepared for the global least squares adjustment are exported, stored in the Solution INdependent EXchange (SINEX) format. 16Finally, the user has the option to upload the SINEX file to the IVS data servers using a provided Python script.In addition, these data are written to disk in PORT-specific binary file format for subsequent processing.Another set of output files is generated from the selected options and outlier.17

Estimation Statistics
Verification and validation of the estimated parameters are an important aspect of geodetic data analysis and one of the key PORT development objectives is to provide the analyst with suitable tools to visualize the results.Furthermore, PORT supports the intercomparison with the analysis results provided by other IVS analysis centers; its quality control tool set allows for the creation of graphical representations of deviations between two or more SINEX-formatted solutions.The comparisons include overall statistics such as goodness of fits and weighted residual rms, and subset statistics of individual EOP variables,  stations, or radio sources.For ease of access the images are displayed through a set of static HTML pages.
Figures 7 and 8 exemplarily illustrate statistical results for GFZ's VLBI contribution to ITRF202018 (K.Balidakis et al. 2021, in preparation).Figure 7 displays the temporal evolution of weighted rms corresponding to the PORT analysis of more than 6500 sessions recorded between 1979 August and 2020 December.The histogram of the deviations between the absolute value of UT1−UTC extracted from PORT's "gfz2020b" and NASA Goddard's "gsf2019a" solutions are plotted in Figure 8.While perfect agreement has not yet been achieved, the two data sets appear to be consistent on the 30-100 μs level.

Summary, Conclusions and Outlook
The VLBI analysis software Potsdam Open Source Radio Interferometry Tool (PORT) branched off from the VieVS@GFZ and VieVS tools, which in turn trace their origin to the BVSS and OCCAM software packages.PORT is offered by GFZ German Research Centre for Geosciences as a contribution to the community in order to enhance and extend the existing selection of well-established analysis packages.Starting from group delays, PORT provides estimates of station positions, radio source positions, Earth orientation parameters, and atmospheric parameters.The estimation procedures support the conventional analysis options for the contribution to ITRF2020 (K.Balidakis et al. 2021, in preparation).The operational analysis itself is preceded by preprocessing tasks performed by a set of Python scripts.PORT performs the level 2 data analysis in two iterations.If outlier and option information already exists, automatic mode processing may be selected; otherwise, the analysis can be performed in interactive mode.Outliers and options are saved to disk, estimation results are stored in SINEX files.The statistics of the overall residuals, as well as the subset belonging to specific stations, baselines, and/or radio sources, are provided to support analyst's decisions on flagging or reweighting.Quality control tools are provided to compare multi-session data sets.Present development efforts focus on the implementation of ambiguity resolution for phase delays and the incorporation of a Kalman Filter functionality.The design and implementation of data structures, as well as internal and external interfaces of the existing Matlab® code base, are being improved and documented in order to achieve better robustness, modularity and execution efficiency.Special emphasis will be placed on utilizing MATLAB ® ʼs advanced visualization capabilities.For PORT system requirements, software availability, and license information see the Appendix.
PORT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.See the GNU General Public License (https://www.gnu.org/licenses/gpl-3. 0.html) for more details.

Figure 1 .
Figure1.Schematic of VLBI data analysis steps.PORT performs the estimation of geodetic and astrometric parameters from group delay observables (red shaded area, level 2 data analysis).

Figure 2 .
Figure2.Data recorded at station ONSALA60 during session 14MAY10XA, part 1. Starting from the top left panel in clockwise direction the individual panels show a sky plot of the observed radio sources, the number of group delays per baseline with the corresponding quality-controlled percentages, the delay induced by the effective cable length variation and determined by the cable calibration electronics in units of picoseconds, the relative humidity RH in percent, the ambient temperature T in degrees celsius, and the atmospheric surface pressure P in hectopascal as functions of time of the day in hours.The numbers listed below the third, fourth, fifth and sixth panels denote the mean value (m1), the standard deviation (m2), the median absolute deviation (MAD), and the peak-to-peak variations (P2P).

Figure 3 .
Figure3.Data recorded at station ONSALA60 during session 14MAY10XA, part 2. The individual panels show (from left to right) group delay τ in units of picoseconds, phase delay Φ in degrees and amplitude A (arbitrary units) of the phase calibration signal.The top and bottom rows refer to the S band and X band frequency range, respectively.In all panels the horizontal axis corresponds to time since the start of the observation.As in Figure2the numbers below the six panels denote the mean value (m1), the standard deviation (m2), the median absolute deviation (MAD), and the peak-to-peak variations (P2P).

Figure 4 .
Figure 4. Clock break detection and identification.For the 24 hr session 18OCT01XA the user interface displays residuals for all baselines involving station WETTZELL.At 58393.6294MJD a clock break of about 55 ms is evident (note the units of 10 9 cm (!) for the vertical axis).When flagged by the analyst, the break event is marked by a red line.

Figure 5 .
Figure5.Outlier detection and flagging.During 24 hr session 01FEB19XF a significant number of observations are identified as outlier events.The corresponding data points are selected using the mouse pointer, flagged (marked as crosses in the display) and excluded from the subsequent analysis.

Figure 6 .
Figure6.Estimation results and their statistics obtained from the analysis of 24 hr session 06MAR23XE.Note the change of unit in the last column of "DUT1," the deviation of UT1 from UTC.For brevity the table has been edited and abridged.

Figure 7 .
Figure 7. PORT statistics of weighted residual rms as a function of observation time.The results are extracted from SINEX files prepared for GFZ's VLBI contribution to ITRF2020.
Table 1 lists the most relevant components of the functional model.The calculation takes into account ITRF2014 (Altamimi et al. 2016) station and ICRF3

Table 1
Models Implemented and Available in PORT

Table 2
Parameters Estimated by PORT from a 24 hr Session (Single Session Analysis) Publications of the Astronomical Society of the Pacific, 133:104503 (13pp), 2021 October Schuh et al.