The Validation and Transformation Language (VTL) is a standard language for defining validation and transformation rules (a set of operators with their syntax and semantics) for any kind of statistical data. VTL has been developed by a dedicated Task Force that reports to the SDMX Secretariat and is composed of experts from a dozen international organizations, including central banks, national statistical offices, and private companies. Although it falls under SDMX governance, VTL is designed as a language based on an independent information model (IM), built from the basic artefacts common to the main statistical standards.

Main characteristics

  • Technology agnostic: the language is designed to manipulate statistical data at a conceptual level, independently of the physical representation used to store or exchange the data observations.
  • User friendly: the language is mathematics based and designed for users without information technology skills. Its syntax is simple, intuitive and self-explanatory, with minimal redundancy, and is oriented towards the validation and transformation of statistical data. Operators are provided for validation and editing, aggregation (including aggregation according to hierarchies), dimensional processing (e.g. projection, filter), and aggregate statistical measures (e.g. mean, median, variance).
  • Data independent: the language can be used for any type of statistical data (dimensional data, survey data, registers data, micro and macro, quantitative and qualitative) since its information model covers all these typologies.
  • Process consistent: the language is usable in any phase of the statistical process, since collected and calculated data are represented homogeneously with regard to the metadata needed for calculations, and both are equally permitted as inputs of a calculation without changes in the syntax of the operators or expressions.
  • Interoperable: the language can be applied to various standards because it relies on a specific information model whose artefacts can be put in unambiguous correspondence with artefacts of other information models (e.g. SDMX) that refer to the same mathematical notion. To achieve an unambiguous mapping, the VTL Information Model is deeply inspired by GSIM (the Generic Statistical Information Model) and uses the same artefacts when possible; GSIM artefacts are supplemented, when needed, with other artefacts that are necessary for describing calculations (in particular, the SDMX model for Transformations is used).
  • Robust and self-documented: a formal grammar, described in Extended Backus-Naur Form (EBNF) notation, is provided to parse VTL expressions. The inputs and outputs of the expressions, and the calculations themselves, are artefacts of the Information Model, so calculated data can be operands of further calculations, and their data structures can be deduced from the calculation algorithm and from the data structures of the operands. Data lineage is therefore documented automatically and kept aligned.
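
The last point, that the structure of a result can be deduced from the operands and the calculation, can be made concrete with a small sketch. The Python below is not VTL and not the API of any VTL engine: the `Structure` type and the `filter_structure` and `aggregate_structure` helpers are hypothetical names invented for this illustration, and the rules they apply are simplified assumptions rather than the semantics defined in the VTL manuals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """Hypothetical description of a Data Set's components, by role."""
    identifiers: tuple
    measures: tuple
    lineage: tuple = ()  # readable record of the steps that produced it


def filter_structure(operand: Structure, condition: str) -> Structure:
    # Assumption: a filter only removes rows, so the components are unchanged.
    step = f"filter [{condition}]"
    return Structure(operand.identifiers, operand.measures,
                     operand.lineage + (step,))


def aggregate_structure(operand: Structure, group_by: tuple) -> Structure:
    # Assumption: aggregation keeps only the grouping Identifiers,
    # while the Measures are carried over to the result.
    missing = [name for name in group_by if name not in operand.identifiers]
    if missing:
        raise ValueError(f"not Identifiers of the operand: {missing}")
    step = f"aggregate by {', '.join(group_by)}"
    return Structure(tuple(group_by), operand.measures,
                     operand.lineage + (step,))


# Made-up example structure: persons counted by country, year and sex.
population = Structure(("country", "year", "sex"), ("persons",))
recent = filter_structure(population, "year >= 2020")
totals = aggregate_structure(recent, ("country", "year"))

print(totals.identifiers)
print(totals.lineage)
```

No data is touched in this sketch: the structure of `totals` and the record of how it was obtained follow from the structure of `population` and the two operations alone, which is the sense in which calculations of this kind document themselves.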

VTL Information Model

The VTL Information Model (VTL IM) is based on GSIM and has mathematical foundations. It can therefore be mapped easily to and from other data models, which allows VTL to be used in many steps of the statistical process with different standards (currently SDMX, DDI, XBRL/DPM).

In the VTL Information Model the various types of statistical data are considered as mathematical functions, described as having “independent” variables (meaning that they have the role of Identifiers) and “dependent” variables (meaning that they have the role of Measures or Attributes), whose extensions can be thought of as logical tables (i.e. Data Sets). VTL would describe a typical data table as having columns that are either Identifiers, Measures or Attributes, while a Data Point would comprise an entire row of that table (rather than just a single cell). Therefore, the main artefacts to be manipulated using VTL are the logical Data Sets.
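
As a rough illustration of this view, the Python sketch below treats a Data Set as a function from Identifier values to Measure and Attribute values. It is not part of VTL: the `DataSet` class, its `add` method, the component names (`country`, `year`, `population`, `status`) and all values are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical logical table whose columns are tagged with a role."""
    identifiers: tuple       # "independent" variables
    measures: tuple          # "dependent" variables holding the values
    attributes: tuple = ()   # "dependent" variables qualifying the values
    # Maps each tuple of Identifier values to the rest of the row.
    points: dict = field(default_factory=dict)

    def add(self, row: dict) -> None:
        """Add one Data Point, i.e. one whole row of the table."""
        expected = set(self.identifiers + self.measures + self.attributes)
        if set(row) != expected:
            raise ValueError(f"row must have exactly the columns {sorted(expected)}")
        key = tuple(row[name] for name in self.identifiers)
        if key in self.points:
            # A function gives one result per input, so keys cannot repeat.
            raise ValueError(f"duplicate Identifier values: {key}")
        self.points[key] = {name: row[name]
                            for name in self.measures + self.attributes}

    def __call__(self, *key):
        """Evaluate the function: Identifier values in, dependent values out."""
        return self.points[key]


ds = DataSet(identifiers=("country", "year"),
             measures=("population",),
             attributes=("status",))
# Made-up values, for illustration only.
ds.add({"country": "AA", "year": 2023, "population": 100, "status": "provisional"})
ds.add({"country": "BB", "year": 2023, "population": 200, "status": "final"})

print(ds("AA", 2023))
```

Rejecting a second row with the same Identifier values is what makes the table a function in the mathematical sense: each combination of Identifiers determines exactly one Data Point.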

VTL Documentation

The latest published version of VTL, officially released in August 2024, is version 2.1, with a package comprising the following material:

  1. User manual, highlighting the main characteristics of VTL, its key assumptions and the Information Model the language is based on.
  2. Reference manual, describing the full library of operators, ordered by category, with examples.
  3. VTL grammar, expressed in Extended Backus-Naur Form (EBNF) notation, the technical notation used to parse VTL expressions (provided as a G4 file).
  4. Technical Notes document, to support VTL implementation.

As of December 2024, the complete documentation is available in mark-up format, published in the official VTL GitHub Repository: https://sdmx-twg.github.io/vtl/2.1/html/

VTL tools

Several tools providing editing and calculation features for VTL are currently available. Their engines, which are open and free, use a variety of technologies and programming languages (including Java, Python, R, and SQL).

VTL tools can be found in Software Tools for SDMX Implementers and Developers.