
Nick Wellnhofer has announced the release of Libxml2 2.14.0. Highlights include an HTML tokenizer that now conforms to HTML5, the removal of several non-standard syntax warnings, and a break in binary compatibility, which is now restricted to versions 2.14 or newer.

The serialization API now takes user-provided or default encodings into account when serializing attribute values, which matches the serialization of text and avoids unnecessary escaping. A new configuration option allows RELAX NG support to be disabled independently of XML Schemas support. The "legacy" configuration option no longer enables support for HTTP and LZMA, both of which are scheduled for removal in the 2.15 release. New features include input callbacks that can be set on a parser context, an improved API for creating parser input, and an API function for installing a custom character encoding converter. Access to many public struct members is now deprecated, as are further internal functions. Removed in this release are the HTML4 content model metadata, the FTP module, the range and point extensions of the xpointer() scheme, several legacy symbols, ELF version information, the libxml.m4 file, and the hack for detecting single-threaded programs under glibc. The shell was moved from libxml2 to xmllint.

Libxml2 2.14.0 released

https://download.gnome.org/sources/libxml2/2.14/libxml2-2.14.0.tar.xz
sha256sum: 3e2ed89d81d210322d70b35460166d4ea285e5bb017576972a1d76a09631985c

Major changes

The HTML tokenizer now conforms fully to HTML5. Several non-standard syntax warnings were removed. Note that HTML5 tree construction isn’t implemented yet.

Binary compatibility is restricted to versions 2.14 or newer. On ELF systems, the soname was bumped from libxml2.so.2 to libxml2.so.16.

The serialization API will now take user-provided or default encodings into account when serializing attribute values, matching the serialization of text and avoiding unnecessary escaping.

To align with web standards, the XML parser no longer tries to merge consecutive CDATA sections as it did before. Each CDATA section will create exactly one node or SAX callback.

Support for RELAX NG can now be disabled with a new configuration option independently of XML Schemas support. It is still enabled by default.

The “legacy” configuration option won’t enable support for HTTP and LZMA anymore. These features will be removed in the next release.

Parts of the xmllint executable were refactored, allowing the combination of more options. OOM errors should be reported reliably now.

Several improvements were made to the build systems. Meson is fully supported now.

Parts of the buffering code were reworked and simplified.

Overflow checks before reallocations were hardened.
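For readers unfamiliar with the technique, the following is a minimal sketch of an overflow check performed before a reallocation. It is not taken from the libxml2 sources; the helper grow_buffer and its doubling strategy are assumptions chosen purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not libxml2 code: grow a heap buffer so that it
 * can hold at least `needed` bytes, doubling the capacity each step.
 * Returns 0 on success and -1 if the new size would overflow size_t or
 * if realloc fails. On failure the buffer and capacity are unchanged.
 */
int
grow_buffer(unsigned char **buf, size_t *cap, size_t needed)
{
    size_t new_cap = (*cap > 0) ? *cap : 16;
    unsigned char *tmp;

    if (needed <= *cap)
        return 0;               /* already large enough */

    while (new_cap < needed) {
        /* Check before doubling so the multiplication cannot wrap. */
        if (new_cap > SIZE_MAX / 2)
            return -1;
        new_cap *= 2;
    }

    tmp = realloc(*buf, new_cap);
    if (tmp == NULL)
        return -1;              /* original buffer is still valid */

    *buf = tmp;
    *cap = new_cap;
    return 0;
}
```

The important detail is that the size comparison happens before the arithmetic. Checking after the multiplication would be too late, because an unsigned wrap-around silently produces a small value that realloc would accept.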

Some unprefixed symbols were renamed to avoid namespace pollution.

New features

Input callbacks can now be set on a parser context and an improved API to create parser input is available. The following new functions, taking a parser input object, were added:

  • xmlCtxtParseDocument
  • xmlCtxtParseContent as replacement for xmlParseBalancedChunkMemory and xmlParseInNodeContext
  • xmlCtxtParseDtd

The xmlSave API now has additional options to replace global settings.

Parser options XML_PARSE_UNZIP, XML_PARSE_NO_SYS_CATALOG and XML_PARSE_CATALOG_PI were added.

An API function to install a custom character encoding converter is now available. This makes it possible to use ICU for encoding conversion even if libxml2 was compiled without ICU support, see example/icu.c.

Deprecations

Access to many public struct members is now deprecated. Several accessor functions were added to use instead.

More internal functions were deprecated.

Removals

Metadata about the HTML4 content model was removed from the htmlElemDesc struct and related functions were deprecated.

The FTP module and related functions were removed.

Support for the range and point extensions of the xpointer() scheme was removed. The rest of the XPointer implementation isn’t affected. The xpointer() scheme now behaves like the xpath1() scheme.

Several legacy symbols and the functions in xmlunicode.h were removed.

ELF version information was removed.

The shell was moved from libxml2 to xmllint. Several related functions are no longer available.

The libxml.m4 file containing autoconf macros was removed.

The --with-tree configuration option was removed.

The hack to detect single-threaded programs under glibc was removed.

Planned removals

Support for HTTP and LZMA compression is planned to be removed in the 2.15 release.

The following features are considered for removal:

  • Modules API (xmlmodule.h)
  • Schematron support
  • Support for zlib compressed file I/O
  • Legacy Windows build system in win32

RELAX NG support is still in a bad state and a long-term removal candidate.

Thanks

Thanks to the following contributors:

  • Andrew Potter
  • Benjamin Gilbert
  • Chun-wei Fan
  • correctmost
  • Daniel Cheng
  • Daniel E
  • Florin Haja
  • Grzegorz Szymaszek
  • Heiko Becker
  • Himanshibansal
  • Jan Alexander Steffens (heftig)
  • Kjell Ahlstedt
  • makise-homura
  • Markus Rickert
  • Mike Dalessio
  • Miklos Vajna
  • Rosen Penev
  • Ruslan Garipov
  • Ryan Carsten Schmidt
  • Saleem Abdulrasool
  • Sam James
  • Satadru Pramanik
  • Taylor R Campbell
  • triallax
  • Yegor Yefremov
  • Zak Ridouh
