Itextsharp Convert Pdf To Xml
Let's see how to add a PDF-to-XML feature to any .NET application. First, add a reference to the 'SautinSoft.PdfFocus.dll' assembly; this gives your .NET application the ability to convert PDF documents to XML. The assembly is available for download (104.0 Mb).
I don't believe in converting PDF into other formats (unless you're talking about rendering PDF to a raster format). iText can make a best effort to extract text from a PDF, and if the PDF is 'tagged', it can convert the PDF to XML, but I don't trust any software that claims it can convert PDF to Word, Excel, RTF, or HTML.
In 2011, iText Group released XML Worker as a generic XML to PDF tool, built on top of iText 5. A default implementation converted XHTML (data) and CSS (styles) to PDF, mapping HTML tags such as <p>, <img>, and <li> to iText 5 objects such as Paragraph, Image, and ListItem.
Mudassar Ahmed Khan has explained, with an example, how to use the iTextSharp HTML to PDF conversion library in ASP.Net MVC Razor. First, the data is populated from a database using Entity Framework, and the records are displayed as HTML in ASP.Net MVC Razor. The same HTML is then converted to a PDF file using the iTextSharp HTML to PDF conversion library.
pdfHtml is an iText 7 add-on that lets you easily convert HTML to PDF or to iText objects. The pdfHtml Community source code is hosted on GitHub, where you can also download the latest releases. You can also build pdfHtml Community from source.
With the older HTMLWorker class, converting HTML takes four steps. Steps 1 and 2 are identical to the first two steps for creating a PDF document from scratch. In step 3, call iTextSharp's HTMLWorker.ParseToList method, passing in the HTML to convert into PDF; this returns a collection of elements. In step 4, add each element returned in step 3 to the Document object.
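The HTMLWorker steps can be sketched roughly as follows. This is a minimal illustration, not the original article's code. It assumes iTextSharp 5.x, where HTMLWorker still exists but is marked obsolete in favor of XML Worker. It also assumes that "steps 1 and 2" mean creating a Document and attaching a PdfWriter to it. The HTML string and the output file name are hypothetical.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

class HtmlToPdfSketch
{
    static void Main()
    {
        // Hypothetical input, used only for illustration.
        string html = "<p>Hello from <b>HTMLWorker</b></p>";

        using (var output = new FileStream("output.pdf", FileMode.Create))
        {
            // Steps 1 and 2 (assumed): create the Document and bind a PdfWriter to it.
            var document = new Document();
            PdfWriter.GetInstance(document, output);
            document.Open();

            // Step 3: parse the HTML into a collection of iTextSharp elements.
            // Passing null for the style sheet uses the default styles.
            // Step 4: add each returned element to the Document.
            foreach (IElement element in HTMLWorker.ParseToList(new StringReader(html), null))
            {
                document.Add(element);
            }

            document.Close();
        }
    }
}
```

HTMLWorker handles only simple HTML, so for anything beyond basic markup the XML Worker or pdfHtml routes described above are likely the better fit.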
Let's take a look at a very straightforward example in C#:
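The original listing did not survive on this page, so what follows is a hedged reconstruction rather than the vendor's exact sample. It assumes the PdfFocus members commonly shown in SautinSoft samples (OpenPdf, PageCount, XmlOptions.ConvertNonTabularDataToSpreadsheet, ToXml) and assumes that ToXml returns 0 on success; check these against the version you reference. The file paths are hypothetical.

```csharp
using System;
using System.IO;

class PdfToXmlSketch
{
    static void Main()
    {
        // Hypothetical paths: Table.pdf is expected next to the executable.
        string pathToPdf = Path.GetFullPath("Table.pdf");
        string pathToXml = Path.ChangeExtension(pathToPdf, ".xml");

        // Requires a reference to SautinSoft.PdfFocus.dll.
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();

        // false: skip all non-tabular text, so only tables are written to the XML.
        f.XmlOptions.ConvertNonTabularDataToSpreadsheet = false;

        f.OpenPdf(pathToPdf);

        if (f.PageCount > 0)
        {
            // Assumption: a return value of 0 means the conversion succeeded.
            int result = f.ToXml(pathToXml);

            Console.WriteLine(result == 0
                ? "XML written to " + pathToXml
                : "Conversion failed with code " + result);
        }
        else
        {
            Console.WriteLine("The PDF could not be opened or has no pages.");
        }
    }
}
```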

After running this code, you will get an XML document produced from Table.pdf. Since the property 'ConvertNonTabularDataToSpreadsheet' is set to false, all textual data outside tables is skipped. In other words, only tables are converted to XML.
In this way, you can configure the component to produce the XML document you need.
Download

To see this functionality firsthand, download the latest PDF Focus .Net with code examples (104.0 Mb).
Limitations
The free version of PDF Focus .Net has two limitations: the trial notice 'Created by unlicensed version of PDF Focus .Net' and the random addition of the word 'TRIAL'.
Some examples to convert PDF to XML in C# and VB.Net
1. Convert PDF file to XML file in C#:
2. Convert PDF file to XML file in VB.Net:
Requires .NET Framework 4.0 or higher. The product is compatible with all .NET languages and supports every operating system on which .NET Framework or .NET Core can run. PDF Focus .Net is written entirely in managed C#, which makes it a standalone, independent library.
.NET Framework 4.0, 4.5, 4.6.1 and higher. An older version for .NET 2.0 is also available.
.NET Standard 2.0
.NET Core 2.0 and higher.
Multi-platform component, runs on:
Our component has proven itself on cloud platforms and services:
- Microsoft Azure
- Amazon Web Services (AWS)
- Google Cloud Platform
- SharePoint
- Docker
- etc.