
2016-02-22

Search text in PDF documents using regular expressions

Introduction


Searching for text in a PDF document is easy, and this feature has been available to users of our Apitron PDF Rasterizer for .NET component for many releases. We have now updated the API, so you can search for text on a PDF page using standard .NET regular expression objects (Regex).
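A Regex passed to the search API is an ordinary .NET object, so a pattern can be tried out on plain strings first. The snippet below (a C# 9 top-level program that uses no Apitron types at all) applies the pattern from the code section to a made-up string; the sample text is an assumption chosen only to show what the pattern matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the search sample: a word, one or more
// whitespace characters, then the literal text "Kit".
var pattern = new Regex(@"\w+\s+Kit");

// Hypothetical page text, used only for illustration.
string pageText = "Apitron PDF Kit in Action: the PDF  Kit API and Rasterizer Kit notes";

string[] found = pattern.Matches(pageText)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Prints "PDF Kit", "PDF  Kit" and "Rasterizer Kit", one per line.
foreach (string value in found)
{
    Console.WriteLine(value);
}
```

Note that \s+ also matches runs of spaces and line breaks, and the pattern has no word boundary after "Kit", so text such as "the Kitchen" would match as "the Kit"; appending \b to the pattern avoids that.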

The text search API offered by Apitron PDF Rasterizer is decoupled from the rendering part and can be used independently. It is represented by the SearchIndex class, which handles all search tasks and offers useful features such as building search indices for documents and saving/loading these indices for later use.

Using the search API offered by Apitron PDF Rasterizer, you can also highlight text on rendered pages, because you get all the necessary information about the text position on the PDF page.
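The sample in the code section converts each search result region to bitmap coordinates with page.TransformRegion before drawing. For an intuition of what such a conversion involves, here is a small C# 9 sketch of the arithmetic in the simplest case: PDF user space is measured in points (1/72 inch) with the origin at the bottom-left corner, while a bitmap is measured in pixels with the origin at the top-left corner. It assumes an unrotated page whose media box starts at (0, 0); the ToPixel helper is hypothetical and does not describe how Apitron PDF Rasterizer actually implements TransformRegion.

```csharp
using System;

// Maps a point from PDF user space (units of 1/72 inch, origin at the
// bottom-left corner) to bitmap pixels (origin at the top-left corner).
// Assumes an unrotated page whose media box starts at (0, 0).
static (double X, double Y) ToPixel(double x, double y, double pageHeightPt, double dpi)
{
    double px = x * dpi / 72.0;                    // scale points to pixels
    double py = (pageHeightPt - y) * dpi / 72.0;   // flip the vertical axis
    return (px, py);
}

// A US Letter page is 612 x 792 points; render it at 96 dpi as the sample does.
var topLeft = ToPixel(0, 792, 792, 96);   // top-left corner of the page
var inch = ToPixel(72, 72, 792, 96);      // one inch from the left and bottom edges

Console.WriteLine($"{topLeft.X}, {topLeft.Y}");   // prints "0, 0"
Console.WriteLine($"{inch.X}, {inch.Y}");         // prints "96, 960"
```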

See the code section for details.

The code


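using System.Diagnostics;
using System.Drawing;
using System.IO;
using System.Text.RegularExpressions;
// plus the Apitron PDF Rasterizer namespaces used by the sample

class Program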
{
    // global rendering settings
    static RenderingSettings renderingSettings = new RenderingSettings();
    // highlight brush for search results (semi-transparent yellow)
    static Brush hightlightBrush = new SolidBrush(Color.FromArgb(100,255,255,0));

    static void Main(string[] args)
    {
        // the source file to search
        string inputFilePath = "../../data/Apitron_Pdf_Kit_in_Action.pdf";           

        // open pdf document for search and rendering
        // we'll use 2 different streams here
        using (Stream searchStream = new FileStream(inputFilePath, FileMode.Open,
            FileAccess.Read),
            documentStream = new FileStream(inputFilePath, FileMode.Open,
            FileAccess.Read))
        {               
            // create search object from PDF data stream
            using (SearchIndex searchIndex = new SearchIndex(searchStream))
            {
                // open document to be used for rendering
                using (Document doc = new Document(documentStream))
                {
                    searchIndex.Search((handlerArgs =>
                    {
                        // if we have results
                        if (handlerArgs.ResultItems.Count != 0)
                        {
                            // create resulting image filename
                            string outputFileName = string.Format("{0}_{1}.png",
                                Path.GetFileNameWithoutExtension(inputFilePath),
                                handlerArgs.PageIndex);

                            // render the page, highlight the results, save the image
                            // and open it in the default image viewer
                            Page page = doc.Pages[handlerArgs.PageIndex];
                            using (Image bitmap = page.Render(new Resolution(96, 96),
                                renderingSettings))
                            {
                                foreach (SearchResultItem searchResultItem in
                                    handlerArgs.ResultItems)
                                {
                                    HighlightSearchResult(bitmap, searchResultItem,
                                    page);
                                }

                                bitmap.Save(outputFileName);
                            }

                            Process.Start(outputFileName);
                        }

                    }),
                    // find everything that matches [WORD][whitespaces]Kit pattern
                    new Regex("\\w+\\s+Kit"));                        
                }
            }
        }
    }
      
    /// <summary>
    /// Highlights the search result.
    /// </summary>
    /// <param name="bitmap"> The bitmap. </param>
    /// <param name="searchResultItem"> The search result item. </param>
    /// <param name="page"> The page. </param>
    private static void HighlightSearchResult(Image bitmap, 
        SearchResultItem searchResultItem,
        Page page)
    {
        using (Graphics gr = Graphics.FromImage(bitmap))
        {
            SearchResultRegion region = page.TransformRegion(searchResultItem.Region,
                bitmap.Width, bitmap.Height, renderingSettings);

            foreach (double[] rectangle in region.Blocks)
            {
                // each block is a flat array of x,y coordinate pairs (polygon vertices)
                PointF[] points = new PointF[rectangle.Length / 2];
                for (int i = 0; i < points.Length; i++)
                {
                    points[i] = new PointF((float)rectangle[i * 2],
                        (float)rectangle[(i * 2) + 1]);
                }

                gr.FillPolygon(hightlightBrush, points);
            }
        }
    }
}   


The complete code sample can be found in our github repo. The results of the execution are shown below; please note that in evaluation mode the search API searches for text on the first three pages only.


Pic. 1 Search text in PDF document - highlighted text



Summary


Apitron PDF Rasterizer for .NET is a comprehensive solution that you can use both for PDF rendering and for implementing text search in PDF documents. It is a cross-platform library available for many .NET-based platforms (Xamarin, Mono and .NET, to name a few) and can be used to create mobile, desktop and web applications. Contact us if you have any questions regarding our products or services.

2016-02-13

PDF/A validation - overcoming limitations of validation tools

Introduction


PDF/A is an excellent choice when it comes to archiving documents for later use. The format guarantees that a document can be read years after its creation, because all resources needed to process it are embedded into the file. PDF/A is sometimes required for saving digitally signed documents, e.g. contracts, official papers and so on.

There are plenty of tools on the market that claim to produce PDF/A documents, and the only way to check whether a tool actually does so is to verify its output with a PDF/A validation tool.

From our point of view, the most popular and reliable tool is Adobe Acrobat Professional, the paid professional version of the well-known Adobe Reader. It allows you to validate a document against many conditions, including PDF/A compliance, using the built-in Preflight tool. As Adobe is the author of the PDF standard, it knows all the ins and outs of PDF/A as well.

There are other PDF/A validation tools produced by various software companies, but their results sometimes differ from those of Adobe Acrobat Professional due to differing interpretations of the PDF/A specification.

We use Adobe as the gold standard, and files produced by Apitron PDF Kit for .NET are 100% verifiable by Adobe Acrobat Professional. If you use the same toolchain, you don’t have to worry; this post describes possible warnings produced by other tools and the custom settings needed to avoid them.

One of the possible warnings is “the file contains cross reference streams”. It relates to the internal format used to store the mapping of object numbers to their locations (byte offsets) in the PDF file. PDF versions prior to 1.5 (released in 2003) used cross-reference tables instead of cross-reference stream objects. The advantages of using streams over tables are:

    • A more compact representation of cross-reference information

    • The ability to access compressed objects that are stored in object streams (see section 7.5.7, "Object Streams", of the specification) and to allow new cross-reference entry types to be added in the future

The current PDF version is 1.7 (updated in 2011), so this is a fairly old feature, and PDF/A (released in 2005) does not forbid the use of such objects. Nevertheless, for those who need to get rid of the cross-reference stream warning, we have introduced a new setting in the PDF export API. The code sample can be found in the next section.
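Before changing any settings, you may want to check which form a particular file actually uses. In a PDF file the "startxref" keyword near the end is followed by the byte offset of the last cross-reference section, and that section begins either with the "xref" keyword (a classic table) or with an indirect object header such as "12 0 obj" (a cross-reference stream). The C# 9 sketch below relies only on that rule; the UsesXrefTable helper is hypothetical and not part of Apitron PDF Kit, and the two inputs are tiny hand-made fragments rather than complete PDF files.

```csharp
using System;
using System.Text;

// Reports whether the last cross-reference section of a PDF file is a
// classic "xref" table (true) or a cross-reference stream object (false).
// Simplified: assumes a well-formed file and looks only at the final
// "startxref" pointer, ignoring earlier incremental-update sections.
static bool UsesXrefTable(byte[] pdf)
{
    // Latin-1 maps each byte to one char, so string indexes equal byte offsets.
    string text = Encoding.Latin1.GetString(pdf);

    int keyword = text.LastIndexOf("startxref", StringComparison.Ordinal);
    if (keyword < 0) throw new FormatException("startxref keyword not found");

    // The byte offset of the cross-reference section follows the keyword.
    string tail = text.Substring(keyword + "startxref".Length).TrimStart();
    int digits = 0;
    while (digits < tail.Length && char.IsDigit(tail[digits])) digits++;
    int offset = int.Parse(tail.Substring(0, digits));

    // A table starts with the "xref" keyword; a stream starts with "N G obj".
    return text.Substring(offset).StartsWith("xref", StringComparison.Ordinal);
}

// Two tiny hand-made fragments; "%PDF-1.x\n" is 9 bytes, so both sections start at offset 9.
byte[] withTable = Encoding.ASCII.GetBytes(
    "%PDF-1.4\nxref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n");
byte[] withStream = Encoding.ASCII.GetBytes(
    "%PDF-1.5\n1 0 obj\n<< /Type /XRef /Size 1 >>\nstream\nendstream\nendobj\nstartxref\n9\n%%EOF\n");

Console.WriteLine(UsesXrefTable(withTable));    // prints "True"
Console.WriteLine(UsesXrefTable(withStream));   // prints "False"
```

If the export setting behaves as described, running this check on the saved file should report a table rather than a stream.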


The code


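using System.Diagnostics;
using System.IO;
// plus the Apitron PDF Kit namespaces used by the sample

class Program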
{
    static void Main(string[] args)
    {
        using (Stream stream = File.Open(@"../../data/document.pdf",
            FileMode.Open, FileAccess.Read))
        {
            // create document object and specify the output format
            FixedDocument doc = new FixedDocument(stream, PdfStandard.PDFA);

            // save document
            using (Stream outputStream = File.Create(@"pdfa_document.pdf"))
            {
                // turn off cross reference stream usage
                doc.IsCompressedStructure = false;
                doc.Save(outputStream);
            }
        }
 
        Process.Start("pdfa_document.pdf");
    }
}

As you can see, setting the IsCompressedStructure property lets you control the usage of cross-reference streams. The complete code sample can be found in our github repo.

The image below demonstrates PDF/A document validation using Adobe Acrobat:

Pic. 1 PDFA validation


Summary


The Apitron PDF Kit for .NET is a powerful library for creating and manipulating PDF and PDF/A documents. This product has many unique features, offers an easy-to-use API and is cross-platform, which means you can create apps for .NET (Windows, Windows Phone, Windows Store), iOS and Android (via Xamarin) and Mono, targeting modern mobile, desktop and web platforms at once. Contact us and we’ll be happy to answer your questions.

2016-02-06

How to add layers to PDF page using optional content

Introduction


While exporting a PDF document, you sometimes need several versions of the same content on one page, along with an option to show only one version at a time.

A multilanguage report or manual is a perfect example of such a document. Instead of producing a separate file for each language, you could create a single file containing all the necessary information. A user would then be able to switch between content versions with a single click by selecting the appropriate layer.

Another example of a layered content structure is an engineering drawing or a complex schema composed of logically separate parts that can be made visible or invisible on demand.

All of this is made possible by the PDF feature called optional content; see section “8.11 Optional Content” of the PDF specification for details. The Apitron PDF Kit for .NET component provides an API for creating and manipulating layers, so you can easily create layered content in your PDF documents.

In general, creating multiple layers on a PDF page looks as follows:

1. Create several OptionalContentGroup objects and register them as document resources – these objects represent layer identifiers in PDF.

2. Create the OptionalContentConfiguration object and set the properties that control the behavior and the visual layer structure shown in the reader’s UI. This object combines layers together, and you can use it to define initially visible layers, locked layers, layers that should work as radio buttons, etc. You can also define the visual tree structure: parent layer nodes and child nodes.

3. Create and initialize the OptionalContentProperties object required by the FixedDocument object. This object defines the default configuration used by the PDF reader to show layers and specifies the list of layers (OptionalContentGroup resource ids) actually referenced in the document’s content, because not all registered layer ids are necessarily in use.

4. Use ClippedContent objects to define the layers and set their OptionalContentID property to one of the registered layer ids (OptionalContentGroup resource ids). Put these objects on the PDF page using the Page.Content.AppendContent(…) method.
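Steps 2 and 3 correspond to the optional content configuration dictionary described in section 8.11.4.3 of the PDF specification. As a rough model of what a viewer does when it applies such a configuration, every group first takes the BaseState value, then groups listed in the ON array are switched on and groups listed in the OFF array are switched off. The C# 9 sketch below models only that rule; it is not part of the Apitron API, and the InitialStates helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Initial layer visibility when a configuration is applied: every group
// starts with BaseState, then groups in ON are turned on and groups in OFF
// are turned off. (BaseState "Unchanged" is not modelled here.)
static Dictionary<string, bool> InitialStates(IEnumerable<string> allGroups,
    bool baseStateOn, IEnumerable<string> onGroups, IEnumerable<string> offGroups)
{
    var states = new Dictionary<string, bool>();
    foreach (string group in allGroups) states[group] = baseStateOn;
    foreach (string group in onGroups) states[group] = true;
    foreach (string group in offGroups) states[group] = false;
    return states;
}

// A reduced version of the manual example: BaseState ON, one language listed as OFF.
var initial = InitialStates(
    new[] { "group0", "group1", "English", "Dansk" },
    baseStateOn: true,
    onGroups: new[] { "group0", "group1", "English" },
    offGroups: new[] { "Dansk" });

Console.WriteLine($"English visible: {initial["English"]}");  // prints "English visible: True"
Console.WriteLine($"Dansk visible: {initial["Dansk"]}");      // prints "Dansk visible: False"
```

With BaseState set to ON, as in the sample, the ON list is redundant according to the specification; it becomes significant when BaseState is OFF.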

The code demonstrating these steps can be found in the next section.

The code


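using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reflection;
// plus the Apitron PDF Kit namespaces used by the sample

class Program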
{
    static void Main(string[] args)
    {
        using (Stream stream = File.Create("manual.pdf"))
        {
            // create our PDF document
            using (FixedDocument doc = new FixedDocument())
            {
                // turn on the layers panel when opened
                doc.PageMode = PageMode.UseOC;

                // register image resource
                doc.ResourceManager.RegisterResource(
                    new Apitron.PDF.Kit.FixedLayout.Resources.XObjects.Image(
                    "chair","../../data/chair.jpg"));

                // FIRST STEP: create layer definitions,
                // they should be registered as document resources
                OptionalContentGroup group0 = new OptionalContentGroup("group0",
                    "Page layers", IntentName.View);
                doc.ResourceManager.RegisterResource(group0);

                OptionalContentGroup group1 = new OptionalContentGroup("group1",
                    "Chair image", IntentName.View);
                doc.ResourceManager.RegisterResource(group1);                   

                OptionalContentGroup group2 = new OptionalContentGroup("English", "English",
                    IntentName.View);
                doc.ResourceManager.RegisterResource(group2);

                OptionalContentGroup group3 = new OptionalContentGroup("Dansk", "Dansk",
                    IntentName.View);
                doc.ResourceManager.RegisterResource(group3);

                // the id "Deutch" matches the resource property name; the display name is "Deutsch"
                OptionalContentGroup group4 = new OptionalContentGroup("Deutch", "Deutsch",
                    IntentName.View);
                doc.ResourceManager.RegisterResource(group4);

                OptionalContentGroup group5 = new OptionalContentGroup("Русский", "Русский",
                    IntentName.View);
                doc.ResourceManager.RegisterResource(group5);

                OptionalContentGroup group6 = new OptionalContentGroup("Nederlands",
                    "Nederlands", IntentName.View);
                doc.ResourceManager.RegisterResource(group6);

                OptionalContentGroup group7 = new OptionalContentGroup("Français",
                    "Français", IntentName.View);
                doc.ResourceManager.RegisterResource(group7);

                OptionalContentGroup group8 = new OptionalContentGroup("Italiano",
                    "Italiano", IntentName.View);
                doc.ResourceManager.RegisterResource(group8);

                // SECOND STEP:
                // create the configuration, 
                // it allows to combine the layers together in any order     
                   
                // Default configuration:            
                OptionalContentConfiguration config = new OptionalContentConfiguration(
                    "configuration");
                
                // add groups to lists which define the rules controlling 
                // their visibility                  
                // ON groups
                config.OnGroups.Add(group0);
                config.OnGroups.Add(group1);
                config.OnGroups.Add(group2);
                        
                // OFF groups              
                config.OffGroups.Add(group3);
                config.OffGroups.Add(group4);
                config.OffGroups.Add(group5);
                config.OffGroups.Add(group6);
                config.OffGroups.Add(group7);
                config.OffGroups.Add(group8);

                // lock the image layer
                config.LockedGroups.Add(group1);

                // make other layers working as radio buttons
                // only one translation will be visible at a time
                config.RadioButtonGroups.Add(new[] { group2, group3, group4, group5, 
                    group6, group7, group8 });                    

                // show only groups referenced by visible pages
                config.ListMode = ListMode.VisiblePages;
                // initialize the states for all content groups
                // for the default configuration it should be on
                config.BaseState = OptionalContentGroupState.On;
                // set the name of the presentation tree
                config.Order.Name = "Default config";      
                // create a root node + sub elements            
                config.Order.Entries.Add(group0);
                config.Order.Entries.Add(new OptionalContentGroupTree(group1, group2, 
                    group3, group4, group5, group6, group7, group8));

                // FINAL step:
                // assign the configuration properties to document
                // all configurations and groups should be specified
                doc.OCProperties = new OptionalContentProperties(config, new
                    OptionalContentConfiguration[] {}, new[] { group0, group1, group2, 
                    group3, group4, group5, group6, group7, group8 });

                // create the page and assign the top layer id to its content
                // it will allow you to completely hide page's
                // content using the configuration we have created                   
                Page page = new Page();
                page.Content.OptionalContentID = "group0";

                // create image layer
                ClippedContent imageBlock = new ClippedContent(0, 0, 245, 300);
                // set the layer id
                imageBlock.OptionalContentID = "group1";
                imageBlock.AppendImage("chair", 0, 0, 245, 300);

                // put the layer on page
                page.Content.SaveGraphicsState();
                page.Content.Translate(0, 530);
                page.Content.AppendContent(imageBlock);
                page.Content.RestoreGraphicsState();

                // append text layers
                AppendTextLayers(page);
                // add the page to the document and save it
                doc.Pages.Add(page);

                doc.Save(stream);
            }
        }

        Process.Start("manual.pdf");
    }

    static void AppendTextLayers(Page page)
    {
        page.Content.SaveGraphicsState();
        page.Content.Translate(250, 325);

        // enumerate the string properties of the 'strings' resource class
        // (one per language) and add each translation to the page as a text layer
        foreach (PropertyInfo info in typeof(strings).GetRuntimeProperties())
        {
            if (info.PropertyType == typeof(string))
            {
                ClippedContent textContent = new ClippedContent(0, 0, 300, 500);
                // assign layer id
                textContent.OptionalContentID = info.Name;
                textContent.Translate(0, 0);

                // preprocess parsed elements and set additional properties
                // for better visual appearance
                IEnumerable<ContentElement> elements =
                    ContentElement.FromMarkup((string)info.GetValue(null));
                   
                foreach (Br lineBreak in elements.OfType<Br>())
                {
                    lineBreak.Height = 10;  
                }

                foreach (Section subSection in elements.OfType<Section>())
                {
                    subSection.Font = 
                    new Apitron.PDF.Kit.Styles.Text.Font("HelveticaBold", 14);
                }

                // draw text
                textContent.AppendContentElement(new Section(elements), 300, 500);
                // put the text layer on page
                page.Content.AppendContent(textContent);                
            }
        }

        page.Content.RestoreGraphicsState();
    }
}

As you can see, we used content elements from the FlowLayout API to prepare the translated text blocks. More information about the Fixed and Flow layout API can be found in our post “Fixed and Flow layout document API (.NET)”.

We followed exactly the algorithm described in the Introduction section:

1. Created layer identifier resources and registered them

2. Created the default layer configuration in the form of a tree view and configured the language layers to work as radio buttons; the image layer was marked as locked to demonstrate this feature

3. Let the document know about the created configuration and the layers used

4. Marked all layers with the corresponding registered layer ids

5. Saved the PDF document

The complete code sample can be downloaded from our github repo.

Resulting PDF document looks as follows:

Pic. 1 Multilanguage PDF document with layers


You can see the locked layer containing the chair image and the language layers available for viewing. The language layers work as a radio button group: when one is turned on, the others are turned off.
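This behavior comes from two entries of the configuration: groups in the Locked array cannot be switched from the viewer's UI, and groups in an RBGroups array follow a radio-button paradigm, so at most one of them is ON at a time (PDF specification, section 8.11.4.3). The following C# 9 sketch simulates a click in the Layers panel under those two rules; it is a hypothetical model of viewer behavior rather than Apitron PDF Kit code, and the Toggle helper is illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simulates a click on a layer in the viewer's Layers panel.
// Locked groups ignore the click; turning a group on switches off every
// other member of any radio-button group it belongs to.
static void Toggle(string group, Dictionary<string, bool> states,
    ISet<string> locked, IEnumerable<string[]> radioGroups)
{
    if (locked.Contains(group)) return;   // locked layers can't be changed from the UI

    bool turnOn = !states[group];
    states[group] = turnOn;
    if (!turnOn) return;                  // turning a layer off affects no other layer

    foreach (string[] radioSet in radioGroups.Where(candidate => candidate.Contains(group)))
    {
        foreach (string other in radioSet.Where(name => name != group))
        {
            states[other] = false;
        }
    }
}

// State after opening the manual: image and English visible.
var layerStates = new Dictionary<string, bool>
{
    ["group1"] = true, ["English"] = true, ["Dansk"] = false, ["Русский"] = false
};
var lockedLayers = new HashSet<string> { "group1" };
var radioSets = new[] { new[] { "English", "Dansk", "Русский" } };

Toggle("Dansk", layerStates, lockedLayers, radioSets);   // Dansk on, English off
Toggle("group1", layerStates, lockedLayers, radioSets);  // ignored: image layer is locked

Console.WriteLine($"English: {layerStates["English"]}, Dansk: {layerStates["Dansk"]}, " +
    $"image: {layerStates["group1"]}");   // prints "English: False, Dansk: True, image: True"
```

Turning the active language off again leaves every language hidden, which the radio-button rule allows since it only limits how many groups can be ON.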

Summary


The Apitron PDF Kit for .NET is a powerful tool for creating and manipulating PDF and PDF/A documents. It’s cross-platform and can be used to create .NET, Mono and Xamarin applications for Windows, iOS, Android and other operating systems. You can read more about the library on the product page. Contact us if you have any questions and we’ll be glad to assist you.