
2017-02-28

How to programmatically delete, edit or replace content in PDF documents

Introduction


Replacing, editing or deleting content in PDF documents programmatically is not a trivial task: implementing it from scratch requires expert knowledge of the format and its internal structures. Luckily, we have made it much easier by introducing native support for these operations. You can examine a document’s content page by page and change what you need without significant effort. In this article we’ll demonstrate how to replace or edit text and images, remove content from a desired area or region, replace resources, alter graphics paths, and get the boundaries of content elements.



Replacing text and images


Let’s assume you’re developing a web-based solution for a real estate agency and you need to process advertisements stored as PDF documents. One of them might look like this:


Pic. 1 Sample advertisement stored as PDF


However, the complete listing should be accessible only to logged-in customers. Other users should still be able to view the ad, but with some details hidden, namely the price and the photo of the property. One solution is to generate the restricted version dynamically. Here is the code:

static void Main(string[] args)
{
       ReplaceTextAndImages("../../../data/advertisement.pdf", "$","Price: contact us",
              "../../../data/replacement.png");
}

private static void ReplaceTextAndImages(string inputFilePath, string oldText, 
      string newText, string replacementImagePath)
{
    // output file path (assumed value; adjust as needed)
    string outputFileName = "advertisement_edited.pdf";

    using (Stream inputStream = File.Open(inputFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            // add the replacement image to document's resources
            doc.ResourceManager.RegisterResource(new Image("replacement_image",
                  replacementImagePath, true));

            // enumerate content elements found on document's first page
            foreach (IContentElement element in doc.Pages[0].Elements)
            {
                // handle the text element case
                if (element.ElementType == ElementType.Text)
                {
                    TextContentElement textElement = element as TextContentElement;
                    if (textElement != null)
                    {
                        // go through all the text segments and replace 
                        // each segment that contains the sample text
                        foreach (TextSegment textSegment in textElement.Segments)
                        {
                            if (textSegment.Text.Contains(oldText))
                            {
                                TextObject newTextObject =
                                    new TextObject(textSegment.FontName, textSegment.FontSize);
                                newTextObject.AppendText(newText);
                                textSegment.ReplaceText(0, textSegment.Text.Length, newTextObject);
                            }
                        }
                    }
                } // handle image case
                else if (element.ElementType == ElementType.Image)
                {
                    ImageContentElement imageElement = element as ImageContentElement;

                    if (imageElement != null)
                    {
                        // just replace the image with new one using
                        // registered resource, removing old one
                        imageElement.Replace("replacement_image", true);
                    }
                }
            }

            // save modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }

    Process.Start(outputFileName);
} 


The resulting file produced by this code is shown below:


Pic. 2 Edited PDF document
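
Note that the sample above replaces the whole text segment that contains the marker, not only the marker itself, because ReplaceText is called with the range from 0 to the segment's full length. The following minimal sketch shows the same rule on plain strings. The class and method names and the sample strings are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentReplaceSketch
{
    // Hypothetical helper: plain strings stand in for the text segments of a page.
    // Any segment containing the marker is replaced in full, not just the marker.
    public static List<string> ReplaceMatchingSegments(
        IEnumerable<string> segments, string marker, string replacement)
    {
        var result = new List<string>();
        foreach (string segment in segments)
        {
            result.Add(segment.Contains(marker) ? replacement : segment);
        }
        return result;
    }

    public static void Main()
    {
        // invented sample data, standing in for segments read from a page
        var segments = new[] { "3 bedrooms, 2 bathrooms", "Price: $450,000", "Call today" };
        foreach (string s in ReplaceMatchingSegments(segments, "$", "Price: contact us"))
        {
            Console.WriteLine(s);
        }
    }
}
```

If you need to keep the rest of a segment and swap only part of it, you would pass a narrower range instead of the full segment length.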





Content deletion


Let’s say you have the document shown below and would like to remove all content that intersects with an arbitrary rectangular region.


Pic. 3 Sample document for content removal


Here is the code that does the job. It also highlights the removed elements by stroking their calculated boundaries:

static void Main(string[] args)
{
    RemoveContentInRect("../../../data/apitron_pdf_kit_in_action_excerpt.pdf",
          new Boundary(70, 200, 330, 450));
}

private static void RemoveContentInRect(string inputFilePath, Boundary redactionRect)
{
    // output file path (assumed value; adjust as needed)
    string outputFileName = "content_removed.pdf";

    using (Stream inputStream = File.Open(inputFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            doc.ResourceManager.RegisterResource(
               new GraphicsState("myGraphicsState") {CurrentNonStrokingAlpha = 0.3});

            // enumerate content elements found on document's first page
            Page firstPage = doc.Pages[0];

            firstPage.Content.SaveGraphicsState();
            firstPage.Content.SetDeviceStrokingColor(new []{1.0,0,0});

            foreach (IContentElement element in firstPage.Elements)
            {
                // remove elements falling into the deletion region
                // even if they just overlap
                if (element.ElementType == ElementType.Text)
                {
                    TextContentElement textElement = (TextContentElement) element;

                    foreach (TextSegment segment in textElement.Segments)
                    {
                        if (RectsOverlap(redactionRect, segment.Boundary))
                        {
                            firstPage.Content.StrokePath(Path.CreateRect(segment.Boundary));
                            segment.Remove();
                        }
                    }
                }
                else if (RectsOverlap(redactionRect, element.Boundary))
                {
                    firstPage.Content.StrokePath(Path.CreateRect(element.Boundary));
                    element.Remove();
                }
            }
                
            // highlight the deletion region
            firstPage.Content.SetGraphicsState("myGraphicsState");
            firstPage.Content.SetDeviceStrokingColor(new []{0.0});
            firstPage.Content.SetDeviceNonStrokingColor(new []{0.0});
            firstPage.Content.FillAndStrokePath(Path.CreateRect(redactionRect));
            firstPage.Content.RestoreGraphicsState();

            // save modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }
}

public static bool RectsOverlap(Boundary a, Boundary b)
{
    return a.Left < b.Right && a.Right > b.Left && a.Bottom < b.Top && a.Top > b.Bottom;
}


The resulting document is shown below:


Pic. 4 Document with partially removed content
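
The overlap test in the removal sample relies on PDF page coordinates, where Bottom is smaller than Top. The sketch below checks the same comparison on a few rectangles without the library. Rect is a hypothetical stand-in for Boundary, and Contains is an assumed stricter variant (not shown in the sample) for cases where only elements lying fully inside the region should be removed.

```csharp
using System;

// Hypothetical stand-in for the library's Boundary type, for illustration only.
public readonly struct Rect
{
    public Rect(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }
}

public static class OverlapSketch
{
    // Same comparison as in the sample: true when the interiors intersect.
    public static bool RectsOverlap(Rect a, Rect b) =>
        a.Left < b.Right && a.Right > b.Left && a.Bottom < b.Top && a.Top > b.Bottom;

    // Assumed stricter alternative: true only when inner lies entirely within outer.
    public static bool Contains(Rect outer, Rect inner) =>
        outer.Left <= inner.Left && inner.Right <= outer.Right &&
        outer.Bottom <= inner.Bottom && inner.Top <= outer.Top;

    public static void Main()
    {
        var region = new Rect(70, 200, 330, 450);
        Console.WriteLine(RectsOverlap(region, new Rect(300, 400, 500, 500))); // True: partial overlap
        Console.WriteLine(RectsOverlap(region, new Rect(330, 200, 400, 450))); // False: only the edges touch
        Console.WriteLine(Contains(region, new Rect(100, 250, 200, 300)));     // True: fully inside
    }
}
```

Because the comparisons are strict, rectangles that merely share an edge are not treated as overlapping.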




Changing existing drawings or graphics paths


If you have a drawing you would like to alter, there is an API for that as well. You can prepend or append PDF content to a drawing, scale it, translate it or delete it. Here is our sample file:


Pic. 5 PDF document with vector drawing



And here is our code, which changes it a bit by altering the non-stroking (fill) color of every path it finds:


static void Main(string[] args)
{
    ReplacePaths("../../../data/graphics.pdf");
}

private static void ReplacePaths(string inputFilePath)
{
    // output file path (assumed value; adjust as needed)
    string outputFileName = "graphics_edited.pdf";

    using (Stream inputStream = File.Open(inputFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            double colorComponent = 0;
            double colorDelta = 0.1;

            // enumerate content elements found on document's first page
            foreach (IContentElement element in doc.Pages[0].Elements)
            {
                // change the fill color of each found drawing
                if (element.ElementType == ElementType.Drawing)
                {
                    DrawingContentElement drawingElement = (DrawingContentElement) element;
                    drawingElement.SetNonStrokingColor(
                          new double[] { Math.Min(colorComponent,1),0, 0});
                    colorComponent += colorDelta;
                }
            }

            // save modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }

    Process.Start(outputFileName);
}

You can set stroking or non-stroking colors, examine the drawing rule or operation type used, inspect the path itself, or add content with the AddContent method if you need to.

The resulting document produced by the code is shown below:

Pic. 6 Edited graphics paths
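
The fill colors assigned by the loop in the sample form a red ramp: the red component starts at 0, grows by 0.1 for each drawing found, and is clamped to 1. The sketch below reproduces just that arithmetic without the library; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ColorRampSketch
{
    // Returns the RGB fill color for each of the first 'count' drawings,
    // using the same start value, step and clamp as the sample.
    public static List<double[]> BuildRamp(int count, double delta = 0.1)
    {
        var colors = new List<double[]>();
        double component = 0;
        for (int i = 0; i < count; i++)
        {
            colors.Add(new[] { Math.Min(component, 1), 0.0, 0.0 });
            component += delta;
        }
        return colors;
    }

    public static void Main()
    {
        // print the ramp for 13 drawings, one RGB triple per line
        foreach (double[] rgb in BuildRamp(13))
        {
            Console.WriteLine($"{rgb[0]:F1} {rgb[1]:F1} {rgb[2]:F1}");
        }
    }
}
```

Since the step is accumulated in floating point, the intermediate values are only approximately multiples of 0.1; the clamp guarantees that no component exceeds 1 regardless of how many drawings the page contains.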



Replacing resources in PDF documents


You probably know that PDF documents can contain various resources such as fonts, tiling patterns, images, FormXObjects, color profiles etc. Whenever you need to replace a resource, you can use a dedicated API.

Every FixedDocument (our name for a PDF document) has its own resource manager, accessible through the ResourceManager property. To change a resource you can use the following code (the relevant calls are RegisterResource and RegisterReplacement):

static void Main(string[] args)
{
    // output file path (assumed value; adjust as needed)
    string outputFileName = "patternFill_edited.pdf";

    using (Stream inputStream = File.Open("../../../data/patternFill.pdf",
         FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            // create a new tiling pattern
            TilingPattern pattern = new TilingPattern("myNewPattern",
                new Boundary(0, 0, 20, 20), 25, 25);
            pattern.Content.SetDeviceNonStrokingColor(new double[] { 0.1, 0.5, 0.7 });
            pattern.Content.FillAndStrokePath(Path.CreateCircle(10, 10,9));

            // register new pattern as a resource
            doc.ResourceManager.RegisterResource(pattern);

            // replace the old pattern with new one
            doc.ResourceManager.RegisterReplacement("myPattern","myNewPattern");

            //save modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }

    Process.Start(outputFileName);
}

In this example we replaced the old tiling pattern resource with the new one. Using this technique you can change the appearance of the PDF documents just by changing resources used by drawing operations.
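
Replacement by name works because PDF page content refers to its resources by name, so the content itself does not have to change. The sketch below is a conceptual model of that idea only. It is not how the library implements its resource manager, and the class and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model only; not the library's resource manager.
public sealed class ResourceTableSketch
{
    private readonly Dictionary<string, string> resources = new Dictionary<string, string>();
    private readonly Dictionary<string, string> replacements = new Dictionary<string, string>();

    // Stores a resource (represented here by a description) under its name.
    public void Register(string id, string description) => resources[id] = description;

    // Redirects every later lookup of oldId to newId.
    public void RegisterReplacement(string oldId, string newId) => replacements[oldId] = newId;

    // Content keeps referring to the old name; the lookup follows the redirect.
    public string Resolve(string id)
    {
        string effectiveId = replacements.TryGetValue(id, out string target) ? target : id;
        return resources[effectiveId];
    }

    public static void Main()
    {
        var table = new ResourceTableSketch();
        table.Register("myPattern", "old tiling pattern");
        table.Register("myNewPattern", "new circle pattern");
        table.RegisterReplacement("myPattern", "myNewPattern");

        // a drawing operation that still asks for "myPattern" now gets the new resource
        Console.WriteLine(table.Resolve("myPattern")); // prints "new circle pattern"
    }
}
```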


Summary


In this article we demonstrated a few possible scenarios for content editing, removal and replacement in PDF. The topic is quite extensive, so we may not have covered your particular case, or you may have a specific question. If you need any help with the API or professional advice, just drop us an email and we’ll be happy to assist you. All samples used in this article can be found in our GitHub repo as well.
