PDF tips & tricks: February 2017

Introduction

Replacing, editing or deleting content in PDF documents programmatically is not a trivial task: implementing it from scratch requires expert knowledge of the format and its internal structures. Luckily, we have made it much easier by introducing native support for these operations, so you can examine a document’s content page by page and change what you need without significant effort. In this article we’ll demonstrate how to replace or edit text and images, remove content from a given region, replace resources, alter graphics paths, and get the boundaries of content elements.

Replacing text and images

Let’s assume you’re developing a web-based solution for a real estate agency and you need to process advertisements stored as PDF documents. One of them might look like this:

Pic. 1 Sample advertisement stored as PDF

But the complete listing should only be accessible to logged-in customers. Other users should still be able to view the ad, but without the price and the photo of the property. One solution is to generate this restricted version dynamically by replacing the price text and the photo. Here is the code:

using System.Diagnostics;
using System.IO;
// plus the Apitron PDF Kit namespaces that define FixedDocument, Image,
// IContentElement, TextObject and the other types used below

static void Main(string[] args)
{
    ReplaceTextAndImages("../../../data/advertisement.pdf", "$", "Price: contact us",
        "../../../data/replacement.png");
}

private static void ReplaceTextAndImages(string inputFilePath, string oldText,
    string newText, string replacementImagePath)
{
    // output path (any file name will do)
    string outputFileName = "out.pdf";

    using (Stream inputStream = File.Open(inputFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            // add the replacement image to the document's resources
            doc.ResourceManager.RegisterResource(new Image("replacement_image",
                replacementImagePath, true));

            // enumerate content elements found on the document's first page
            foreach (IContentElement element in doc.Pages[0].Elements)
            {
                // handle the text element case
                if (element.ElementType == ElementType.Text)
                {
                    TextContentElement textElement = element as TextContentElement;
                    if (textElement != null)
                    {
                        // go through all the text segments and replace
                        // every segment that contains the sample text
                        foreach (TextSegment textSegment in textElement.Segments)
                        {
                            if (textSegment.Text.Contains(oldText))
                            {
                                TextObject newTextObject =
                                    new TextObject(textSegment.FontName, textSegment.FontSize);
                                newTextObject.AppendText(newText);
                                textSegment.ReplaceText(0, textSegment.Text.Length, newTextObject);
                            }
                        }
                    }
                }
                // handle the image case
                else if (element.ElementType == ElementType.Image)
                {
                    ImageContentElement imageElement = element as ImageContentElement;
                    if (imageElement != null)
                    {
                        // replace the image with the registered resource,
                        // removing the old one
                        imageElement.Replace("replacement_image", true);
                    }
                }
            }

            // save the modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }

    // open the result in the default PDF viewer
    Process.Start(outputFileName);
}

And the resulting file produced by this code is shown below:

Pic. 2 Edited PDF document
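The replacement rule above works on whole segments: if a segment contains the marker text, the entire segment is replaced, not just the marker. The following self-contained sketch models that rule with plain strings standing in for TextSegment objects, so it runs without any PDF library. ReplaceSegments is a hypothetical helper, and the segment texts are made-up examples, not taken from the sample PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentReplacementSketch
{
    // Mirrors the rule used in ReplaceTextAndImages: any segment that CONTAINS
    // oldText is replaced as a whole by newText, so the amount after "$" goes too.
    public static List<string> ReplaceSegments(IEnumerable<string> segments,
        string oldText, string newText)
    {
        return segments
            .Select(s => s.Contains(oldText) ? newText : s)
            .ToList();
    }

    public static void Main()
    {
        // hypothetical segment texts extracted from an ad
        var segments = new List<string> { "3 bedroom house", "$450,000", "Call us today" };
        var result = ReplaceSegments(segments, "$", "Price: contact us");
        Console.WriteLine(string.Join(" | ", result));
    }
}
```

A practical consequence: if the price happens to share a segment with other text, that text is replaced as well, so it is worth checking how segments are split in your documents before relying on a short marker such as "$".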

Content deletion

Let’s say you have the document shown below and would like to remove all content that intersects an arbitrary rectangular region.

Pic. 3 Sample document for content removal

Here is the code that does the job. It also outlines the removed elements using their calculated boundaries and shades the deletion region:

using System.IO;
// plus the Apitron PDF Kit namespaces that define FixedDocument, Page, Boundary,
// GraphicsState, Path and the content element types used below.
// Note: Path here is the PDF Kit's Path type; because System.IO also defines Path,
// qualify it or add a using alias.

static void Main(string[] args)
{
    RemoveContentInRect("../../../data/apitron_pdf_kit_in_action_excerpt.pdf",
        new Boundary(70, 200, 330, 450));
}

private static void RemoveContentInRect(string inputFilePath, Boundary redactionRect)
{
    // output path (any file name will do)
    string outputFileName = "out.pdf";

    using (Stream inputStream = File.Open(inputFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            // graphics state with 30% fill opacity, used to shade the deletion region
            doc.ResourceManager.RegisterResource(
                new GraphicsState("myGraphicsState") { CurrentNonStrokingAlpha = 0.3 });

            Page firstPage = doc.Pages[0];

            // outline removed elements in red
            firstPage.Content.SaveGraphicsState();
            firstPage.Content.SetDeviceStrokingColor(new[] { 1.0, 0, 0 });

            // enumerate content elements found on the document's first page
            foreach (IContentElement element in firstPage.Elements)
            {
                // remove elements falling into the deletion region,
                // even if they only partially overlap it
                if (element.ElementType == ElementType.Text)
                {
                    // text is removed segment by segment
                    TextContentElement textElement = (TextContentElement) element;
                    foreach (TextSegment segment in textElement.Segments)
                    {
                        if (RectsOverlap(redactionRect, segment.Boundary))
                        {
                            firstPage.Content.StrokePath(Path.CreateRect(segment.Boundary));
                            segment.Remove();
                        }
                    }
                }
                // non-text elements (e.g. images and drawings) are removed as a whole
                else if (RectsOverlap(redactionRect, element.Boundary))
                {
                    firstPage.Content.StrokePath(Path.CreateRect(element.Boundary));
                    element.Remove();
                }
            }

            // highlight the deletion region: black outline, fill at 30% opacity
            firstPage.Content.SetGraphicsState("myGraphicsState");
            firstPage.Content.SetDeviceStrokingColor(new[] { 0.0 });
            firstPage.Content.SetDeviceNonStrokingColor(new[] { 0.0 });
            firstPage.Content.FillAndStrokePath(Path.CreateRect(redactionRect));
            firstPage.Content.RestoreGraphicsState();

            // save the modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }
}

public static bool RectsOverlap(Boundary a, Boundary b)
{
    return a.Left < b.Right && a.Right > b.Left && a.Bottom < b.Top && a.Top > b.Bottom;
}

The resulting document is shown below:

Pic. 4 Document with partially removed content
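RectsOverlap is a standard axis-aligned rectangle intersection test: two rectangles overlap exactly when their horizontal ranges overlap and their vertical ranges overlap. The sketch below exercises the same test in isolation, using a hypothetical Rect struct in place of Boundary. It assumes Boundary’s constructor takes (left, bottom, right, top), as the sample’s new Boundary(70, 200, 330, 450) suggests; check the library documentation before relying on that order.

```csharp
using System;

// Hypothetical stand-in for Boundary, assumed to be (left, bottom, right, top).
public readonly struct Rect
{
    public Rect(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }
}

public static class OverlapSketch
{
    // Same test as RectsOverlap: overlap on the X axis AND on the Y axis.
    // Strict comparisons mean rectangles that only touch along an edge
    // are NOT treated as overlapping.
    public static bool RectsOverlap(Rect a, Rect b) =>
        a.Left < b.Right && a.Right > b.Left && a.Bottom < b.Top && a.Top > b.Bottom;

    public static void Main()
    {
        var region = new Rect(70, 200, 330, 450);
        Console.WriteLine(RectsOverlap(region, new Rect(300, 400, 400, 500))); // True: partial overlap
        Console.WriteLine(RectsOverlap(region, new Rect(330, 200, 400, 450))); // False: shares only an edge
        Console.WriteLine(RectsOverlap(region, new Rect(0, 0, 50, 50)));       // False: disjoint
    }
}
```

Because the comparisons are strict, an element that merely touches the border of the deletion region is kept; switch to <= and >= if edge contact should count as an overlap.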

Changing existing drawings or graphics paths

If you have a drawing you would like to alter, there is an API for that as well: you can change its colors, prepend or append PDF content to it, scale, translate or delete it. Here is our sample file:

Pic. 5 PDF document with vector drawing

And here is the code that modifies it by changing the non-stroking (fill) color of every drawing found, adding a little more red each time:

using System;
using System.Diagnostics;
using System.IO;
// plus the Apitron PDF Kit namespaces that define FixedDocument,
// IContentElement and DrawingContentElement

static void Main(string[] args)
{
    ReplacePaths("../../../data/graphics.pdf");
}

private static void ReplacePaths(string inputFilePath)
{
    // output path (any file name will do)
    string outputFileName = "out.pdf";

    using (Stream inputStream = File.Open(inputFilePath, FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            double colorComponent = 0;
            double colorDelta = 0.1;

            // enumerate content elements found on the document's first page
            foreach (IContentElement element in doc.Pages[0].Elements)
            {
                // change the fill color of each drawing found,
                // increasing the red component every time (capped at 1)
                if (element.ElementType == ElementType.Drawing)
                {
                    DrawingContentElement drawingElement = (DrawingContentElement) element;
                    drawingElement.SetNonStrokingColor(
                        new double[] { Math.Min(colorComponent, 1), 0, 0 });
                    colorComponent += colorDelta;
                }
            }

            // save the modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }

    // open the result in the default PDF viewer
    Process.Start(outputFileName);
}

You can set stroking or non-stroking colors, examine the drawing rule or the type of painting operation used, examine the path itself, or add content with the AddContent method if needed.

The resulting document produced by the code is shown below:

Pic. 6 Edited graphics paths
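The color arithmetic in ReplacePaths can be checked on its own. This sketch reproduces the loop’s sequence of fill colors for a hypothetical page with 13 drawings, without the PDF library; BuildRamp is a name introduced here for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class FillColorRampSketch
{
    // Same logic as the loop in ReplacePaths: the red component starts at 0,
    // grows by colorDelta for every drawing and is clamped to 1.
    public static List<double[]> BuildRamp(int drawingCount, double colorDelta = 0.1)
    {
        var colors = new List<double[]>();
        double colorComponent = 0;
        for (int i = 0; i < drawingCount; i++)
        {
            colors.Add(new[] { Math.Min(colorComponent, 1), 0, 0 });
            colorComponent += colorDelta;
        }
        return colors;
    }

    public static void Main()
    {
        // hypothetical page containing 13 drawings
        foreach (double[] rgb in BuildRamp(13))
        {
            Console.WriteLine($"{rgb[0]:0.0} {rgb[1]:0.0} {rgb[2]:0.0}");
        }
    }
}
```

The first drawing is filled with black (0, 0, 0) and each following one gets 0.1 more red. Since 0.1 is not exactly representable in binary floating point, the accumulated value at the eleventh drawing is 0.9999999999999999 rather than 1.0, and Math.Min clamps every later value to 1, so from the eleventh drawing onward all fills are (effectively) pure red.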

Replacing resources in PDF documents

You probably know that PDF documents can contain various resources such as fonts, tiling patterns, images, FormXObjects and color profiles. Whenever you need to replace a resource, you can use a dedicated API.

Every FixedDocument (our name for a PDF document) has its own resource manager, accessible through the ResourceManager property. To replace a resource, register the new one and then map the old resource’s name to it with RegisterReplacement, as in the following code:

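using System.Diagnostics;
using System.IO;
// plus the Apitron PDF Kit namespaces that define FixedDocument, TilingPattern,
// Boundary and Path.
// Note: Path here is the PDF Kit's Path type; because System.IO also defines Path,
// qualify it or add a using alias.

static void Main(string[] args)
{
    // output path (any file name will do)
    string outputFileName = "out.pdf";

    using (Stream inputStream = File.Open("../../../data/patternFill.pdf",
        FileMode.Open, FileAccess.Read))
    {
        using (FixedDocument doc = new FixedDocument(inputStream))
        {
            // create a new tiling pattern
            TilingPattern pattern = new TilingPattern("myNewPattern",
                new Boundary(0, 0, 20, 20), 25, 25);
            pattern.Content.SetDeviceNonStrokingColor(new double[] { 0.1, 0.5, 0.7 });
            pattern.Content.FillAndStrokePath(Path.CreateCircle(10, 10, 9));

            // register the new pattern as a resource
            doc.ResourceManager.RegisterResource(pattern);

            // replace the old pattern with the new one
            doc.ResourceManager.RegisterReplacement("myPattern", "myNewPattern");

            // save the modified file
            using (Stream outputStream = File.Create(outputFileName))
            {
                doc.Save(outputStream);
            }
        }
    }

    // open the result in the default PDF viewer
    Process.Start(outputFileName);
}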

In this example we replaced the old tiling pattern resource with the new one. Using this technique, you can change the appearance of a PDF document simply by changing the resources its drawing operations use, without touching the drawing operations themselves.
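One way to picture resource replacement is as a level of indirection: page content refers to resources by name, and a replacement entry redirects one name to another, so the drawing operations stay unchanged. The sketch below is a hypothetical, simplified model of that idea written for illustration only. It is not how the PDF Kit implements RegisterReplacement, and the class and method names (ResourceRegistrySketch, Add, Redirect, Resolve) are invented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of name-based resource replacement (illustration only).
public sealed class ResourceRegistrySketch
{
    private readonly Dictionary<string, string> _resources = new Dictionary<string, string>();
    private readonly Dictionary<string, string> _redirects = new Dictionary<string, string>();

    // register a resource under a name (loosely analogous to RegisterResource)
    public void Add(string name, string resource) => _resources[name] = resource;

    // redirect an old name to a new one (loosely analogous to RegisterReplacement)
    public void Redirect(string oldName, string newName) => _redirects[oldName] = newName;

    // look up a resource by the name the content uses, following one redirect if present
    public string Resolve(string name)
    {
        string effectiveName = _redirects.TryGetValue(name, out string target) ? target : name;
        return _resources[effectiveName];
    }

    public static void Main()
    {
        var registry = new ResourceRegistrySketch();
        registry.Add("myPattern", "original tiling pattern");
        registry.Add("myNewPattern", "circle tiling pattern");
        Console.WriteLine(registry.Resolve("myPattern")); // original tiling pattern
        registry.Redirect("myPattern", "myNewPattern");
        Console.WriteLine(registry.Resolve("myPattern")); // circle tiling pattern
    }
}
```

The model follows a single redirect only; it is meant to show why content that names "myPattern" picks up the new pattern, not to describe chained replacements.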

Summary

In this article we demonstrated a few possible scenarios for content editing, removal and replacement in PDF. The topic is quite extensive, so we probably didn’t cover your particular case, or maybe you have a specific question. If you need any help with the API or professional advice, just drop us an email and we’ll be happy to assist you. All samples used in this article can also be found in our GitHub repo.

2017-02-28

How to programmatically delete, edit or replace content in PDF documents
