Search text in PDF documents using regular expressions


Searching for text in PDF documents is easy, and this feature has been available to users of our Apitron PDF Rasterizer for .NET component for many releases. We’ve now updated the API so that you can search for text on a PDF page using standard .NET regular expression objects (Regex).
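Before running a pattern against a PDF, it can help to try it on plain strings with the standard Regex class. The sketch below uses the same pattern as the sample later in this article (\w+\s+Kit) and only relies on System.Text.RegularExpressions. The FindKitPhrases helper and the sample sentence are hypothetical and serve only as an illustration; nothing here calls the Apitron API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexPatternDemo
{
    // Hypothetical helper: returns every substring matched by the
    // [WORD][whitespaces]Kit pattern used in this article.
    public static List<string> FindKitPhrases(string text)
    {
        var regex = new Regex(@"\w+\s+Kit");
        var found = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            found.Add(match.Value);
        }
        return found;
    }

    static void Main()
    {
        // Made-up sample text, not taken from the PDF used in the article.
        string sample = "Apitron PDF Kit and the Starter   Kit are both mentioned; Kit alone is not matched.";

        foreach (string phrase in FindKitPhrases(sample))
        {
            Console.WriteLine(phrase);
        }
    }
}
```

The pattern requires a word and at least one whitespace character before "Kit", so a "Kit" that directly follows punctuation is skipped, while runs of several spaces are still matched.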

The text search API offered by Apitron PDF Rasterizer is decoupled from the rendering part and can be used independently. It is represented by the SearchIndex class, which handles all search tasks and offers useful features such as building search indices for documents and saving and loading those indices for later use.

Using the search API, you can also highlight text on rendered pages, because each search result gives you the position of the matched text on the PDF page.

See the code section for details.

The code

class Program
    // global rendering settings
    static RenderingSettings renderingSettings = new RenderingSettings();
    // highlight brush for search results
    static Brush hightlightBrush = new SolidBrush(Color.FromArgb(100,255,255,0));

    static void Main(string[] args)
        // the source file to search for text
        string inputFilePath = "../../data/Apitron_Pdf_Kit_in_Action.pdf";

        // open pdf document for search and rendering
        // we'll use 2 different streams here
        using (Stream searchStream = new FileStream(inputFilePath, FileMode.Open,
            documentStream = new FileStream(inputFilePath, FileMode.Open,
            // create search object from PDF data stream
            using (SearchIndex searchIndex = new SearchIndex(searchStream))
                // open document to be used for rendering
                using (Document doc = new Document(documentStream))
                    searchIndex.Search((handlerArgs =>
                        // if we have results
                        if (handlerArgs.ResultItems.Count != 0)
                            // create resulting image filename
                            string outputFileName = string.Format("{0}_{1}.png",

                            // render found result and start system image viewer
                            Page page = doc.Pages[handlerArgs.PageIndex];
                            using (Image bitmap = page.Render(new Resolution(96, 96),
                                foreach (SearchResultItem searchResultItem in
                                    HighlightSearchResult(bitmap, searchResultItem,



                    // find everything that matches [WORD][whitespaces]Kit pattern
                    new Regex("\\w+\\s+Kit"));                        
    /// <summary>
    /// Highlights the search result.
    /// </summary>
    /// <param name="bitmap"> The bitmap. </param>
    /// <param name="searchResultItem"> The search result item. </param>
    /// <param name="page"> The page. </param>
    private static void HighlightSearchResult(Image bitmap,
        SearchResultItem searchResultItem,
        Page page)
    {
        using (Graphics gr = Graphics.FromImage(bitmap))
        {
            // transform the result region to match the rendered bitmap
            SearchResultRegion region = page.TransformRegion(searchResultItem.Region,
                bitmap.Width, bitmap.Height, renderingSettings);

            // each block is read as a flat array of x,y pairs
            foreach (double[] rectangle in region.Blocks)
            {
                PointF[] points = new PointF[rectangle.Length / 2];
                for (int i = 0; i < points.Length; i++)
                {
                    points[i] = new PointF((float)rectangle[i * 2],
                        (float)rectangle[(i * 2) + 1]);
                }

                gr.FillPolygon(hightlightBrush, points);
            }
        }
    }
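
The highlighting code above reads each block as a flat array of coordinates, with x values at even indices and y values at odd indices. This layout is inferred from how the sample indexes the array, not from the library documentation. A minimal standalone sketch of that conversion is shown here; the ToPoints helper and the sample values are hypothetical, and value tuples stand in for PointF so that the snippet needs no drawing library.

```csharp
using System;

static class BlockPointsDemo
{
    // Hypothetical helper: converts a flat [x0, y0, x1, y1, ...] array
    // into (x, y) pairs, as the highlighting loop does with PointF.
    public static (float X, float Y)[] ToPoints(double[] block)
    {
        if (block.Length % 2 != 0)
        {
            throw new ArgumentException("Expected an even number of coordinates.", nameof(block));
        }

        var points = new (float X, float Y)[block.Length / 2];
        for (int i = 0; i < points.Length; i++)
        {
            points[i] = ((float)block[i * 2], (float)block[(i * 2) + 1]);
        }
        return points;
    }

    static void Main()
    {
        // Made-up quadrilateral: four corners stored as eight numbers.
        double[] block = { 10, 20, 110, 20, 110, 40, 10, 40 };

        foreach (var point in ToPoints(block))
        {
            Console.WriteLine(point.X + ", " + point.Y);
        }
    }
}
```

Sizing the loop by the number of points rather than a fixed count of four keeps the conversion valid if a block ever carries more or fewer corners.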

The complete code sample can be found in our GitHub repo. The results of running it are shown below; please note that in evaluation mode the search API searches for text on the first three pages only.

Pic. 1 Search text in PDF document - highlighted text



Apitron PDF Rasterizer for .NET is a comprehensive solution that you can use for PDF rendering and for implementing text search in PDF documents. It’s a cross-platform library available for many .NET-based platforms (Xamarin, Mono and .NET, to name a few) and can be used to create mobile, desktop and web applications. Contact us if you have any questions regarding our products or services.

