
riyaz.net

Tech Tips and Tutorials for SAP Professionals and Bloggers


Parsing PDF files using SAP Conversion Agent – Part II

November 10, 2009 by Guest Authors

Parsing PDF in SAP
  • Part 1: Development in PI Design (IR)
  • Part 2: Development in Conversion Agent Studio
  • Part 3: Deploying Conversion Agent Project
  • Part 4: Development in PI Configuration (ID)
  • Part 5: Test the Scenario

Development in Conversion Agent Studio

Open the CA Studio and create a new project of type Parser. The wizard steps are:

  • Select Parser as the project type.
  • Enter the name of the project.
  • Enter the name of the parser.
  • Enter the name of the parser script.
  • Select the schema.
  • Select the File option.
  • Select the PDF file.
  • Select the PDF option.
  • Select PDF to Unicode (UTF-8).
  • Select Custom Format.
  • Click Finish.

Open the project node in Conversion Agent Explorer, double-click the BillParser_Script.tgp file, and start development.

Two methods are described below. Either one works, but Method-2 is generally recommended.

Method-1: Select the text and drag it directly to an element in the schema

Select the text in the Source view and drag it to an element in the Schema view.

Here, the text Mr. SHATAVAHANA CHANDA is selected from the source PDF file and dragged to the element ‘Name’ in the Schema view.

Drag the remaining address content in the same way:

  • Select the text “FUJITSU CONSULTING INDIA PVT LTD,” and drag it to the element ‘Company’ in the schema.
  • Select the text “# 703-704, ADITYA TRADE CENTER,” and drag it to the element ‘Door’ in the schema.
  • Select the text “AMEERPET.BEHIND INDU PUBLICE SCHOOL” and drag it to the element ‘Street’ in the schema.
  • Select the text “HYDERABAD 500016” and drag it to the element ‘City’ in the schema.

At Mark1: setting the parameter default_transformers = RemoveMarginSpace() removes the unwanted margin spaces, which makes the file easier to parse.
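As a rough illustration of what removing margin spaces means for an extracted value, here is a minimal Python sketch. It is not Conversion Agent code: the function name remove_margin_space is hypothetical, and the sketch assumes the transformer behaves like trimming whitespace at the left and right edges of the text.

```python
def remove_margin_space(value: str) -> str:
    """Trim whitespace at the left and right margins of an extracted value.

    Assumption: this mirrors the effect of RemoveMarginSpace() described above.
    """
    return value.strip()


# Text taken from a PDF often carries padding around the real content.
raw_name = "   Mr. SHATAVAHANA CHANDA     "
clean_name = remove_margin_space(raw_name)
print(repr(clean_name))
```

With the padding gone, later steps can work with the value itself instead of accounting for a variable amount of surrounding space.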

At Mark2: Content is used to parse the content of the file. It has the following options:

A. opening_marker and closing_marker accept numeric values and limit parsing to the range they define. For example, with opening_marker=0 and closing_marker=10, only the content within that range is parsed. The next scenario covers this in more detail.

B. In the current scenario, value = LearnByExample(“<Text selected from File>”) is used.

C. data_holder is mandatory, because it gives the schema path of the element that receives the selected content.

Method-2 (options A and C) is preferable to Method-1 (options B and C). Method-1 works as long as the customer name is Mr. SHATAVAHANA CHANDA, but in real cases Vodafone has many customers. When all of their bills are processed, the customer name and address differ from bill to bill, and the parser reports an error saying it cannot parse the file. It is therefore better to follow Method-2.
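The difference between the two methods can be sketched in Python. This is only an analogy, not Conversion Agent syntax: the helper names, the second customer, and the 30-character field width are all assumptions made for the illustration.

```python
from typing import Optional


def extract_by_literal(text: str, learned: str) -> Optional[str]:
    """Method-1 analogy: succeed only if the learned example text is present."""
    return learned if learned in text else None


def extract_by_offset(text: str, opening: int, length: int) -> str:
    """Method-2 analogy: take `length` characters starting at offset `opening`."""
    return text[opening:opening + length].strip()


# Two hypothetical bills with the same layout but different customers.
# The name sits in a 30-character field right after a 5-character header.
bill_a = "BILL\n" + "Mr. SHATAVAHANA CHANDA".ljust(30) + "\n"
bill_b = "BILL\n" + "Ms. EXAMPLE CUSTOMER".ljust(30) + "\n"

learned_name = "Mr. SHATAVAHANA CHANDA"

literal_a = extract_by_literal(bill_a, learned_name)  # found
literal_b = extract_by_literal(bill_b, learned_name)  # None: different customer

offset_a = extract_by_offset(bill_a, 5, 30)
offset_b = extract_by_offset(bill_b, 5, 30)

print(literal_a, literal_b)
print(offset_a, offset_b)
```

The literal lookup depends on one specific customer's text, while the offset lookup depends only on where the field sits in the layout, which is the reason given above for preferring Method-2.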

Method-2: Parsing using Insert Offset Content

To avoid that problem, follow these steps.

After setting default_transformers = RemoveMarginSpace(), go to the Source view on the right, select the text, right-click, and choose Insert Offset Content.

Here the customer name Mr. SHATAVAHANA CHANDA opens at mark 245 and closes at mark 22. Then specify the data_holder parameter.
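One plausible reading of these two numbers is that the opening marker is counted from the current position in the text and the closing marker is counted from the opening marker. The name Mr. SHATAVAHANA CHANDA is exactly 22 characters long, which fits that reading. The Python sketch below models it; the function offset_content and the filler text are hypothetical stand-ins, not Conversion Agent code.

```python
from typing import Tuple

NAME = "Mr. SHATAVAHANA CHANDA"


def offset_content(text: str, position: int,
                   opening_marker: int, closing_marker: int) -> Tuple[str, int]:
    """Return the text between two offset markers and the position after it.

    Assumption: opening_marker is relative to `position`, and
    closing_marker is relative to the opening marker.
    """
    start = position + opening_marker
    end = start + closing_marker
    return text[start:end], end


# Stand-in for the converted PDF text: 245 filler characters, then the name,
# then the start of the next field.
source_text = "." * 245 + NAME + "FUJITSU CONSULTING INDIA PVT LTD,"

name, next_position = offset_content(source_text, 0, 245, 22)
print(len(NAME), repr(name), next_position)
```

Under this reading, the next Insert Offset Content anchor would continue from next_position rather than from the start of the file.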

Now select the company “FUJITSU CONSULTING INDIA PVT LTD”, right-click, and choose Insert Offset Content.

Follow the same steps for the remaining content.

Note that the ‘value’ option is not used here; only the opening_marker, closing_marker, and data_holder parameters are set.

Save and validate the project.

Now run the parser.

Now check output.xml. It shows the result of parsing the PDF file in XML format.
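For readers without the screenshot, the shape of such a result can be approximated in Python. The element names Name, Company, Door, Street, and City come from the schema used in this article; the root element name Bill is an assumption, and the real output.xml produced by Conversion Agent may differ in structure and namespaces.

```python
import xml.etree.ElementTree as ET

# Field values as selected from the sample PDF in this article.
fields = {
    "Name": "Mr. SHATAVAHANA CHANDA",
    "Company": "FUJITSU CONSULTING INDIA PVT LTD,",
    "Door": "# 703-704, ADITYA TRADE CENTER,",
    "Street": "AMEERPET.BEHIND INDU PUBLICE SCHOOL",
    "City": "HYDERABAD 500016",
}

root = ET.Element("Bill")  # hypothetical root element name
for tag, text in fields.items():
    ET.SubElement(root, tag).text = text

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```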

The next part covers deploying the Conversion Agent project.
