Overview of B2B Data Transformation
· B2B Data Transformation enables you to transform data efficiently from any format to any other format.
· Data Transformation can process fully structured, semi-structured, or unstructured data.
· The software can be configured to work with text, binary data, messaging formats, HTML pages, PDF documents, word-processor documents, and other formats.
· A Data Transformation parser can be configured to transform the data to any standard or custom XML vocabulary.
· In the reverse direction, a Data Transformation serializer can be configured to transform the XML data to any other format.
· A Data Transformation mapper can be configured to perform XML to XML transformations.
How B2B Data Transformation Works
The Data Transformation system has two main components:
Component | Description
Data Transformation Studio | The design and configuration environment of Data Transformation.
Data Transformation Engine | The transformation engine.
Data Transformation Studio
· The Studio is a visual editor environment where you can design and configure transformations such as parsers, serializers, and mappers.
· You can use Studio to configure Data Transformation to process data of a particular type.
Data Transformation Engine
· Data Transformation Engine is an efficient transformation processor.
· It has no user interface. It works entirely in the background, executing the transformations that have been previously defined in Studio.
· To move a transformation from the Studio to the Engine, you must deploy the transformation as a Data Transformation service.
Default Installation Folder
By default, Data Transformation is installed in the following location:
C:\Program Files\Informatica\DataTransformation
The setup prompts you to change the location if desired.
Tutorials and Workspace Folders
By default, the tutorials are located in:
C:\Program Files\Informatica\DataTransformation\tutorials
and the workspace is located in:
My Documents\Informatica\DataTransformation\9.0\workspace
Transformation Architecture
· Transformation Components
· Data Holders
· Documents
· Data Transformation Services
Transformation Components
Top Level Components
Component | Description
Parser | A component that converts source documents in any format to XML.
Serializer | A component that converts XML documents to output documents in any format.
Mapper | A component that converts XML documents to a different XML structure or schema.
Transformer | A component that modifies data. The input and output can be in any format.
Streamer | A component that splits large inputs into segments that are processed separately by the other components.
Nested Components
Component | Description
Formats | Define the overall format of documents, such as the delimiters, that Data Transformation should use to interpret the documents.
Document processors | Operate on a document as a whole, performing preliminary or final conversions.
Anchors | Define the data in a source document that a parser should process and extract. The anchors specify how a parser should search for the data and where it should store the data that it finds.
Serialization anchors | Define how a serializer should write XML data to an output document.
Mapper anchors | Define how a mapper should write XML data to another XML structure or schema. The anchors specify where to find the data in the source XML and where to write the data in the output XML.
Transformers | In addition to their use as top-level components, you can nest transformers within a parser or a serializer.
Those who deal with data transfer or document
exchange within or across organizations with heterogeneous platforms will
certainly accept and appreciate the need and power of XML.
- What is XSD Schema?
- What are the advantages of XSD Schema?
- What is important in XSD Schema?
What Is a Schema?
A schema is a "structure", and the actual document or data that is represented through the schema is called a "document instance". Those who are familiar with relational databases can map a schema to a table structure and a document instance to a record in a table. Those who are familiar with object-oriented technology can map a schema to a class definition and a document instance to an object instance.
The structure of an XML document can be defined using any of the following:
- Document Type Definition (DTD)
- XML Schema Definition (XSD)
- XML-Data Reduced (XDR), proprietary to Microsoft
What Is XSD?
XSD provides the syntax for defining which elements and attributes can appear in an XML document. It also specifies the format and data types that the content of the document must follow.
Advantages of XSD
So what is the benefit of an XSD schema?
- An XSD schema is itself an XML document, so, unlike DTDs, there is no separate syntax to learn.
- XSD supports inheritance, where one type definition can be derived from another. This is a great feature because it provides the opportunity for re-usability.
- XSD lets you define your own data types from existing data types.
- XSD lets you specify data types for both elements and attributes.
Overview
First, look at what an XML
schema is. A schema formally
describes what a given XML document contains, in the same way a database schema
describes the data that can be contained in a database (table structure, data
types). An XML schema describes the coarse shape of the XML document, what
fields an element can contain, which sub elements it can contain, and so forth.
It also can describe the values that can be placed into any element or
attribute.
Elements
Elements are the main building block of any
XML document; they contain the data and determine the structure of the
document. An element can be defined within an XML Schema (XSD) as follows:
<xs:element name="x" type="y"/>
An element definition within the XSD must have a name property; this is the name that will appear in the XML document. The type property describes what can be contained within the element when it appears in the XML document. There are a number of predefined types, such as xs:string, xs:integer, xs:boolean, or xs:date. You can also create a user-defined type by using the <xs:simpleType> and <xs:complexType> tags, but more on these later.
If
you have set the type property for an element in the XSD, the corresponding
value in the XML document must be in the correct format for its given type.
(Failure to do this will cause a validation error.)
Examples of simple element definitions are below:
<xs:element name="customer_dob" type="xs:date"/>
<xs:element name="customer_address" type="xs:string"/>
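As a rough illustration of what "the correct format for its given type" means, the sketch below checks text values against a few built-in types using only Python's standard library. The function name matches_xsd_type is hypothetical, and the checks only approximate the XSD lexical rules (for example, time-zone suffixes on xs:date are not handled). A real schema validator applies the full rules.

```python
import re
from datetime import date


def matches_xsd_type(value: str, xsd_type: str) -> bool:
    """Approximate check of a text value against a few built-in XSD types."""
    if xsd_type == "xs:string":
        return True
    if xsd_type == "xs:integer":
        return re.fullmatch(r"[+-]?\d+", value) is not None
    if xsd_type == "xs:boolean":
        return value in {"true", "false", "1", "0"}
    if xsd_type == "xs:date":
        # Simplification: plain YYYY-MM-DD only, no time-zone suffix.
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
            return False
        try:
            date.fromisoformat(value)  # rejects impossible dates
        except ValueError:
            return False
        return True
    raise ValueError(f"type not covered by this sketch: {xsd_type}")
```

With this sketch, a customer_dob of 2001-01-15 passes the xs:date check, while 15/01/2001 does not.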
The
value the element takes in the XML document can further be affected by using
the fixed and default properties.
Default
means that, if no value is specified in the XML document, the application
reading the document (typically an XML parser or XML Data binding Library)
should use the default specified in the XSD.
Fixed
means the value in the XML document can only have the value specified in the
XSD.
For this reason, it does not make sense to use both default and fixed in the same element definition. (In fact, it's illegal to do so.)
<xs:element name="Customer_name" type="xs:string" default="unknown"/>
<xs:element name="Customer_location" type="xs:string" fixed="Bangalore"/>
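The effect of default and fixed on the value an application sees can be sketched as follows. This is a simplified model rather than a schema processor: the function name effective_value is hypothetical, and an empty or missing value is treated as "no value specified".

```python
from typing import Optional


def effective_value(text: Optional[str],
                    default: Optional[str] = None,
                    fixed: Optional[str] = None) -> str:
    """Return the value an application would see for an element."""
    if default is not None and fixed is not None:
        # The schema itself is invalid if both are declared.
        raise ValueError("an element cannot declare both default and fixed")
    if fixed is not None:
        # A supplied value must equal the fixed value.
        if text and text != fixed:
            raise ValueError(f"value must be {fixed!r}, got {text!r}")
        return fixed
    if not text and default is not None:
        # No value supplied: fall back to the declared default.
        return default
    return text or ""
```

For the two definitions above, an empty Customer_name yields "unknown", and a Customer_location other than "Bangalore" is rejected.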
Cardinality
Specifying how many times an element can appear is referred to as cardinality, and is done with the minOccurs and maxOccurs attributes. In this way, an element can be mandatory, optional, or appear many times. minOccurs can be assigned any non-negative integer value (for example 0, 1, 2, 3, and so forth), and maxOccurs can be assigned any non-negative integer value or the string constant "unbounded", meaning no maximum.
The default value for both minOccurs and maxOccurs is 1. So, if both attributes are absent, the element must appear exactly once.
<xs:element name="Customer_order"
type="xs:integer"
minOccurs ="0"
maxOccurs="unbounded"/>
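The cardinality rule can be sketched as a simple count check. The helper name occurs_ok and the sample Customer document are assumptions made for illustration; max_occurs=None stands in for "unbounded".

```python
import xml.etree.ElementTree as ET
from typing import Optional


def occurs_ok(parent: ET.Element, child_name: str,
              min_occurs: int = 1, max_occurs: Optional[int] = 1) -> bool:
    """Check how often child_name appears directly under parent.

    The defaults mirror XSD: both minOccurs and maxOccurs default to 1.
    Pass max_occurs=None to model maxOccurs="unbounded".
    """
    count = len(parent.findall(child_name))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


# Hypothetical instance document with two Customer_order elements.
customer = ET.fromstring(
    "<Customer>"
    "<Customer_order>101</Customer_order>"
    "<Customer_order>102</Customer_order>"
    "</Customer>"
)
```

Here occurs_ok(customer, "Customer_order", 0, None) accepts the document, while the default cardinality of exactly once would reject it.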
Compositors
There are three types of compositors: <xs:sequence>, <xs:choice>, and <xs:all>. These compositors allow you to determine how the child elements within them appear within the XML document.
Compositor | Description
Sequence | The child elements in the XML document MUST appear in the order they are declared in the XSD schema.
Choice | Only one of the child elements described in the XSD schema can appear in the XML document.
All | The child elements described in the XSD schema can appear in the XML document in any order.
Notes
The <xs:sequence> and
<xs:choice> compositors can be nested inside other compositors, and be
given their own minOccurs and maxOccurs properties. This allows for quite
complex combinations to be formed.
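The difference between the three compositors can be sketched as a check on the order of child elements. This is a simplified model that assumes every declared child has the default cardinality of exactly once and that compositors are not nested. The function name check_compositor is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_compositor(parent: ET.Element, kind: str, declared: List[str]) -> bool:
    """Check the children of parent against one non-nested compositor."""
    found = [child.tag for child in parent]
    if kind == "sequence":
        # Same elements, in the declared order.
        return found == declared
    if kind == "choice":
        # Exactly one of the declared elements.
        return len(found) == 1 and found[0] in declared
    if kind == "all":
        # Same elements, in any order.
        return sorted(found) == sorted(declared)
    raise ValueError(f"unknown compositor: {kind}")
```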
Example:
<xs:element name="Customer">
<xs:complexType>
<xs:sequence>
<xs:element name="Dob" type="xs:date" />
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="Line1" type="xs:string" />
<xs:element name="Line2" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Supplier">
<xs:complexType>
<xs:sequence>
<xs:element name="Phone" type="xs:integer" />
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="Line1" type="xs:string" />
<xs:element name="Line2" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
The above code will appear
as the diagram shown below:
Re-Use
It would make much more
sense to have one definition of "Address" that could be used by both
customer and supplier. You can do this by defining a complexType independently
of an element:
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="Line1" type="xs:string"/>
<xs:element name="Line2" type="xs:string"/>
</xs:sequence>
</xs:complexType>
The Customer and Supplier elements can then both reference this type:
<xs:element name="Customer">
<xs:complexType>
<xs:sequence>
<xs:element name="Dob" type="xs:date"/>
<xs:element name="Address" type="AddressType"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Supplier">
<xs:complexType>
<xs:sequence>
<xs:element name="Phone" type="xs:integer"/>
<xs:element name="Address" type="AddressType"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The advantage should be obvious. Instead of having to define Address twice (once for Customer and once for Supplier), you have a single definition. This makes maintenance simpler: if you decide to add "Line3" or "Postcode" elements to your address, you only have to add them in one place.
Attributes
An attribute provides extra
information within an element. Attributes are defined within an XSD as follows,
having name and type properties.
<xs:attribute name="x" type="y"/>
An attribute can appear 0 or 1 times within a given element in the XML document. Attributes are either optional or mandatory (by default, they are optional). The "use" property in the XSD definition specifies whether the attribute is optional or mandatory.
So, the following are
equivalent:
<xs:attribute name="ID" type="xs:string"/>
<xs:attribute name="ID" type="xs:string" use="optional"/>
Graphically:
To specify that an attribute must be present, set use="required".
Some of the problems with using attributes are:
· Attributes cannot contain multiple values (child elements can).
· Attributes are not easily expandable (to incorporate future changes to the schema).
· Attributes cannot describe structures (child elements can).
Namespaces:
Namespaces are a mechanism for partitioning your schemas. Until now, it has been assumed that a single schema file contains all your element definitions, but the XSD standard allows you to structure your XSD schemas by breaking them into multiple files. These child schemas can then be included in a parent schema.
Breaking schemas into multiple files can have several advantages. You can create re-usable definitions that can be used across several projects. Smaller units are also easier to read, version, and manage.
Element and Attribute Groups
Elements and Attributes can
be grouped together using <xs:group> and <xs:attributeGroup>. These
groups can then be referred to elsewhere within the schema. Groups must have a
unique name and be defined as children of the <xs:schema> element. When a
group is referred to, it is as if its contents have been copied into the
location it is referenced from.
Note: <xs:group> and
<xs:attributeGroup> cannot be extended or restricted in the way
<xs:complexType> or <xs:simpleType> can. They are purely to group a
number of items of data that are always used together. For this reason they are
not the first choice of constructs for building reusable maintainable schemas,
but they can have their uses.
<xs:group name="CustomerDataGroup">
<xs:sequence>
<xs:element name="Forename" type="xs:string" />
<xs:element name="Surname" type="xs:string" />
<xs:element name="Dob" type="xs:date" />
</xs:sequence>
</xs:group>
<xs:attributeGroup name="DobPropertiesGroup">
<xs:attribute name="Day" type="xs:string" />
<xs:attribute name="Month" type="xs:string" />
<xs:attribute name="Year" type="xs:integer" />
</xs:attributeGroup>
These
groups then can be referenced in the definition of complex types, as shown
below.
<xs:complexType name="Customer">
<xs:sequence>
<xs:group ref="CustomerDataGroup"/>
<xs:element name="..." type="..."/>
</xs:sequence>
<xs:attributeGroup ref="DobPropertiesGroup"/>
</xs:complexType>
XML Data Mapping
XML data mapping has to do with generating XML from application data and creating application data from XML. Application data covers the common data types developers work with every day: Boolean/logical values, numbers, strings, date-time values, arrays, associative arrays (dictionaries, maps, hash tables), database record sets and complex object types. The process of converting application data to XML is called serialization. The XML is a serialized representation of the application data. The process of generating application data from XML is called deserialization.
The traditional approach for generating XML
from application data has been to sit down and custom-code how data values
become elements, attributes and element content. The traditional approach of
working with XML to produce application data has been to parse it using a
simple API for XML (SAX) or Document Object Model (DOM) parser. Data structures
are built from the SAX events or the DOM tree using custom code. There are,
however, better ways to map data to and from XML using technologies
specifically built for serializing and deserializing data.
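The two directions described above can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration for flat records only; the function names serialize and deserialize and the sample Customer record are assumptions, and a dedicated data-binding library would also handle nesting and typed values.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def serialize(record: Dict[str, object], root_tag: str) -> str:
    """Application data -> XML: each dictionary key becomes a child element."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def deserialize(xml_text: str) -> Dict[str, str]:
    """XML -> application data: each child element becomes a dictionary entry."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


customer_xml = serialize({"Name": "Asha", "City": "Bangalore"}, "Customer")
customer = deserialize(customer_xml)
```

Note that every value comes back as a string. Recovering numbers or dates needs type information, which is one of the things a schema provides.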
Schema Translation
Schema translation refers to the conversion of XML documents from one format to another. It is also known as XML integration/conversion. Schema translation is very important in the context of B2B because the world of business is highly heterogeneous.
XML Generator
The XML Generator is a powerful tool for automatically generating XML instance documents which conform to any XML Schema data model. The generation of the sample XML files can be customized according to your preferences, and always results in valid, well-formed XML.
A Mapper is a component that maps data from one XML structure to another XML structure. It is generally used after the raw unstructured data has been brought into a staging XML structure. Once the data is in XML, you can use a Mapper to refine the structure further.
You must use the source and target properties to identify the root elements of the XML documents. For example, if the document element of the source is Persons, and the document element of the output is SummaryData, set the source and target as follows:

The Mapper has four anchors (mapping components), as listed below:
1. Alternative Mapping
2. Embedded Mapping
3. Group Mapping
4. Repeating Group Mapping
Alternative Mapping
This Mapper anchor lets you define a set of alternative, nested mapper
anchors. You can define a criterion for the alternative that the Mapper should
accept. Only the accepted alternative affects the mapper output. The other mapper
anchors, whether failed or successful, have no effect on the mapper output.
Example
The input XML may contain a product element or a service element,
but not both. You wish to process whichever element is in the input.
Define an alternative mappings mapper anchor, and set its selector
property to script order. Within the alternative mappings, nest two map
actions. Configure one of them to process the product element and the other to
process service.
The selector is the criterion for deciding which alternative to accept. The options are:
Script Order: Data Transformation tests the nested mapper anchors in the sequence in which they are defined in the IntelliScript. It accepts the first one that succeeds. If all the nested mapper anchors fail, the alternative mapping component fails.
Name Switch: Data Transformation searches for the nested mapper anchor whose name property is specified in a data holder. It ignores the other nested mapper anchors. If the named mapper anchor fails, the alternative mapping component fails.
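The script order behaviour can be modelled outside the product as "try each alternative in order and keep the first that succeeds". The sketch below is plain Python, not IntelliScript. The element names Order, Product, Service, and Item and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Sequence

# An alternative returns an output element, or None if it fails.
Alternative = Callable[[ET.Element], Optional[ET.Element]]


def map_product(source: ET.Element) -> Optional[ET.Element]:
    node = source.find("Product")
    if node is None:
        return None  # this alternative fails
    out = ET.Element("Item", kind="product")
    out.text = node.text
    return out


def map_service(source: ET.Element) -> Optional[ET.Element]:
    node = source.find("Service")
    if node is None:
        return None  # this alternative fails
    out = ET.Element("Item", kind="service")
    out.text = node.text
    return out


def alternative_mappings(source: ET.Element,
                         alternatives: Sequence[Alternative]) -> ET.Element:
    """Script order: accept the first alternative that succeeds."""
    for alternative in alternatives:
        result = alternative(source)
        if result is not None:
            return result
    raise LookupError("all alternatives failed")


order = ET.fromstring("<Order><Service>Install</Service></Order>")
item = alternative_mappings(order, [map_product, map_service])
```

In this run the product alternative fails, so the service alternative is the one that contributes to the output.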
Group Mapping
The Group Mapping mapper anchor binds its nested mapper anchors and actions together. You can set properties of the group mapping that affect the members of the group.
Basic Properties:
1. Source/Target
These properties are useful in situations where the mapper anchor must select specific occurrences of data holders.
2. Absent
If selected, the group mapping succeeds only if one of its nested, non-optional mapper anchors or actions fails. You can use this feature to test for the absence of nested mapper anchors.
Repeating Group Mapping
This mapper anchor processes a repetitive structure in the input or output. A repeating group mapping is useful if the XML input and/or output contains a multiple-occurrence data holder. It iterates over occurrences of the data holders.
Within the repeating group mapping, nest the mapper anchors and actions that process each occurrence of the data holder.
Basic Properties:
1. Count:
The number of iterations to run. Enter a number, or click the browse button and select a data holder that contains the number. If blank, the iterations continue until the input is exhausted.
2. Current Iteration:
A data holder where the repeating group mapping should output the number of the current iteration.
3. Source/Target:
These properties are useful in situations where the mapper anchor must select specific occurrences of data holders.
4. On_Iteration_Fail:
If an iteration fails, writes an entry in the user log or triggers a notification. Use the on_fail property to write an entry if the entire repeating group mapping fails. Use on_iteration_fail to write an entry if a single iteration fails.
5. On_Fail:
If the component fails, writes an entry in the user log or triggers a notification.
Example:

The
repeating group mapping iterates over the person elements of the input. It uses
map actions to write the data to the name and ID elements of the output.
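The iteration described above can be modelled in plain Python as a loop over the repeated input element. This is an illustrative sketch, not IntelliScript: the element names under Persons and SummaryData (Person, Name, Id, Entry, ID) are assumptions, since the tutorial schemas are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with a multiple-occurrence Person element.
INPUT_XML = """<Persons>
  <Person><Name>Ann</Name><Id>1</Id></Person>
  <Person><Name>Raj</Name><Id>2</Id></Person>
</Persons>"""


def map_persons(xml_text: str) -> ET.Element:
    """Iterate over each Person and copy two fields to the output."""
    source = ET.fromstring(xml_text)
    target = ET.Element("SummaryData")
    # Each pass of the loop plays the role of one iteration of the
    # repeating group; 'iteration' mirrors the current iteration counter.
    for iteration, person in enumerate(source.findall("Person"), start=1):
        entry = ET.SubElement(target, "Entry", iteration=str(iteration))
        # The two assignments play the role of the two map actions.
        ET.SubElement(entry, "Name").text = person.findtext("Name")
        ET.SubElement(entry, "ID").text = person.findtext("Id")
    return target


summary = map_persons(INPUT_XML)
```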
Locator:
This component is used in the source and target properties to identify a data holder. You can use it to identify either a single-occurrence or multiple-occurrence data holder. In the latter case, each iteration of the component that uses the locator processes the next occurrence of the data holder.
Important Properties:
Data_Holder
The data holder that the component identifies.
Locator by key:
This component is used in the source and target properties to identify an occurrence of a multiple-occurrence data holder. Before you use this component, you must define a key at the global level of the IntelliScript. The key specifies the data holders that uniquely identify the occurrence.
In the locator by key configuration, you must specify:
· The key that you wish to use.
· The values of the key fields. You can specify the values either statically, by typing a value, or dynamically, by selecting a data holder that contains the value. In case of conflicts, a nested locator by key overrides a parent locator.
Important Properties:
Key:
From a Schema view, select the XPath predicate representation of the key. For example, if you have defined Hobbies/Hobby/@name as a key, then you can select Hobbies/Hobby[@name=$1].
Params:
Under this property, specify the values of the parameters in the XPath predicate ($1, $2, and so forth). Type each value, or click the browse button and select a data holder that contains the value.
Example:

The Locator component is used to identify an occurrence of child. Each iteration processes the next occurrence of child, sequentially. The Locator by key component is used to identify an occurrence of parent.
Key and Example:
A key defines attributes or elements that serve as a unique identifier of their parent element. You can define a key only at the global level of the IntelliScript. This allows you to reference the key anywhere in the project. The name of a key is case-sensitive.

The
key is the name attribute, which uniquely identifies each Hobby.
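Selecting one occurrence by its key value is analogous to an XPath predicate lookup. The sketch below uses the limited XPath support in Python's xml.etree.ElementTree; the Hobbies document, its Level element, and the function name locate_by_key are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical document in which @name uniquely identifies each Hobby.
HOBBIES = ET.fromstring(
    "<Hobbies>"
    "<Hobby name='chess'><Level>club</Level></Hobby>"
    "<Hobby name='rowing'><Level>novice</Level></Hobby>"
    "</Hobbies>"
)


def locate_by_key(root: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the Hobby whose name attribute equals the key value."""
    # Corresponds to Hobbies/Hobby[@name=$1] with $1 bound to 'name'.
    # Sketch only: key values containing a quote character are not handled.
    matches = root.findall(f"Hobby[@name='{name}']")
    if len(matches) > 1:
        raise ValueError("key value is not unique")
    return matches[0] if matches else None
```

A call such as locate_by_key(HOBBIES, "rowing") returns the second Hobby, regardless of its position in the document.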
Embedded Mapping
This mapper anchor activates a secondary mapper, which stores its output in the same output document. A mapper can use an embedded mapper component to call itself recursively, until all levels of nesting are exhausted.
Example
The XML input is a family tree. The input contains Person
elements, which are recursively nested as shown:
<Person> <!-- Parent -->
...
<Person> <!-- Child -->
...
<Person> <!-- Grandchild -->
</Person>
</Person>
</Person>
In the recursive example described above, Person should be connected to Person/Person. This instructs the secondary instance of the mapper to process a nested level of the input. It is sufficient to connect just the parent element (Person), and not the nested elements (Person/*s/Name, Person/*s/Birth Date, etc.), provided that the two Person elements have the same data type.
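The recursive call pattern can be sketched in plain Python as a function that maps one Person and then calls itself for each nested Person. This models the idea only, not the embedded mapper component itself; the Name child element, the Member output element, and its attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical family tree with three levels of nesting.
FAMILY = ET.fromstring(
    "<Person><Name>Parent</Name>"
    "<Person><Name>Child</Name>"
    "<Person><Name>Grandchild</Name></Person>"
    "</Person>"
    "</Person>"
)


def map_person(person: ET.Element, depth: int = 0) -> ET.Element:
    """Map one Person, then recurse into each directly nested Person."""
    member = ET.Element(
        "Member",
        name=person.findtext("Name", default=""),
        depth=str(depth),
    )
    # The recursive call plays the role of the secondary mapper that is
    # connected to Person/Person.
    for nested in person.findall("Person"):
        member.append(map_person(nested, depth + 1))
    return member


tree = map_person(FAMILY)
```

The recursion stops on its own when a Person has no nested Person elements, which corresponds to all levels of nesting being exhausted.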
Basic Properties
Property | Description
mapper | The name of the secondary mapper.
schema connections | Connects the data holders that are referenced in the secondary mapper to the data holders that are referenced in the main mapper. The property contains a list of connect subcomponents that define the correspondence. If all the data holders in the main and secondary mappers are identical, you can omit this property. If there are any differences between the data holders, you must connect the data holders explicitly, even the ones that are identical.
Advanced Properties
Property | Description
name | A name that you assign to the component. Data Transformation includes the name in the event log. This can help you find an event that was caused by the particular component.
remark | A comment describing the component.
disabled | If selected, Data Transformation ignores the component. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing components.
optional | By default, if a component fails, its parent component fails. If you select the optional property, the parent component does not fail.
on fail | If the component fails, writes an entry in the user log or triggers a notification.
To create the mapper, start with a blank project.
1. On the Data Transformation Studio menu, click File > New > Project.
2. Under the Data Transformation node, select a Blank Project.
3. In the wizard, name the project Tutorial_6, and click Finish.
4. In the Data Transformation Explorer, expand the project. Notice that it contains a default TGP script file, but it contains no schemas or other components.
5. In the Data Transformation Explorer, right-click the XSD node and click Add File. Add the schemas Input.xsd and Output.xsd. Both schemas are necessary because you must define the structure of both the input and output XML.
6. The Schema view displays the elements that are defined in both schemas. You will work with the Persons branch of the tree for the input, and with the SummaryData branch for the output.
7. We recommend that you copy the test documents, Input.xml and ExpectedOutput.xml, into the project folder. By default, the folder is at the following location, but you can change it to a different location:
My Documents\Informatica\DataTransformation\9.0\workspace\Tutorial_6
Creating A Mapper:
• Add XSD input and output schemas to the project.
• At the top level of the IntelliScript, add a mapper component.
• Assign the source and target properties of the mapper to the input and output elements of the mapper, respectively.
• Add a locator for the source and for the target from the XSD.
• Edit the other properties of the mapper as required.
• Within the mapper, nest a sequence of map actions, mapper anchors, and any other required components.
• Test the mapper and modify the IntelliScript if required.
Configuring The Mapper:
1. Open the script file in an IntelliScript editor.
2. Define a global component called mapper1, and give it a type of Mapper.
3. Display the advanced properties of the mapper, and assign the example_source property to the file input.xml.
4. Assign the source and target properties of the mapper as illustrated. The properties define the schema branches where the mapper will retrieve its input and store its output.
5. Under the contains line of the mapper, insert a RepeatingGroupMapping component. You will use this mapper anchor to iterate over the repetitive XML structures in the input and output.
6. Within the RepeatingGroupMapping, insert two map actions. The purpose of these actions is to copy the data from the input to the output. Configure the source property of each map action to retrieve a data holder from the input. Configure the target property to write the corresponding data holder in the output. As you do this, Studio color-codes the corresponding elements and attributes of the example source. By examining the color coding, you can confirm that you have defined the source of each action correctly.
7. Set the mapper as the startup component and run it. Check the Events view for errors.
8. Compare the results file, which is called output.xml, with ExpectedOutput.xml. If you have configured the mapper correctly, the files should be identical, except perhaps for the <?xml?> processing instruction, which depends on options in the project properties.

Identifying errors and creating an error handling strategy is very important. Each B2B Data Exchange transformation uses the following ports in error handling:
· DXErrorCode. When a transformation fails, the transformation sets the DXErrorCode to a value greater than zero.
· DXErrorMessage. When a transformation fails, the transformation stores an error message in the DXErrorMessage port to describe the failure.
When a transformation generates an error, the transformation performs the following tasks:
· The transformation writes the error to the PowerCenter session log. The error log includes the exception class, description, cause, and stack trace. The logging level is based on the PowerCenter configuration. Up to 1 KB of the document associated with the error is included in the log.
· If the option to set the event status to error when a transformation fails is set to true, the transformation sets the status of the event to error.
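The convention behind the two ports (zero means success, a value greater than zero means failure with a message) can be modelled in a few lines of Python. This is only an illustration of the convention, not the Data Exchange or PowerCenter API: the PortValues class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PortValues:
    """Hypothetical stand-in for the two error ports described above."""
    DXErrorCode: int = 0
    DXErrorMessage: str = ""


def run_transformation(step: Callable[[str], object], document: str) -> PortValues:
    """Run one step and report the outcome through the two port values."""
    try:
        step(document)
    except Exception as exc:
        # Failure: error code greater than zero, plus a descriptive message.
        return PortValues(DXErrorCode=1, DXErrorMessage=str(exc))
    return PortValues()


def has_failed(ports: PortValues) -> bool:
    """A downstream check only needs to test the error code."""
    return ports.DXErrorCode > 0
```

A downstream step would test has_failed first and read DXErrorMessage only when it returns True.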
DX_Throw_Error
This is a B2B Data Exchange transformation that handles errors in the workflow. It generates an error when the transformation fails.
It performs the following tasks:
· Sets the status of the associated event to ERROR.
· Creates the error message from the value of the DXDescription port.
· Attaches the error message to the associated event.
· Logs the error in the session log.
Input Ports
The DX_Throw_Error transformation has the following input ports:
Port | Type | Description
DXDescription | string | Description of the error. This is the error message added to the session log. This is also used as the description for the log document attached to the event.
DXMessageType | string | Type of the error event. Optional. Alphanumeric value to associate with the event. Any value is valid.
DXMIMEType | string | MIME type of the document to attach to the event.
Input/Output Ports
The DX_Throw_Error transformation has the following input/output
ports:
Port | Type | Description
DXEventId | string | ID of the event associated with the error.
DXData | string, binary, or text | Log document to attach to the event. This port can contain the data of the document or a file path to the document. If the value of the parameter is null, the transformation creates an empty document and adds the document to the event. To attach a document with text data, set the datatype of the port to string or text. To attach a document with binary data, change the datatype of the port to binary.
DXDataByReference | string | Indicates whether the DXData port contains the document data or a document reference. If the value is true, the DXData port contains a document reference. If the value is null or the value is false, the DXData port contains the document data.
DXErrorMessage | string | Error message generated by the transformation.
DXErrorCode | string | Error code generated by the transformation. If the transformation fails, the value of the DXErrorCode port is greater than zero.
Data Exchange Properties
You can configure the following Data Exchange properties in the DX_Throw_Error transformation:
Property | Description
Error log document description | Description for the error log document that this transformation attaches to the event.
Message type | Alphanumeric value to associate with the event. Any value is valid.
Generate an error in case a failure occurs in this transformation | Indicates whether to set the status of the event to ERROR when the transformation generates an error.
Running the Mapper
To run a mapper in Data Transformation Studio:
1. Set the mapper as the startup component.
2. Click Run > Run.
3. In the I/O Ports table, double-click the input row and select
the input XML file.
Alternatively, you can set the example_source property of the
mapper. This lets you test a mapper repeatedly on the same input, without
needing to browse to the file each time.
4. When the execution is complete, Data Transformation Studio
displays the Events view. Examine the events for any failures or warnings.
5. View the mapping results by opening the output.xml file,
located in the Results folder of the project.
Project Files
The main file types used in Data Transformation projects are as
follows.
Main project file: A *.cmw file that contains the main project
information. The name of this file is the name of the project.
Example source documents: Examples of the input documents that you
want a transformation to process.
Script files: One or more *.tgp files that define transformations.
XML schemas: One or more *.xsd files that define XML structures
used in transformations.
Result files: Files created by running the project, such as output and
log files.
Filenames
To ensure cross-platform compatibility, the names of project
directories and files must contain only English letters (A-Z, a-z), numerals
(0-9), spaces, and the following symbols:
# % & + , - . = @ [ ] _ { }
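The naming rule above can be checked mechanically. The following is a minimal Python sketch, not part of Data Transformation itself; the function name is_portable_name is hypothetical, and the allowed character set is copied from the list above.

```python
import re

# Allowed characters as listed above: English letters, numerals, spaces,
# and the symbols # % & + , - . = @ [ ] _ { }
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9 #%&+,\-.=@\[\]_{}]+")


def is_portable_name(name: str) -> bool:
    """Return True if every character of `name` is in the allowed set."""
    return _ALLOWED_NAME.fullmatch(name) is not None


# Hypothetical sample names, used only for illustration.
for candidate in ["Tutorial_5", "Family Tree.xsd", "résumé.xml"]:
    print(candidate, is_portable_name(candidate))
```

A check like this could be run over a project folder before moving the project to another platform.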
Displaying A File
Under each project, the Data Transformation Explorer displays
categories for the above file types, such as:
Additional
Examples
Scripts
XSD
Results
To display any of the files, double-click its name in the Data
Transformation Explorer.
Importing An Existing Project
You can import a project that exists outside the current Eclipse
workspace.
The import procedure copies the essential project files, such as
the CMW, TGP, XSD, and example-source files. It may fail to copy additional
files, such as test documents, which you have stored in the project folder. Copy
such files manually.
There is no link between the original project and the imported
copy. If you edit the imported copy, the original project is unchanged.
To import a project:
1. On the menu, click File > Import. This displays an import
wizard.
2. Select the option to import an Existing Data Transformation
Project into Workspace, and click Next.
3. Browse to the *.cmw file of the existing project.
4. Enter a name for the imported project. The default is the name
of the existing project.
5. Click Finish.
6. If the existing project was created in a previous Data
Transformation version, you are prompted to upgrade the imported copy. Click
OK.
The system copies the project into the Data Transformation
workspace folder. When the import is complete, the imported copy is displayed
in the Data Transformation Explorer.
Viewing The Results
To view the output of a project, double-click the appropriate file
in the Results category of the Data Transformation Explorer view. For example,
the output of a parser is an XML file. To view the output, find the XML file in
the Results category, and double-click on its name.

Each Data Exchange transformation uses the following ports in
error handling:
DXErrorCode. When a transformation fails, the transformation sets the
DXErrorCode to a value greater than zero.
DXErrorMessage. When a transformation fails, the transformation stores an error
message in the DXErrorMessage port to describe the failure.
When a transformation generates an error, the transformation
performs the following tasks:
The transformation writes the error to the PowerCenter session
log. The error log includes the exception class, description, cause, and stack
trace. The logging level is based on the PowerCenter configuration. Up to 1K
of the document associated with the error is included in the log.
If the option to set the event status to error when a
transformation fails is set to true, the transformation sets the status of the
event to error.
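The error-handling rules above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not the product's implementation: the names handle_dx_ports and ErrorOutcome are hypothetical, "1K" is assumed to mean 1024 bytes, and a null or empty DXErrorCode is assumed to mean "no error".

```python
from dataclasses import dataclass
from typing import Optional

MAX_LOGGED_BYTES = 1024  # assumption: "up to 1K of the document" means 1024 bytes


@dataclass
class ErrorOutcome:
    failed: bool
    log_excerpt: Optional[bytes]  # part of the document that would be logged
    event_status: Optional[str]   # "ERROR" if the event status is changed


def handle_dx_ports(dx_error_code: Optional[str],
                    document: bytes,
                    set_event_error_on_failure: bool) -> ErrorOutcome:
    """Model the rule that a DXErrorCode greater than zero means failure."""
    # The port is a string; null or empty is treated as "no error" (assumption).
    code = int(dx_error_code) if dx_error_code else 0
    if code <= 0:
        return ErrorOutcome(False, None, None)
    excerpt = document[:MAX_LOGGED_BYTES]
    status = "ERROR" if set_event_error_on_failure else None
    return ErrorOutcome(True, excerpt, status)
```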
Viewing the Event Log
When you run a transformation in the Data Transformation Studio
environment, the Events view displays the events that occur during the
execution. Examine the events for failure or warning messages. By default, the Studio
event log is the file results/events.cme in the project folder.
Event-Log Properties
In the project properties, you can configure the events that Data
Transformation writes to the log.
Event Display
Preferences
You can customize the event display by using the Window >
Preferences option. On the Data Transformation page of the preferences, you can
configure:
The types of events that the Studio displays, such as
notifications, warnings, or failures.
Whether the failure events propagate (“bubble up”) in the events
tree. Propagation lets you find the failure events more easily because they are
labeled at the top levels of the tree.
The preferences are independent of the event-log properties. The
properties control the events that the system stores in the log. The
preferences control how the stored events are displayed.
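To picture what propagation does, the sketch below marks every ancestor of a failure event in a small tree. It is an illustrative Python model only; the Event class, its field names, and the sample events are hypothetical and do not reflect the format of the Studio event log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    name: str
    status: str = "information"      # "information", "warning" or "failure"
    children: List["Event"] = field(default_factory=list)
    shown_as_failure: bool = False   # display label only, set by propagation


def propagate_failures(event: Event) -> bool:
    """Label each ancestor of a failure event so it is visible at the top levels."""
    child_failed = False
    for child in event.children:
        # Visit every child (no short-circuit) so all branches get labeled.
        if propagate_failures(child):
            child_failed = True
    event.shown_as_failure = event.status == "failure" or child_failed
    return event.shown_as_failure


# Hypothetical event tree: a parser whose nested Content anchor failed.
failed_leaf = Event("Content", "failure")
ok_leaf = Event("Marker")
group = Event("Group", children=[ok_leaf, failed_leaf])
root = Event("Parser", children=[group])
propagate_failures(root)
print(root.shown_as_failure, group.shown_as_failure, ok_leaf.shown_as_failure)
```

Note that the stored status of each event is left unchanged; only the display label is set, which mirrors the statement that preferences affect how events are displayed rather than what is stored.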
To configure event preferences:
1. Click Window > Preferences.
2. Select the Data Transformation Events category.
3. Under the Filters heading, choose the events you want Data
Transformation to display.
The choices are:
Notifications.
Warnings.
Failures.
4. Optionally, click Propagate all events. Alternatively, select
individual events to be propagated and click Propagate selected events. You can
select multiple events by pressing Ctrl and clicking each relevant row.
Event Display without Propagation:

Event Display with Propagation:

Understanding the
Event Log
The event log displays the detailed events that occurred during
the execution. For example, it displays an event for each anchor that a parser
found. To display the events at a particular stage, select the stage in the
left pane of the Events view.
The
events are labeled with status icons, which have the following meanings:
Status Icon Meaning
|
Description
|
Information
|
A normal operation performed by Data Transformation.
|
Warning
|
A warning about a possible error. For example, Data Transformation generates
a warning event if an operation overwrites the existing content of a data
holder. The execution continues.
|
Failure
|
A component failed. For example, an anchor fails if Data Transformation
cannot find it in the source document. The execution continues.
|
Optional Failure
|
An optional component, configured with the optional property, failed.
|
The following are the different status types and their corresponding
icons:

You can deploy a project as a Data Transformation service on the
development computer where Data Transformation Studio is installed. This allows
you to develop, test, and run applications that activate the service.
To do this, you must have write privileges for the Data
Transformation repository and for the log folder.
To deploy a service:
1. In Data Transformation Studio, open and select the project.
2. Click Project > Deploy.
3. In the Deploy Service window, set the following options:
Option
|
Description
|
Service Name
|
The name of the service. By default, this is the project name. To ensure
cross-platform compatibility, the name must contain only English letters
(A-Z, a-z), numerals (0-9), spaces, and the following symbols:
% & + - = @ _ { }
Data Transformation creates a folder having the service name, in
the repository location.
|
Label
|
A version identifier. The default value is a time stamp indicating when the
service was deployed.
|
Startup Component
|
The runnable component that the service should start.
|
Author
|
The person who developed the project.
|
Description
|
A description of the service.
|
4. Click the Deploy button.
The Studio displays a message that the service was successfully
deployed. The service appears in the Repository view.
Redeploying a Project
Data Transformation Studio cannot open a deployed project that is
located in the repository. If you need to edit the transformation, work on the
original project and redeploy it. Redeploying overwrites the complete service
folder, including any output files or other files that you have stored in it.
Running a Service
After you deploy a service, you are ready to run it in Data
Transformation Engine. You can do this in several ways:
By using the Data Transformation Engine command-line interface.
By programming an application that uses the Data Transformation
API to submit source documents to the Engine. The API is available in several
programming languages.
By using the Unstructured Data Transformation in Informatica
PowerCenter.
By using integration agents that run Data Transformation services
within third-party systems.
Serialization
is the opposite of parsing. A parser converts a source document from any format
to an XML file. A serializer converts an XML file to an output document in any
format. The output of a serializer can be a text document, an HTML document, or
even another XML document. It is easier to define a serializer
than a parser because the input is a fully structured, unambiguous XML document.
The serializer in this tutorial is very simple. It contains only
four serialization anchors, which are the opposite of the anchors used in
parsing.
The
serializer has some interesting points:
·
The serializer is recursive.
That is, it calls itself repeatedly to serialize the nested sections of an
XML document.
·
The output of the serializer
is a worksheet that you can open in Excel. We can define the serializer by
editing the IntelliScript.
·
It is also possible to generate a serializer
automatically by inverting the operation of a parser.
Creating the Project
To create the project:
1. On the Data Transformation Studio menu, click File > New
> Project.
2. Under the Data Transformation node, select a Serializer
Project.
3. On the following wizard pages, specify the following options:
·
Name the project Tutorial_5.
·
Name the serializer
FamilyTreeSerializer.
·
Name the script file
Serializer_Script.
4. When we reach the Schema page, browse to the schema FamilyTree.xsd,
which is in the tutorials\Exercises\Files_For_Tutorial_5 folder.
5. When we finish the wizard, the Data Transformation Explorer
displays the new project. Double click the Serializer_Script.tgp file to edit
it.
6. Unlike a parser, a serializer does not have an example source
file. Optionally, we can hide the empty example pane of the IntelliScript
editor. To do this, click IntelliScript > IntelliScript or click the toolbar
button that is labeled Show IntelliScript Pane Only. To restore the example
pane, click IntelliScript > Both or click the button that is labeled Show
Both IntelliScript and Example Panes.

7. To design the serializer, we will use the FamilyTree.xml input
document, whose content is presented above.
Configuring the
Serializer
To configure the serializer:
1. Display the advanced properties of the serializer, and set
output_file_extension = .csv, with a leading period.
When you run the serializer in the Studio, this causes the output
file to have the name output.csv. By default,
*.csv files open in Microsoft Excel.

2. Under the contains line of the serializer, insert a Content
Serializer serialization anchor and configure its properties as follows:
Data holder = /Person/*s/Name
closing_str = ","
This means that the serialization anchor writes the content of the
/Person/*s/Name data holder to the output file. It appends the closing string
"," (a comma).

3. Define a second Content Serializer as
illustrated:

This Content Serializer writes the /Person/*s/Age data holder to
the output. It appends a carriage return (ASCII code 013) and a linefeed (ASCII
010) to the output.
To type the ASCII codes:
·
Select the closing_str
property and press Enter.
·
On the keyboard, press
Ctrl+a. This displays a small dot in the text box.
·
Type 013.
·
Press Ctrl+a again.
·
Type 010.
·
Press Enter to complete the property
assignment.
4.
Run the serializer. To do this:
·
Set the serializer as the startup
component.
·
On the menu, click Run > Run.
·
In the I/O Ports table, edit the first
row and open the test input file, FamilyTree.xml.
·
Click the Run button.
·
When the serializer has completed,
examine the Events view for errors.
·
In the Data Transformation Explorer
view, under Results, double-click output.csv to view the output.
Assuming
that Excel is installed on the computer, Data Transformation displays an Excel
window, like this:

If
Excel is not installed on the computer, you can view the output file in
Notepad. Alternatively, we can copy the file to another computer where Excel is
installed, and open it there.
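The effect of the two Content Serializers configured above can be pictured with a short Python sketch. It only imitates the described behavior (write the data holder, then the closing string); the function names and the sample values are hypothetical.

```python
CR = chr(13)  # carriage return, typed as 013 in the closing_str property
LF = chr(10)  # linefeed, typed as 010


def content_serializer(value: str, closing_str: str = "") -> str:
    """Write a data holder's content followed by its closing string."""
    return value + closing_str


def serialize_person(name: str, age: str) -> str:
    # First anchor: /Person/*s/Name followed by a comma.
    # Second anchor: /Person/*s/Age followed by CR + LF.
    return content_serializer(name, ",") + content_serializer(age, CR + LF)


# Hypothetical sample values, not taken from FamilyTree.xml.
row = serialize_person("Jane", "40")
print(repr(row))
```

Each call yields one comma-separated line ending in CR+LF, which is why the output opens as a two-column worksheet.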
Calling the Serializer
Recursively
To call the serializer recursively:
1. Insert a Repeating Group Serializer.

2.
Within the Repeating Group Serializer, nest an Embedded Serializer. The purpose
of this serialization anchor is to call a secondary serializer.
3.
Assign the properties of the Embedded Serializer as illustrated.

The
properties have the following meanings:
·
The assignment serializer =
FamilyTreeSerializer means that the secondary serializer is the same as the
main serializer.
·
The schema connections property means that the
secondary serializer should process /Person/*s/Children/*s/Person as though it
were a top-level /Person element. This is what lets the serializer move down
through the generations of the family tree.
·
The optional property means that the secondary
serializer does not cause the main serializer to fail when it runs out of data.
4.
Run the serializer again. The result should be:

Creating a Serializer
A serializer
can be created by any of the following methods:
·
By inverting the configuration of an
existing parser.
·
By using the New Serializer wizard.
·
By editing the IntelliScript and
inserting a Serializer component.
1.
Creating a Serializer by Inverting a Parser
To create a serializer automatically from a parser:
1. In Data
Transformation Studio, open an existing parser in an IntelliScript editor.
2. Right-click the
parser and click Create Serializer.
3. Test the serializer.

2.
Creating a Serializer by Using the New Serializer Wizard
The New Serializer wizard is used to create a serializer.
Creating a Serializer
Project
To create a new project that contains a serializer:
1. Click File > New
> Project.
2. Under the Data
Transformation category, select a Serializer Project and click Next.
3. Follow the wizard prompts
to enter the serializer options.
When you finish, the Data
Transformation Explorer view displays the new project containing the serializer.
The Component view displays the serializer.
Creating a Serializer
in an Existing Project
To create a new serializer in an existing project:
1. Click File > New
> Serializer.
2. Follow the wizard
prompts to enter the serializer options.
When you finish, the Data Transformation Explorer view displays a
new TGP script file defining the serializer. The Component view displays the
serializer.
The following table describes the wizard options.
Option
|
Description
|
Serializer name
|
A name for the serializer.
|
Script name
|
A name for a TGP script file where the wizard stores the serializer definition.
|
Schema file path
|
The name of a schema defining the XML syntax of the serializer input.
|
Serializer
Configuration
To complete the serializer configuration:
1. Display the serializer in an IntelliScript editor.
2. Under the contains line, add a sequence of serialization
anchors and actions.
3. Run and test the serializer, and modify the IntelliScript as
required.
3.
Creating a Serializer by Editing the IntelliScript
To create a serializer in the IntelliScript:
1. At the top level of the IntelliScript, select the three dots
(...) symbol. Press Enter and type a name for the serializer.
2. To the right of the name, press Enter. Select a Serializer
component from the list.
3. Expand the tree under the Serializer component. Assign its
properties as required.
4. Add a schema defining the XML syntax of the serializer input.
5. Under the contains line, add a sequence of nested serialization
anchors and actions.
6. Run and test the serializer and modify the IntelliScript as
required.
Running a Serializer
To run a serializer in Data Transformation Studio:
1. Set the serializer as the startup component.
2. Click Run > Run.
3. In the I/O Ports table, double-click the input row and select
the input XML file.
4. When the execution is complete, Data Transformation Studio displays
the Events view. Examine the events for any failures or warnings.
5. To view the serialization results, open the output file,
located in the Results folder of the project.
Serialization Anchors
The main components that you can use in a serializer are called
serialization anchors. These are analogous to the anchors that are used in a
parser, except that they work in the opposite direction. Anchors read data from
locations in the source document and write the data to XML. Serialization
anchors read XML data and write the data to locations in the output document. A
serialization anchor is not an anchor, despite their similar names. Anchors
cannot be used in a serializer, and you
cannot use serialization anchors in a parser.
The most important serialization anchors are Content Serializer
and String Serializer:
·
A Content Serializer writes
the content of a specified data holder to the output document. It is the
inverse of a Content anchor, which reads content from a source document.
·
A String Serializer writes a
predefined string to the output. It is the inverse of a Marker anchor, which
finds a predefined string in a source document.
Standard Serializer
Properties
Name: A name that you assign to the component. Data Transformation includes the
name in the event log. This will help to find an event that was caused by the
particular component.
Remark: A comment describing the component.
Disabled: If selected, Data Transformation ignores the component. This is useful
for testing and debugging, or for making minor modifications in a project
without deleting the existing components.
Optional: By default, if a component fails, its parent component fails. If we
select the optional property, the parent component does not fail.
On fail: If the component fails, writes an entry in the user log or triggers a
notification.
Notifications: A list of Notification Handler components that the component runs
on notifications triggered by nested components.
Serialization Anchor
Component Reference
1. Alternative Serializers
This serialization anchor helps us to define a set of alternative,
nested serialization anchors. We can define a criterion for the alternative
that the serializer should accept. Only the accepted alternative affects the
serializer output. The other serialization anchors, whether failed or
successful, have no effect on the serializer output.
Example
The input XML might contain a Product element or a Service
element, but not both. We want to serialize whichever element is in the input.
Define an Alternative Serializers serialization anchor, and set
its selector property to Script Order. Within the Alternative
Serializers, nest two Content Serializer serialization anchors. Configure one
of them to process the Product element and the other to process Service.
Properties
1.
Selector: The criterion for deciding which alternative to accept. The options
are:
·
Script Order. Data Transformation tests
the nested serialization anchors in the sequence that they are defined in the
IntelliScript. It accepts the first one that succeeds. If all the nested
serialization anchors fail, the Alternative Serializers component fails.
·
Name Switch. Data Transformation
searches for the nested serialization anchor whose name property is specified
in a data holder. It ignores the other nested serialization anchors. If the
named serialization anchor fails, the Alternative Serializers component fails.
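The two selector options can be contrasted in a small Python model. This is a sketch under stated assumptions, not IntelliScript: a nested serialization anchor is represented as a function that returns its output or None on failure, all names are hypothetical, and raising an error when no anchor has the requested name is an assumption rather than documented behavior.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A nested serialization anchor, modeled as a function that returns the text
# it would write, or None if it fails (for example, the element is missing).
Anchor = Callable[[Dict[str, str]], Optional[str]]
NamedAnchors = List[Tuple[str, Anchor]]


def content_of(element: str) -> Anchor:
    """Stand-in for a Content Serializer that writes one element's content."""
    return lambda data: data.get(element)


def script_order(anchors: NamedAnchors, data: Dict[str, str]) -> str:
    """Accept the first nested anchor that succeeds, in definition order."""
    for _name, anchor in anchors:
        result = anchor(data)
        if result is not None:
            return result
    raise ValueError("all alternatives failed")


def name_switch(anchors: NamedAnchors, data: Dict[str, str], selected: str) -> str:
    """Run only the anchor whose name matches; fail if that anchor fails."""
    for name, anchor in anchors:
        if name == selected:
            result = anchor(data)
            if result is None:
                raise ValueError(f"alternative {name!r} failed")
            return result
    raise ValueError(f"no alternative named {selected!r}")  # assumption


alternatives = [("product", content_of("Product")),
                ("service", content_of("Service"))]
print(script_order(alternatives, {"Service": "Consulting"}))
```

With Script Order, the Product alternative fails on this input and the Service alternative is accepted, which matches the Product-or-Service example above.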
2. Content Serializer
This serialization anchor writes the serialized data to the output
document.
Properties
1.
Opening str: A string that the anchor should write before the data holder.
2.
Closing str: A string that the anchor should write after the data holder.
3.
Data holder: The data holder containing
the data.
3. Delimited Sections
Serializer
This serialization anchor processes sections of data. Between each
section of the output, the Delimited Sections Serializer writes a separator
string.
Within the Delimited Sections Serializer, nest other serialization
anchors. Each nested serialization anchor is responsible for outputting a
single section.
Example
The XML input contains an employee resume. We wish to write the
data to an output text document in the following format:
----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Professional Experience
...
----------------------------
Education
...
Define a Delimited Sections Serializer with the line of hyphens as
its separator. Because we want a line of hyphens before each section, set
separator_position = before.
Within the Delimited Sections Serializer, nest three Group
Serializer components. The first Group Serializer writes the Jane Palmer
section, the second writes the Professional Experience section, and so forth.
Optional Sections
In the above example, suppose that the second section,
Professional Experience, is missing from some input XML documents. You
nonetheless want to write its separator to the output, like this:
----------------------------
Jane Palmer
Employee ID 123456
----------------------------
----------------------------
Education
...
To
support this situation, configure the Delimited Sections Serializer in the
following way:
·
In the second Group Serializer, select
the optional property. This means that if the Group Serializer fails, it should
not cause the Delimited Sections Serializer to fail.
·
In the Delimited Sections Serializer,
set using_placeholders = always. This means to write the separator of an
optional section, even if the section itself is missing.
Alternatively,
suppose that if the Professional Experience section is missing, we do not want
to write its separator:
----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Education
...
In
this case, configure the Delimited Sections Serializer as follows:
·
In the second Group Serializer, select
the optional property.
·
In the Delimited Sections Serializer,
set using_placeholders = never. This means not to write the separator of a
missing section.
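The difference between using_placeholders = always and never can be sketched as follows. This Python model covers only separator_position = before and the two placeholder settings shown in the examples; the function name is hypothetical, a missing optional section is represented by None, and the separator is shortened for readability.

```python
from typing import List, Optional

SEPARATOR = "----"  # shortened stand-in for the line of hyphens


def delimited_sections(sections: List[Optional[str]],
                       using_placeholders: str = "never") -> str:
    """Write each section preceded by the separator (separator_position = before).

    A section of None stands for an optional nested serializer that failed
    because its data is missing from the XML input.
    """
    lines: List[str] = []
    for section in sections:
        if section is None:
            if using_placeholders == "always":
                lines.append(SEPARATOR)  # keep the separator of the missing section
            continue
        lines.append(SEPARATOR)
        lines.append(section)
    return "\n".join(lines)


resume = ["Jane Palmer\nEmployee ID 123456", None, "Education\n..."]
print(delimited_sections(resume, "always"))
print()
print(delimited_sections(resume, "never"))
```

The first call writes two consecutive separators where Professional Experience is missing; the second omits the extra separator.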
Properties
1.
Separator: The separator string.
2.
Separator position: Position of
the separator relative to the sections. The options are before, after, between,
and around.
3.
Using placeholders: This property specifies whether the Delimited Sections
Serializer should write the separator of an optional section that is missing
from the XML input. The options are always, never, and when necessary.
4. Embedded
Serializer
This serialization anchor activates a secondary Serializer, which
writes its output in the same output document. A Serializer
can use an Embedded Serializer component to call itself recursively, until all
levels of nesting are exhausted.
Example
The XML input is a family tree. The input contains Person
elements, which are recursively nested as shown:
<Person> <!-- Parent -->
...
<Person> <!-- Child -->
...
<Person> <!-- Grandchild -->
...
</Person>
</Person>
</Person>
In
the recursive example described above, Person should be connected to
Person/Person.
This instructs the secondary instance of the serializer to process a nested
level of the input. It is sufficient to connect just the parent element
(Person), and not the nested elements (Person/*s/Name, Person/*s/Birth Date,
etc.), provided that the two Person elements have the same data type.
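The recursion described above can be imitated in Python with xml.etree.ElementTree. This is a sketch, not an IntelliScript definition: the sample input and its names are hypothetical, and the indentation by nesting depth is added only to make the levels visible in the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical input shaped like the nested Person example above.
FAMILY = """
<Person>
  <Name>Anna</Name>
  <Person>
    <Name>Ben</Name>
    <Person>
      <Name>Cleo</Name>
    </Person>
  </Person>
  <Person>
    <Name>Dana</Name>
  </Person>
</Person>
"""


def serialize_person(person: ET.Element, depth: int = 0) -> str:
    """Write this person's name, then call the same serializer on nested Persons."""
    text = "  " * depth + (person.findtext("Name") or "") + "\n"
    # Connecting Person to Person/Person: each nested Person is processed
    # as though it were the top-level Person of a secondary serializer.
    for nested in person.findall("Person"):
        text += serialize_person(nested, depth + 1)
    return text


print(serialize_person(ET.fromstring(FAMILY.strip())), end="")
```

The recursion stops on its own when a Person has no nested Person elements, which corresponds to the secondary serializer running out of data.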
Properties
1.
Serializer: The name of the secondary serializer. The serializer must be
defined at the global level of the IntelliScript.
2.
Schema connections: Connects the data holders that are referenced in the
secondary serializer to the data holders that are referenced in the main
serializer.
If all the data holders in the main and
secondary serializers are identical, we can
omit this property. If there are any differences between the data
holders, we must connect the data holders explicitly, even the ones that are
identical.
5. Group Serializer
The Group Serializer serialization anchor binds its nested
serialization anchors together. We can set properties of the Group Serializer
that affect the members of the group.
Properties
1.
Source target: These properties are useful in situations where the serialization
anchor must select specific occurrences of data holders.
6. Repeating Group Serializer
This serialization anchor writes a repetitive structure to the
output document.
A Repeating Group Serializer is useful if the XML data contains a
multiple-occurrence data holder. It iterates over the occurrences of the data
holder and outputs the data.
Within the Repeating Group Serializer, nest serialization anchors
that process and output each occurrence of the data holder. Optionally, we can
define a separator that the Repeating Group Serializer writes to the output
between the iterations.
Example
The XML input contains the following structure:
<Persons>
<Person>
<Name>John</Name>
<Age>35</Age>
</Person>
<Person>
<Name>Larissa</Name>
<Age>42</Age>
</Person>
...
</Persons>
A
Repeating Group Serializer, using a newline character as a separator, can
output this data to:
John 35
Larissa 42
We
can iterate over several multiple-occurrence data holders in parallel. For example,
you can iterate over a list of men and a list of women, and output a list of
married couples. To do this, insert a Content Serializer within the repeating
group for each data holder.
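The iteration performed by a Repeating Group Serializer resembles the following Python sketch, which uses xml.etree.ElementTree on the Persons example above. The function names are hypothetical, and the parallel-iteration helper uses made-up lists purely for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

XML_INPUT = """
<Persons>
  <Person><Name>John</Name><Age>35</Age></Person>
  <Person><Name>Larissa</Name><Age>42</Age></Person>
</Persons>
"""


def repeating_group(root: ET.Element, separator: str = "\n") -> str:
    """Iterate over the Person occurrences and write the separator between them."""
    rows = [f"{p.findtext('Name')} {p.findtext('Age')}"
            for p in root.findall("Person")]
    return separator.join(rows)


def pair_up(first: List[str], second: List[str], separator: str = "\n") -> str:
    """Iterate over two multiple-occurrence data holders in parallel."""
    return separator.join(f"{a} {b}" for a, b in zip(first, second))


print(repeating_group(ET.fromstring(XML_INPUT.strip())))
```

Joining with the separator places it between iterations only, which corresponds to a separator position of between.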
Properties
1.
Separator: A serialization anchor, typically a String Serializer that outputs
the separator.
2.
Separator position: Position of the separator relative to the iterations. The
options are before, after, between, and around.
7. String Serializer
This serialization anchor writes a predefined string to the output
document.
Properties
1.
Str: The string to write.
Using the Serializer
in Informatica PowerCenter:
Similar
to a mapper, a serializer is also called as a service by using the Unstructured
Data Transformation (UDT). The service call can be static or dynamic; this
property can be set in the UDT.
How do I get spaces if a segment is missing in the XML?
Ex:
xx
1234
This segment is situational, so it is present for one record and other records might not have it. Is it possible to look for the XPath in the serializer and, if the XPath is not found, assign spaces? Or is there any other workaround?