Sunday, 10 November 2013

B2B DATA TRANSFORMATION



Overview of B2B Data Transformation

·         B2B Data Transformation enables you to transform data efficiently from any format to any other format.
·         Data Transformation can process fully structured, semi-structured, or unstructured data.
·         The software can be configured to work with text, binary data, messaging formats, HTML pages, PDF documents, word-processor documents, etc.
·         A Data Transformation parser can be configured to transform the data to any standard or custom XML vocabulary.
·         In the reverse direction, a Data Transformation serializer can be configured to transform the XML data to any other format.
·         A Data Transformation mapper can be configured to perform XML-to-XML transformations.

How B2B Data Transformation Works
The Data Transformation system has two main components:

Data Transformation Studio: The design and configuration environment of Data Transformation.
Data Transformation Engine: The transformation engine.



Data Transformation Studio
·         The Studio is a visual editor environment where you can design and configure transformations such as parsers, serializers, and mappers.
·         Studio can be used to configure Data Transformation to process data of a particular type.

Data Transformation Engine
·         Data Transformation Engine is an efficient transformation processor.
·         It has no user interface. It works entirely in the background, executing the transformations that have been previously defined in Studio.
·         To move a transformation from the Studio to the Engine, you must deploy the transformation as a Data Transformation service.

Default Installation Folder
By default, Data Transformation is installed in the following location:
C:\Program Files\Informatica\DataTransformation
The setup prompts you to change the location if desired.

Tutorials and Workspace Folders
By default, the tutorials are located in:
C:\Program Files\Informatica\DataTransformation\tutorials
and the workspace is located in:
My Documents\Informatica\DataTransformation\9.0\workspace



Transformation Architecture
·         Transformation Components
·         Data Holders
·         Documents
·         Data Transformation Services

Transformation Components
Top Level Components

Parser: A component that converts source documents in any format to XML.
Serializer: A component that converts XML documents to output documents in any format.
Mapper: A component that converts XML documents to a different XML structure or schema.
Transformer: A component that modifies data. The input and output can be in any format.
Streamer: A component that splits large inputs into segments that are processed separately by the other components.




Nested Components

Formats: Define the overall format of documents, such as the delimiters, that Data Transformation should use to interpret the documents.
Document processors: Operate on a document as a whole, performing preliminary or final conversions.
Anchors: Define the data in a source document that a parser should process and extract. The anchors specify how a parser should search for the data and where it should store the data that it finds.
Serialization anchors: Define how a serializer should write XML data to an output document.
Mapper anchors: Define how a mapper should write XML data to another XML structure or schema. The anchors specify where to find the data in the source XML and where to write the data in the output XML.
Transformers: In addition to their use as top-level components, you can nest transformers within a parser or a serializer.




                      XSD AND XML
Those who deal with data transfer or document exchange within or across organizations with heterogeneous platforms will certainly accept and appreciate the need and power of XML.
  • What is XSD Schema?
  • What are the advantages of XSD Schema?
  • What is important in XSD Schema?
What Is a Schema?
A schema is a "Structure", and the actual document or data that is represented through the schema is called "Document Instance". Those who are familiar with relational databases can map a schema to a Table Structure and a Document Instance to a record in a Table. And those who are familiar with object-oriented technology can map a schema to a Class Definition and map a Document Instance to an Object Instance.
The structure of an XML document can be defined using any of the following:
  • Document Type Definition (DTD)
  • XML Schema Definition (XSD)
  • XML-Data Reduced (XDR), which is proprietary to Microsoft
What Is XSD?
XSD provides the syntax and defines a way in which elements and attributes can be represented in an XML document. It also lets you require that a given XML document follow a specific format and use specific data types.


Advantages of XSD
So what is the benefit of this XSD Schema?
  • XSD Schema is an XML document so there is no real need to learn any new syntax, unlike DTDs.
  • XSD Schema supports inheritance, where one type can be derived from another by extension or restriction. This is a great feature because it provides the opportunity for re-usability.
  • XSD Schema provides the ability to define your own data types from existing data types.
  • XSD schema provides the ability to specify data types for both elements and attributes.
Overview
First, look at what an XML schema is. A schema formally describes what a given XML document contains, in the same way a database schema describes the data that can be contained in a database (table structure, data types). An XML schema describes the coarse shape of the XML document, what fields an element can contain, which sub elements it can contain, and so forth. It also can describe the values that can be placed into any element or attribute.
Elements
Elements are the main building block of any XML document; they contain the data and determine the structure of the document. An element can be defined within an XML Schema (XSD) as follows:

<xs:element name="x" type="y"/>

An element definition within the XSD must have a name property; this is the name that will appear in the XML document. The type property provides the description of what can be contained within the element when it appears in the XML document. There are a number of predefined types, such as xs:string, xs:integer, xs:boolean, or xs:date. You also can create a user-defined type by using the <xs:simpleType> and <xs:complexType> tags, but more on these later.
If you have set the type property for an element in the XSD, the corresponding value in the XML document must be in the correct format for its given type. (Failure to do this will cause a validation error.)
Examples of simple element definitions are below:
<xs:element name="customer_dob" type="xs:date"/>
<xs:element name="customer_address" type="xs:string"/>

The value the element takes in the XML document can further be affected by using the fixed and default properties.
Default means that, if no value is specified in the XML document, the application reading the document (typically an XML parser or XML Data binding Library) should use the default specified in the XSD.
Fixed means the value in the XML document can only have the value specified in the XSD.
For this reason, it does not make sense to use both default and fixed in the same element definition. (In fact, it's illegal to do so.)
<xs:element name="Customer_name" type="xs:string" default="unknown"/>
<xs:element name="Customer_location" type="xs:string" fixed="Bangalore"/>
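
The default and fixed rules can be sketched in a few lines of Python. This is not an XSD validator: resolve_value is a hypothetical helper that mimics what a schema-aware reader might do for one simple element, and it assumes that "no value" means the element text is missing or empty.

```python
from typing import Optional


def resolve_value(text: Optional[str],
                  default: Optional[str] = None,
                  fixed: Optional[str] = None) -> str:
    """Return the effective value of a simple element.

    text    -- the value found in the XML document (None or "" if absent)
    default -- value of the XSD 'default' property, if any
    fixed   -- value of the XSD 'fixed' property, if any
    """
    if default is not None and fixed is not None:
        # A definition that carries both properties is itself invalid.
        raise ValueError("default and fixed cannot be used together")
    if not text:
        # No value supplied: fall back to default or fixed, else empty.
        if default is not None:
            return default
        if fixed is not None:
            return fixed
        return ""
    if fixed is not None and text != fixed:
        raise ValueError(f"value must be {fixed!r}, got {text!r}")
    return text


print(resolve_value(None, default="unknown"))
print(resolve_value("Bangalore", fixed="Bangalore"))
```

A value that differs from the fixed one (for example "Mysore" against fixed="Bangalore") raises ValueError in this sketch, which corresponds to a validation error in a real parser.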

Cardinality

Specifying how many times an element can appear is referred to as cardinality, and is specified by using the minOccurs and maxOccurs attributes. In this way, an element can be mandatory, optional, or appear many times. minOccurs can be assigned any non-negative integer value (for example: 0, 1, 2, 3, and so forth), and maxOccurs can be assigned any non-negative integer value or the string constant "unbounded", meaning no maximum.
The default value for both minOccurs and maxOccurs is 1. So, if both the minOccurs and maxOccurs attributes are absent, the element must appear once and once only.
<xs:element name="Customer_order"
            type="xs:integer"
            minOccurs="0"
            maxOccurs="unbounded"/>
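
To make the cardinality rule concrete, here is a minimal Python sketch that counts occurrences of a child element and compares the count with minOccurs and maxOccurs. The check_occurs helper and the sample Customer document are illustrative assumptions; a real validator derives the limits from the schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_occurs(parent: ET.Element, tag: str,
                 min_occurs: int = 1,
                 max_occurs: Optional[int] = 1) -> bool:
    """Return True if 'tag' appears an allowed number of times under parent.

    max_occurs=None stands in for the XSD value "unbounded".
    """
    count = len(parent.findall(tag))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


doc = ET.fromstring(
    "<Customer>"
    "<Customer_order>101</Customer_order>"
    "<Customer_order>102</Customer_order>"
    "</Customer>"
)
# minOccurs="0", maxOccurs="unbounded": any number of orders is allowed.
print(check_occurs(doc, "Customer_order", min_occurs=0, max_occurs=None))
# Defaults (1 and 1): exactly one occurrence is required, so two is too many.
print(check_occurs(doc, "Customer_order"))
```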

Compositors

There are three types of compositors <xs:sequence>, <xs:choice>, and <xs:all>. These compositors allow you to determine how the child elements within them appear within the XML document.

Sequence: The child elements in the XML document MUST appear in the order they are declared in the XSD schema.
Choice: Only one of the child elements described in the XSD schema can appear in the XML document.
All: The child elements described in the XSD schema can appear in the XML document in any order.

 Notes
The <xs:sequence> and <xs:choice> compositors can be nested inside other compositors, and be given their own minOccurs and maxOccurs properties. This allows for quite complex combinations to be formed.
 Example:
<xs:element name="Customer">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Dob" type="xs:date" />
         <xs:element name="Address">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="Line1" type="xs:string" />
                  <xs:element name="Line2" type="xs:string" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:element>
 
<xs:element name="Supplier">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Phone" type="xs:integer" />
         <xs:element name="Address">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="Line1" type="xs:string" />
                  <xs:element name="Line2" type="xs:string" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:element>
Notice that the Address element is defined twice in the code above, with an identical structure under both Customer and Supplier.

 Re-Use
It would make much more sense to have one definition of "Address" that could be used by both customer and supplier. You can do this by defining a complexType independently of an element:

<xs:complexType name="AddressType">
   <xs:sequence>
      <xs:element name="Line1" type="xs:string"/>
      <xs:element name="Line2" type="xs:string"/>
   </xs:sequence>
</xs:complexType>
The Customer and Supplier definitions then become:
<xs:element name="Customer">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Dob"     type="xs:date"/>
         <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
   </xs:complexType>
</xs:element> 
<xs:element name="Supplier">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Phone"   type="xs:integer"/>
         <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>
The advantage should be obvious. Instead of having to define Address twice (once for Customer and once for Supplier), you have a single definition. This makes maintenance simpler: if you decide to add "Line3" or "Postcode" elements to your address, you only have to add them in one place.

Attributes

 

An attribute provides extra information within an element. Attributes are defined within an XSD as follows, having name and type properties.
<xs:attribute name="x" type="y"/>
An attribute can appear 0 or 1 times within a given element in the XML document. Attributes are either optional or mandatory (by default, they are optional). The "use" property in the XSD definition specifies whether the attribute is optional or mandatory.
So, the following are equivalent:
<xs:attribute name="ID" type="xs:string"/>
<xs:attribute name="ID" type="xs:string" use="optional"/>


To specify that an attribute must be present, set use="required".
Some of the problems with using attributes are:
·         Attributes cannot contain multiple values (child elements can).
·         Attributes are not easily expandable (to incorporate future changes to the schema).
·         Attributes cannot describe structures (child elements can).
Namespaces:
Namespaces are a mechanism for breaking up your schemas. Until now, you have assumed that you only have a single schema file containing all your element definitions, but the XSD standard allows you to structure your XSD schemas by breaking them into multiple files. These child schemas can then be included into a parent schema.
Breaking schemas into multiple files can have several advantages. You can create re-usable definitions that can be used across several projects. Smaller files also make definitions easier to read and version, because they break the schema down into units that are simpler to manage.

 

 

Element and Attribute Groups

Elements and Attributes can be grouped together using <xs:group> and <xs:attributeGroup>. These groups can then be referred to elsewhere within the schema. Groups must have a unique name and be defined as children of the <xs:schema> element. When a group is referred to, it is as if its contents have been copied into the location it is referenced from.
Note: <xs:group> and <xs:attributeGroup> cannot be extended or restricted in the way <xs:complexType> or <xs:simpleType> can. They are purely to group a number of items of data that are always used together. For this reason they are not the first choice of constructs for building reusable maintainable schemas, but they can have their uses.

<xs:group name="CustomerDataGroup">
   <xs:sequence>
      <xs:element name="Forename" type="xs:string" />
      <xs:element name="Surname"  type="xs:string" />
      <xs:element name="Dob"      type="xs:date" />
   </xs:sequence>
</xs:group>
 
<xs:attributeGroup name="DobPropertiesGroup">
   <xs:attribute name="Day"   type="xs:string" />
   <xs:attribute name="Month" type="xs:string" />
   <xs:attribute name="Year"  type="xs:integer" />
</xs:attributeGroup>
These groups then can be referenced in the definition of complex types, as shown below.
<xs:complexType name="Customer">
   <xs:sequence>
      <xs:group ref="CustomerDataGroup"/>
      <xs:element name="..." type="..."/>
   </xs:sequence>
   <xs:attributeGroup ref="DobPropertiesGroup"/>
</xs:complexType>

XML Data Mapping
XML data mapping has to do with generating XML from application data and creating application data from XML. Application data covers the common data types developers work with every day: Boolean/logical values, numbers, strings, date-time values, arrays, associative arrays (dictionaries, maps, hash tables), database record sets and complex object types. The process of converting application data to XML is called serialization. The XML is a serialized representation of the application data. The process of generating application data from XML is called deserialization.
The traditional approach for generating XML from application data has been to sit down and custom-code how data values become elements, attributes and element content. The traditional approach of working with XML to produce application data has been to parse it using a Simple API for XML (SAX) or Document Object Model (DOM) parser. Data structures are built from the SAX events or the DOM tree using custom code. There are, however, better ways to map data to and from XML using technologies specifically built for serializing and deserializing data.
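
As an illustration of the hand-coded approach described above, the following Python sketch serializes a small dictionary to XML and deserializes it again with the standard xml.etree.ElementTree module. The Customer, Name, Orders, and Order names are hypothetical and are chosen only for this example.

```python
import xml.etree.ElementTree as ET


def serialize(customer: dict) -> str:
    """Application data -> XML text (serialization)."""
    root = ET.Element("Customer", attrib={"ID": str(customer["id"])})
    ET.SubElement(root, "Name").text = customer["name"]
    orders = ET.SubElement(root, "Orders")
    for number in customer["orders"]:
        ET.SubElement(orders, "Order").text = str(number)
    return ET.tostring(root, encoding="unicode")


def deserialize(xml_text: str) -> dict:
    """XML text -> application data (deserialization)."""
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.get("ID")),
        "name": root.findtext("Name"),
        "orders": [int(order.text) for order in root.iter("Order")],
    }


customer = {"id": 7, "name": "Ann", "orders": [101, 102]}
xml_text = serialize(customer)
print(xml_text)
# A round trip should give back the original application data.
assert deserialize(xml_text) == customer
```

Every choice here (which value becomes an attribute, which becomes an element) is custom code, which is exactly the maintenance burden that dedicated mapping tools aim to remove.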

Schema Translation
Schema translation refers to the conversion of XML documents from one format to another. It is also known as XML integration/conversion. Schema translation is very important in the context of B2B because the world of business is highly heterogeneous.

XML Generator

The XML Generator is a powerful tool for automatically generating XML instance documents which conform to any XML Schema data model. The generation of the sample XML files can be customized according to your preferences, and always results in valid, well-formed XML.




                                        MAPPER
A Mapper is a component that maps data from one XML structure to another XML structure. It is generally used after the raw, unstructured data has been brought into a staging XML structure. Once the data is in XML, you can use a Mapper to refine the structure further.
You must use the source and target properties to identify the root elements of the XML documents. For example, if the document element of the source is Persons, and the document element of the output is SummaryData, set source to a locator for Persons and target to a locator for SummaryData.


The Mapper has four mapper anchors (mapping components), as listed below:
1. Alternative Mapping
2. Embedded Mapper
3. Group Mapping
4. Repeating Group Mapping

Alternative Mapping
This Mapper anchor lets you define a set of alternative, nested mapper anchors. You can define a criterion for the alternative that the Mapper should accept. Only the accepted alternative affects the mapper output. The other mapper anchors, whether failed or successful, have no effect on the mapper output.

Example
The input XML may contain a product element or a service element, but not both. You wish to process whichever element is in the input.
Define an alternative mappings mapper anchor, and set its selector property to script order. Within the alternative mappings, nest two map actions. Configure one of them to process the product element and the other to process service.

The selector is the criterion for deciding which alternative to accept. The options are:

Script Order: Data Transformation tests the nested mapper anchors in the sequence in which they are defined in the IntelliScript. It accepts the first one that succeeds. If all the nested mapper anchors fail, the alternative mapping component fails.

Name Switch: Data Transformation searches for the nested mapper anchor whose name property is specified in a data holder. It ignores the other nested mapper anchors. If the named mapper anchor fails, the alternative mapping component fails.
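
The difference between the two selector options can be sketched in plain Python. This is not IntelliScript and does not use any Data Transformation API; the functions map_product and map_service, the Order wrapper element, and the returned strings are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

# Each alternative returns an output string on success, or None on failure.
Alternative = Callable[[ET.Element], Optional[str]]


def map_product(src: ET.Element) -> Optional[str]:
    node = src.find("Product")
    return None if node is None else f"product:{node.text}"


def map_service(src: ET.Element) -> Optional[str]:
    node = src.find("Service")
    return None if node is None else f"service:{node.text}"


# Dicts keep insertion order, which stands in for the order in the script.
ALTERNATIVES: Dict[str, Alternative] = {
    "product": map_product,
    "service": map_service,
}


def script_order(src: ET.Element) -> Optional[str]:
    """Try each alternative in order and accept the first that succeeds."""
    for alternative in ALTERNATIVES.values():
        result = alternative(src)
        if result is not None:
            return result
    return None  # every alternative failed, so the whole component fails


def name_switch(src: ET.Element, name: str) -> Optional[str]:
    """Run only the alternative whose name matches; ignore the others."""
    alternative = ALTERNATIVES.get(name)
    return None if alternative is None else alternative(src)


order = ET.fromstring("<Order><Service>Repair</Service></Order>")
print(script_order(order))
print(name_switch(order, "product"))
```

With script order, the service alternative is accepted because the product alternative fails first. With name switch set to "product", the component fails even though the service alternative would have succeeded, because the other alternatives are never tried.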

Group Mapping
The Group Mapping mapper anchor binds its nested mapper anchors and actions together. You can set properties of the group mapping that affect the members of the group.




Basic Properties:

1.     Source/Target
These properties are useful in situations where the mapper anchor must select specific occurrences of data holders.

2.     Absent
If selected, the group mapping succeeds only if one of its nested, non-optional mapper anchors or actions fails. You can use this feature to test for the absence of nested mapper anchors.

Repeating Group Mapping
This mapper anchor processes a repetitive structure in the input or output. A repeating group mapping is useful if the XML input and/or output contains a multiple-occurrence data holder. It iterates over occurrences of the data holders.                                             
Within the repeating group mapping, nest the mapper anchors and actions that process each occurrence of the data holder.

Basic Properties:

1. Count:
The number of iterations to run. Enter a number, or click the browse button and select a data holder that contains the number. If blank, the iterations continue until the input is exhausted.

2. Current Iteration:
A data holder, where the repeating group mapping should output the number of the current iteration.



3. Source/Target:
These properties are useful in situations where the mapper anchor must select specific occurrences of data holders.

4. On_Iteration_Fail:
If an iteration fails, writes an entry in the user log or triggers a notification. Use the on_fail property to write an entry if the entire repeating group mapping fails. Use on_iteration_fail to write an entry if a single iteration fails.

5. On_Fail:
If the component fails, writes an entry in the user log or triggers a notification.

Example:


The repeating group mapping iterates over the person elements of the input. It uses map actions to write the data to the name and ID elements of the output.
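
The iteration described above can be approximated in Python as follows. The tutorial schemas are not reproduced in this post, so the layout of Persons and SummaryData (a Person element with an ID attribute and a Name child) is an assumption, and the Iteration attribute only imitates the current iteration property.

```python
import xml.etree.ElementTree as ET

INPUT = """
<Persons>
  <Person ID="10"><Name>Ann</Name></Person>
  <Person ID="11"><Name>Bob</Name></Person>
</Persons>
"""


def map_persons(xml_text: str) -> ET.Element:
    source = ET.fromstring(xml_text)
    target = ET.Element("SummaryData")
    # One iteration per occurrence of the repeating Person element.
    for iteration, person in enumerate(source.findall("Person"), start=1):
        entry = ET.SubElement(target, "Person",
                              attrib={"Iteration": str(iteration)})
        # Two "map actions": copy the name and the ID to the output.
        ET.SubElement(entry, "Name").text = person.findtext("Name")
        ET.SubElement(entry, "ID").text = person.get("ID")
    return target


summary = map_persons(INPUT)
print(ET.tostring(summary, encoding="unicode"))
```

Because no count is given, the loop runs until the input is exhausted, which matches the behavior described for a blank count property.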

            LOCATOR & LOCATOR BY KEY
Locator:
This component is used in the source and target properties to identify a data holder. You can use it to identify either a single-occurrence or multiple-occurrence data holder. In the latter case, each iteration of the component that uses the locator processes the next occurrence of the data holder.
Important Properties:
data_holder
The data holder that the component identifies.

Locator by key:
This component is used in the source and target properties to identify an occurrence of a multiple-occurrence data holder. Before you use this component, you must define a key at the global level of the IntelliScript. The key specifies the data holders that uniquely identify the occurrence.
In the locator by key configuration, you must specify:
·         The key that you wish to use.
·         The values of the key fields. You can specify the values either statically, by typing a value, or dynamically, by selecting a data holder that contains the value.
In case of conflicts, a nested locator by key overrides a parent locator.



Important Properties:
Key:
From a Schema view, select the XPath predicate representation of the key.
For example, if you have defined Hobbies/Hobby/@name as a key, then you can select Hobbies/Hobby[@name=$1].
Params:
Under this property, specify the values of the parameters in the XPath predicate ($1, $2, and so forth). Type each value, or click the browse button and select a data holder that contains the value.
Example:

A Locator component is used to identify an occurrence of a child element. Each iteration processes the next occurrence of the child, sequentially. A Locator by key component is used to identify an occurrence of the parent element.

KEY & Example:
A key defines attributes or elements that serve as a unique identifier of their parent element. You can define a key only at the global level of the IntelliScript. This allows you to reference the key anywhere in the project. The name of a key is case-sensitive.


The key is the name attribute, which uniquely identifies each Hobby.
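
To contrast the two locators, here is a small Python sketch over a Hobbies document. It only imitates the lookup behavior; the Level child element and the helper names locate_sequential and locate_by_key are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

DOC = ET.fromstring(
    "<Hobbies>"
    "<Hobby name='chess'><Level>advanced</Level></Hobby>"
    "<Hobby name='rowing'><Level>beginner</Level></Hobby>"
    "</Hobbies>"
)


def locate_sequential(root: ET.Element) -> Iterator[ET.Element]:
    """Plain locator: each iteration yields the next Hobby occurrence."""
    yield from root.findall("Hobby")


def locate_by_key(root: ET.Element, name: str) -> Optional[ET.Element]:
    """Locator by key: pick the one Hobby whose name attribute matches.

    Mirrors the predicate Hobbies/Hobby[@name=$1], with 'name' as $1.
    """
    for hobby in root.findall("Hobby"):
        if hobby.get("name") == name:
            return hobby
    return None


print([hobby.get("name") for hobby in locate_sequential(DOC)])
print(locate_by_key(DOC, "rowing").findtext("Level"))
```

The key lookup works only because the name attribute is unique for each Hobby, which is the property a key definition is meant to guarantee.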

                               EMBEDDED MAPPER

This mapper anchor activates a secondary mapper, which stores its output in the same output document. A mapper can use an embedded mapper component to call itself recursively, until all levels of nesting are exhausted.

Example
The XML input is a family tree. The input contains Person elements, which are recursively nested as shown:
<Person> <!-- Parent -->
   ...
   <Person> <!-- Child -->
      ...
      <Person> <!-- Grandchild -->
      </Person>
   </Person>
</Person>

In the recursive example described above, Person should be connected to Person/Person. This instructs the secondary instance of the mapper to process a nested level of the input. It is sufficient to connect just the parent element (Person), and not the nested elements (Person/*s/Name, Person/*s/Birth Date, etc.), provided that the two Person elements have the same data type.
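
The recursive pattern can be sketched in Python, where a function calling itself stands in for the embedded mapper. The Name child element and the output names FamilyList, Member, and Generation are hypothetical and not part of the tutorial.

```python
import xml.etree.ElementTree as ET

FAMILY = """
<Person><Name>Ann</Name>
  <Person><Name>Bob</Name>
    <Person><Name>Cid</Name></Person>
  </Person>
</Person>
"""


def map_person(source: ET.Element, target_parent: ET.Element,
               depth: int = 0) -> None:
    """Write one Member to the output, then recurse into nested Persons."""
    member = ET.SubElement(target_parent, "Member")
    member.set("Name", source.findtext("Name", default=""))
    member.set("Generation", str(depth))
    # The recursive call plays the role of the embedded mapper:
    # Person in the nested call corresponds to Person/Person here.
    for child in source.findall("Person"):
        map_person(child, target_parent, depth + 1)


output = ET.Element("FamilyList")
map_person(ET.fromstring(FAMILY), output)
print([(m.get("Name"), m.get("Generation")) for m in output])
```

Every level writes into the same output element, and the recursion stops on its own when a Person has no nested Person children.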

Basic Properties
mapper: The name of the secondary mapper.
schema connections:
·         Connects the data holders that are referenced in the secondary mapper to the data holders that are referenced in the main mapper.
·         The property contains a list of connect subcomponents that define the correspondence.
·         If all the data holders in the main and secondary mappers are identical, you can omit this property.
·         If there are any differences between the data holders, you must connect the data holders explicitly, even the ones that are identical.


Advanced Properties

name: A name that you assign to the component. Data Transformation includes the name in the event log. This can help you find an event that was caused by the particular component.
remark: A comment describing the component.
disabled: If selected, Data Transformation ignores the component. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing components.
optional: By default, if a component fails, its parent component fails. If you select the optional property, the parent component does not fail.
on fail: If the component fails, writes an entry in the user log or triggers a notification.



           CREATING A PROJECT
To create the mapper, you will have to start with a blank project.

1. On the Data Transformation Studio menu, click File > New > Project.


2. Under the Data Transformation node, select a Blank Project.

3. In the wizard, name the project Tutorial_6, and click Finish.

4. In the Data Transformation Explorer, expand the project. Notice that it contains a default TGP script file, but it contains no schemas or other components.

5. In the Data Transformation Explorer, right-click the XSD node and click Add File. Add the schemas Input.xsd and Output.xsd. Both schemas are necessary because you must define the structure of both the input and output XML.


6. The Schema view displays the elements that are defined in both schemas. You will work with the Persons branch of the tree for the input, and with the SummaryData branch for the output.

7. We recommend that you copy the test documents, Input.xml and ExpectedOutput.xml, into the project folder. By default, the project folder is at the following location, although you can choose a different one:
My Documents\Informatica\DataTransformation\9.0\workspace\Tutorial_6

Creating A Mapper:
1. Add the XSD input and output schemas to the project.
2. At the top level of the IntelliScript, add a mapper component.
3. Assign the source and target properties of the mapper to the input and output elements of the mapper, respectively. To do this, add a locator for the source and another for the target, selecting the elements from the XSD.
4. Edit the other properties of the mapper as required.
5. Within the mapper, nest a sequence of map actions, mapper anchors, and any other required components.
6. Test the mapper and modify the IntelliScript if required.

 


Configuring The Mapper:
1.      Open the script file in an IntelliScript editor.
2.      Define a global component called mapper1, and give it a type of Mapper.
3.      Display the advanced properties of the mapper, and assign the example_source property to the file Input.xml.
4.      Assign the source and target properties of the mapper to the Persons and SummaryData elements, respectively. The properties define the schema branches where the mapper will retrieve its input and store its output.

5.      Under the contains line of the mapper, insert a RepeatingGroupMapping component. You will use this mapper anchor to iterate over the repetitive XML structures in the input and output.


6.      Within the RepeatingGroupMapping, insert two map actions. The purpose of these actions is to copy the data from the input to the output. Configure the source property of each map action to retrieve a data holder from the input. Configure the target property to write the corresponding data holder in the output. As you do this, Studio color-codes the corresponding elements and attributes of the example source. By examining the color coding, you can confirm that you have defined the source of each action correctly.




7.      Set the mapper as the startup component and run it. Check the events view for errors.

8.      Compare the results file, which is called output.xml, with ExpectedOutput.xml. If you have configured the mapper correctly, the files should be identical, except perhaps for the <?xml?> declaration at the top of the file, which depends on options in the project properties.
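
If you prefer to check the comparison outside Studio, one possible approach is to canonicalize both documents before comparing them, so that the XML declaration and indentation do not count as differences. The sketch below uses the canonicalize function from Python's standard xml.etree.ElementTree module (Python 3.8 or later). The same_xml helper and the sample documents are assumptions, not part of the tutorial.

```python
import xml.etree.ElementTree as ET


def same_xml(actual: str, expected: str) -> bool:
    """Compare two XML documents after canonicalization.

    Canonical form drops the XML declaration; strip_text also ignores
    leading and trailing whitespace in text, such as indentation.
    """
    return (ET.canonicalize(xml_data=actual, strip_text=True)
            == ET.canonicalize(xml_data=expected, strip_text=True))


actual = '<?xml version="1.0"?><SummaryData><Name>Ann</Name></SummaryData>'
expected = "<SummaryData>\n  <Name>Ann</Name>\n</SummaryData>"
print(same_xml(actual, expected))
```

To compare files instead of strings, canonicalize also accepts a from_file argument in place of xml_data.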

                     



             ERROR HANDLING

Identifying errors and creating an error handling strategy is very important. Each B2B Data Exchange transformation uses the following ports in error handling:

·         DXErrorCode. When a transformation fails, the transformation sets the DXErrorCode to a value greater than zero.
·         DXErrorMessage. When a transformation fails, the transformation stores an error message in the DXErrorMessage port to describe the failure.

When a transformation generates an error, the transformation performs the following tasks:
·         The transformation writes the error to the PowerCenter session log. The error log includes the exception class, description, cause, and stack trace. The logging level is based on the PowerCenter configuration. Up to 1 KB of the document associated with the error is included in the log.
·         If the option to set the event status to error when a transformation fails is set to true, the transformation sets the status of the event to error.

DX_Throw_Error

This is a B2B Data Exchange Transformation that handles errors in the workflow. It generates an error when the transformation fails.
It performs the following tasks:
·         Sets the status of the associated event to ERROR.
·         Creates the error message from the value of the DXDescription port.
·         Attaches the error message to the associated event.
·         Logs the error in the session log.

Input Ports
The DX_Throw_Error transformation has the following input ports:

DXDescription (string): Description of the error. This is the error message added to the session log. It is also used as the description for the log document attached to the event.
DXMessageType (string): Type of the error event. Optional. An alphanumeric value to associate with the event. Any value is valid.
DXMIMEType (string): MIME type of the document to attach to the event.


Input/Output Ports

The DX_Throw_Error transformation has the following input/output ports:

DXEventId (string): ID of the event associated with the error.
DXData (string, binary, or text): Log document to attach to the event. This port can contain the data of the document or a file path to the document. If the value of the parameter is null, the transformation creates an empty document and adds the document to the event. To attach a document with text data, set the datatype of the port to string or text. To attach a document with binary data, change the datatype of the port to binary.
DXDataByReference (string): Indicates whether the DXData port contains the document data or a document reference. If the value is true, the DXData port contains a document reference. If the value is null or the value is false, the DXData port contains the document data.
DXErrorMessage (string): Error message generated by the transformation.
DXErrorCode (string): Error code generated by the transformation. If the transformation fails, the value of the DXErrorCode port is greater than zero.
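
The rules for the DXData, DXDataByReference, and DXErrorCode ports can be summarized in a short Python sketch. This is plain Python for illustration only, not the PowerCenter or B2B Data Exchange API. The function names are hypothetical, and treating the string "true" case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Optional, Union

Payload = Union[str, bytes]


def resolve_document(dx_data: Optional[Payload],
                     dx_data_by_reference: Optional[str]) -> Payload:
    """Return the log document content described by the two port values."""
    if dx_data is None:
        # Null DXData: an empty document is attached to the event.
        return ""
    if (dx_data_by_reference or "").lower() == "true":
        # DXData holds a file path; read the document it points to.
        if not isinstance(dx_data, str):
            raise TypeError("a document reference must be a string path")
        return Path(dx_data).read_bytes()
    # Null or false: DXData already holds the document data.
    return dx_data


def has_failed(dx_error_code: Optional[str]) -> bool:
    """A DXErrorCode greater than zero signals a failed transformation."""
    return bool(dx_error_code) and int(dx_error_code) > 0


print(resolve_document("inline log text", None))
print(has_failed("3"), has_failed("0"))
```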


Data Exchange Properties
You can configure the following Data Exchange properties in the DX_Throw_Error transformation:

Error log document description: Description for the error log document that this transformation attaches to the event.
Message type: Alphanumeric value to associate with the event. Any value is valid.
Generate an error in case a failure occurs in this transformation: Indicates whether to set the status of the event to ERROR when the transformation generates an error.







                      




Running the Mapper
To run a mapper in Data Transformation Studio:
1. Set the mapper as the startup component.
2. Click Run > Run.
3. In the I/O Ports table, double-click the input row and select the input XML file.
Alternatively, you can set the example_source property of the mapper. This lets you test a mapper repeatedly on the same input, without needing to browse to the file each time.
4. When the execution is complete, Data Transformation Studio displays the Events view. Examine the events for any failures or warnings.
5. View the mapping results by opening the output.xml file, located in the Results folder of the project.

Project Files
The main file types used in Data Transformation projects are as follows.

Main project file: A *.cmw file that contains the main project information. The name of this file is the name of the project.
Example source documents: Examples of the input documents that you want a transformation to process.
Script files: One or more *.tgp files that define transformations.
XML schemas: One or more *.xsd files that define XML structures used in transformations.
Result files: Files created when the project runs, such as output and log files.




Filenames
To ensure cross-platform compatibility, the names of project directories and files must contain only English letters (A-Z, a-z), numerals (0-9), spaces, and the following symbols:
# % & + , - . = @ [ ] _ { }
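The naming rule above is easy to check before creating or renaming project files. The following Python sketch assumes nothing beyond the character list given above; the function name is hypothetical.

```python
import re

# English letters, numerals, spaces, and the symbols # % & + , - . = @ [ ] _ { }
_PORTABLE_NAME = re.compile(r"[A-Za-z0-9 #%&+,\-.=@\[\]_{}]+")


def is_portable_name(name: str) -> bool:
    """Return True if every character of the name is in the allowed set."""
    return _PORTABLE_NAME.fullmatch(name) is not None
```

For example, is_portable_name("Tutorial_5.cmw") is true, while a name containing a slash or an accented letter is rejected.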

Displaying A File
Under each project, the Data Transformation Explorer displays categories for the above file types, such as:
Additional
Examples
Scripts
XSD
Results
To display any of the files, double-click its name in the Data Transformation Explorer.

Importing An Existing Project
You can import a project that exists outside the current Eclipse workspace.
The import procedure copies the essential project files, such as the CMW, TGP, XSD, and example-source files. It may fail to copy additional files, such as test documents, which you have stored in the project folder. Copy such files manually.
There is no link between the original project and the imported copy. If you edit the imported copy, the original project is unchanged.

To import a project:
1. On the menu, click File > Import. This displays an import wizard.
2. Select the option to import an Existing Data Transformation Project into Workspace, and click Next.
3. Browse to the *.cmw file of the existing project.

4. Enter a name for the imported project. The default is the name of the existing project.
5. Click Finish.
6. If the existing project was created in a previous Data Transformation version, you are prompted to upgrade the imported copy. Click OK.
The system copies the project into the Data Transformation workspace folder. When the import is complete, the imported copy is displayed in the Data Transformation Explorer.

Viewing The Results
To view the output of a project, double-click the appropriate file in the Results category of the Data Transformation Explorer view. For example, the output of a parser is an XML file. To view the output, find the XML file in the Results category, and double-click on its name.




                        



              




Each Data Exchange transformation uses the following ports in error handling:
DXErrorCode. When a transformation fails, the transformation sets the DXErrorCode to a value greater than zero.
DXErrorMessage. When a transformation fails, the transformation stores an error message in the DXErrorMessage port to describe the failure.

When a transformation generates an error, the transformation performs the following tasks:
The transformation writes the error to the PowerCenter session log. The error log includes the exception class, description, cause, and stack trace. The logging level is based on the PowerCenter configuration. Up to 1K of the document associated with the error is included in the log.
If the option to set the event status to error when a transformation fails is set to true, the transformation sets the status of the event to error.
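The error-handling behavior described above can be sketched as follows. This is an illustrative Python model, not Data Exchange code: the Event class, the function name, the "OK" status value, and the reading of 1K as 1024 bytes are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LOGGED_BYTES = 1024  # assumption: "1K" is read as 1024 bytes


@dataclass
class Event:
    """Hypothetical stand-in for a Data Exchange event."""
    status: str = "OK"  # placeholder status name, not taken from the product


def handle_transformation_result(
    dx_error_code: str,
    dx_error_message: str,
    document: bytes,
    event: Event,
    set_error_status: bool,
) -> Optional[str]:
    """Return the text to log for a failure, or None if nothing failed."""
    # The port is a string; a value greater than zero signals a failure.
    if int(dx_error_code or "0") <= 0:
        return None
    # Only change the event status when the option is enabled.
    if set_error_status:
        event.status = "ERROR"
    # Include at most the first 1K of the associated document in the log text.
    excerpt = document[:MAX_LOGGED_BYTES].decode("utf-8", errors="replace")
    return f"error {dx_error_code}: {dx_error_message}\ndocument excerpt: {excerpt}"
```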

Viewing the Event Log
When you run a transformation in the Data Transformation Studio environment, the Events view displays the events that occur during the execution. Examine the events for failure or warning messages. By default, the Studio event log is the file results/events.cme in the project folder.

Event-Log Properties
In the project properties, you can configure the events that Data Transformation writes to the log.

Event Display Preferences
You can customize the event display by using the Window > Preferences option. On the Data Transformation page of the preferences, you can configure:
The types of events that the Studio displays, such as notifications, warnings, or failures.

Whether the failure events propagate (“bubble up”) in the events tree. Propagation lets you find the failure events more easily because they are labeled at the top levels of the tree.

The preferences are independent of the event-log properties. The properties control the events that the system stores in the log. The preferences control how the stored events are displayed.
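To make the idea of propagation concrete, the following Python sketch marks every node of an events tree that contains a failure somewhere beneath it. It is a conceptual model only: the EventNode class and its fields are hypothetical and do not reflect how the Studio stores events.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventNode:
    """Hypothetical node in an events tree."""
    name: str
    status: str = "information"  # "information", "warning", or "failure"
    children: List["EventNode"] = field(default_factory=list)
    contains_failure: bool = False  # label set by propagation


def propagate_failures(node: EventNode) -> bool:
    """Label each node that is, or contains, a failure event."""
    # Visit every child (a list, not a generator, so none is skipped).
    child_flags = [propagate_failures(child) for child in node.children]
    node.contains_failure = node.status == "failure" or any(child_flags)
    return node.contains_failure
```

After the call, a failure deep in the tree is visible on the top-level node, which is the effect that the propagation preference gives in the Events view.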

To configure event preferences:
1. Click Window > Preferences.
2. Select the Data Transformation Events category.
3. Under the Filters heading, choose the events you want Data Transformation to display. The choices are:
    Notifications
    Warnings
    Failures
4. Optionally, click Propagate All Events. Alternatively, select individual events to be propagated and click Propagate Selected Events. You can select multiple events by pressing Ctrl and clicking each relevant row.

Event Display without Propagation:






Event Display with Propagation:



Understanding the Event Log
The event log displays the detailed events that occurred during the execution. For example, it displays an event for each anchor that a parser found. To display the events at a particular stage, select the stage in the left pane of the Events view.

The events are labeled with status icons, which have the following meanings:
Information: A normal operation performed by Data Transformation.
Warning: A warning about a possible error. For example, Data Transformation generates a warning event if an operation overwrites the existing content of a data holder. The execution continues.
Failure: A component failed. For example, an anchor fails if Data Transformation cannot find it in the source document. The execution continues.
Optional Failure: An optional component, configured with the optional property, failed.

The following are the different status types and their corresponding icons:







You can deploy a project as a Data Transformation service on the development computer where Data Transformation Studio is installed. This allows you to develop, test, and run applications that activate the service.

To do this, you must have write privileges for the Data Transformation repository and for the log folder.

 To deploy a service:

1. In Data Transformation Studio, open and select the project.
2. Click Project > Deploy.
3. In the Deploy Service window, set the following options:

Option: Description

Service Name: The name of the service. By default, this is the project name. To ensure cross-platform compatibility, the name must contain only English letters (A-Z, a-z), numerals (0-9), spaces, and the following symbols: % & + - = @ _ { }
Data Transformation creates a folder having the service name, in the repository location.
Label: A version identifier. The default value is a time stamp indicating when the service was deployed.
Startup Component: The runnable component that the service should start.
Author: The person who developed the project.
Description: A description of the service.
4. Click the Deploy button.
The Studio displays a message that the service was successfully deployed. The service appears in the Repository view.

Redeploying a Project

Data Transformation Studio cannot open a deployed project that is located in the repository. If you need to edit the transformation, work on the original project and redeploy it. Redeploying overwrites the complete service folder, including any output files or other files that you have stored in it.

Running a Service

After you deploy a service, you are ready to run it in Data Transformation Engine. You can do this in several ways:
By using the Data Transformation Engine command-line interface.
By programming an application that uses the Data Transformation API to submit source documents to the Engine. The API is available in several programming languages.
By using the Unstructured Data Transformation in Informatica PowerCenter.
By using integration agents that run Data Transformation services within third-party systems.
          

         



Serialization is the opposite of parsing. A parser converts a source document from any format to an XML file. A serializer converts an XML file to an output document in any format. The output of a serializer can be a text document, an HTML document, or even another XML document.
It is easier to define a serializer than a parser because the input is a fully structured, unambiguous XML document. The serializer in this tutorial is very simple. It contains only four serialization anchors, which are the opposite of the anchors used in parsing.

The serializer has some interesting points:
·         The serializer is recursive. That is, it calls itself repeatedly to serialize the nested sections of an XML document.
·         The output of the serializer is a worksheet that you can open in Excel.
·         We define the serializer by editing the IntelliScript. It is also possible to generate a serializer automatically by inverting the operation of a parser.

Creating the Project
To create the project:
1. On the Data Transformation Studio menu, click File > New > Project.
2. Under the Data Transformation node, select a Serializer Project.
3. On the following wizard pages, specify the following options:
·         Name the project Tutorial_5.
·         Name the serializer FamilyTreeSerializer.
·         Name the script file Serializer_Script.
4. When we reach the Schema page, browse to the schema FamilyTree.xsd, which is in the tutorials\Exercises\Files_For_Tutorial_5 folder.
5. When we finish the wizard, the Data Transformation Explorer displays the new project. Double click the Serializer_Script.tgp file to edit it.
6. Unlike a parser, a serializer does not have an example source file. Optionally, we can hide the empty example pane of the IntelliScript editor. To do this, click IntelliScript > IntelliScript or click the toolbar button that is labeled Show IntelliScript Pane Only. To restore the example pane, click IntelliScript > Both or click the button that is labeled Show Both IntelliScript and Example Panes.



7. To design the serializer, we will use the FamilyTree.xml input document, whose content is presented above.

Configuring the Serializer
To configure the serializer:
1. Display the advanced properties of the serializer, and set output_file_extension = .csv, with a leading period.
When you run the serializer in the Studio, this causes the output file to have the name output.csv. By default, *.csv files open in Microsoft Excel.
2. Under the contains line of the serializer, insert a Content Serializer serialization anchor and configure its properties as follows:
Data holder = /Person/*s/Name
closing_str = ","
This means that the serialization anchor writes the content of the /Person/*s/Name data holder to the output file. It appends the closing string "," (a comma).

3. Define a second Content Serializer as illustrated:
This Content Serializer writes the /Person/*s/Age data holder to the output. It appends a carriage return (ASCII code 013) and a linefeed (ASCII 010) to the output.
To type the ASCII codes:
·         Select the closing_str property and press Enter.
·         On the keyboard, press Ctrl+a. This displays a small dot in the text box.
·         Type 013.
·          Press Ctrl+a again.
·         Type 010.
·         Press Enter to complete the property assignment.
4. Run the serializer. To do this:
·         Set the serializer as the startup component.
·         On the menu, click Run > Run.
·         In the I/O Ports table, edit the first row and open the test input file, FamilyTree.xml.
·         Click the Run button.
·         When the serializer has completed, examine the Events view for errors.
·         In the Data Transformation Explorer view, under Results , double-click output.csv to view the output.

Assuming that Excel is installed on the computer, Data Transformation displays an Excel window, like this:


If Excel is not installed on the computer, you can view the output file in Notepad. Alternatively, we can copy the file to another computer where Excel is installed, and open it there.


Calling the Serializer Recursively
To call the serializer recursively:
1. Insert a Repeating Group Serializer.
2. Within the Repeating Group Serializer, nest an Embedded Serializer. The purpose of this serialization anchor is to call a secondary serializer.
3. Assign the properties of the Embedded Serializer as illustrated.
The properties have the following meanings:
·         The assignment serializer = FamilyTreeSerializer means that the secondary serializer is the same as the main serializer.
·          The schema connections property means that the secondary serializer should process /Person/*s/Children/*s/Person as though it were a top-level /Person element. This is what lets the serializer move down through the generations of the family tree.
·         The optional property means that the secondary serializer does not cause the main serializer to fail when it runs out of data.
4. Run the serializer again. The result should be:
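As a rough model of what the recursive configuration does, the following Python sketch walks a family tree and writes one Name,Age row per person, descending through the Children element in the same way that the Embedded Serializer calls FamilyTreeSerializer again. The sample XML, including the names and ages, is hypothetical; only the element paths (Name, Age, Children/Person) come from the data holders used in this tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real FamilyTree.xml is not reproduced here.
FAMILY_TREE_XML = """<Person>
  <Name>Anna</Name><Age>70</Age>
  <Children>
    <Person>
      <Name>Ben</Name><Age>40</Age>
      <Children>
        <Person><Name>Dana</Name><Age>10</Age></Person>
      </Children>
    </Person>
    <Person><Name>Carl</Name><Age>38</Age></Person>
  </Children>
</Person>"""


def serialize_person(person: ET.Element) -> str:
    """Emulate the serializer for one Person element and its descendants."""
    # First Content Serializer: Name followed by the closing string ",".
    out = (person.findtext("Name") or "") + ","
    # Second Content Serializer: Age followed by carriage return and linefeed.
    out += (person.findtext("Age") or "") + "\r\n"
    # Repeating Group Serializer with an Embedded Serializer: treat each
    # nested Person as though it were a top-level Person element.
    for child in person.findall("Children/Person"):
        out += serialize_person(child)
    return out


csv_text = serialize_person(ET.fromstring(FAMILY_TREE_XML))
```

The recursion ends naturally when a Person has no Children element, which corresponds to the optional property on the Embedded Serializer.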

Creating a Serializer
A serializer can be created by any of the following methods:
·         By inverting the configuration of an existing parser.
·         By using the New Serializer wizard.
·         By editing the IntelliScript and inserting a Serializer component.

1.    Creating a Serializer by Inverting a Parser
To create a serializer automatically from a parser:
      1. In Data Transformation Studio, open an existing parser in an IntelliScript editor.
      2. Right-click the parser and click Create Serializer.
      3. Test the serializer.




2.    Creating a Serializer by Using the New Serializer Wizard
You can use the New Serializer wizard to create a serializer, either in a new project or in an existing project.
Creating a Serializer Project
To create a new project that contains a serializer:
      1. Click File > New > Project.
      2. Under the Data Transformation category, select a Serializer Project and click Next.
      3. Follow the wizard prompts to enter the serializer options.
When you finish, the Data Transformation Explorer view displays the new project containing the serializer. The Component view displays the serializer.

Creating a Serializer in an Existing Project
To create a new serializer in an existing project:
      1. Click File > New > Serializer.
      2. Follow the wizard prompts to enter the serializer options.
When you finish, the Data Transformation Explorer view displays a new TGP script file defining the serializer. The Component view displays the serializer.

The following table describes the wizard options.

Option
Description
Serializer name
A name for the serializer.
Script name
A name for a TGP script file where the wizard stores the serializer definition.
Schema file path
The name of a schema defining the XML syntax of the serializer input.



Serializer Configuration
To complete the serializer configuration:
1. Display the serializer in an IntelliScript editor.
2. Under the contains line, add a sequence of serialization anchors and actions.
3. Run and test the serializer, and modify the IntelliScript as required.

3.    Creating a Serializer by Editing the IntelliScript
To create a serializer in the IntelliScript:
1. At the top level of the IntelliScript, select the three dots (...) symbol. Press Enter and type a name for the serializer.
2. To the right of the name, press Enter. Select a Serializer component from the list.
3. Expand the tree under the Serializer component. Assign its properties as required.
4. Add a schema defining the XML syntax of the serializer input.
5. Under the contains line, add a sequence of nested serialization anchors and actions.
6. Run and test the serializer and modify the IntelliScript as required.

Running a Serializer
To run a serializer in Data Transformation Studio:
1. Set the serializer as the startup component.
2. Click Run > Run.
3. In the I/O Ports table, double-click the input row and select the input XML file.
4. When the execution is complete, Data Transformation Studio displays the Events view. Examine the events for any failures or warnings.
5. To view the serialization results, open the output file, located in the Results folder of the project.



Serialization Anchors
The main components that you can use in a serializer are called serialization anchors. These are analogous to the anchors that are used in a parser, except that they work in the opposite direction. Anchors read data from locations in the source document and write the data to XML. Serialization anchors read XML data and write the data to locations in the output document. Despite the similar names, a serialization anchor is not an anchor. You cannot use anchors in a serializer, and you cannot use serialization anchors in a parser.
The most important serialization anchors are Content Serializer and String Serializer:
·         A Content Serializer writes the content of a specified data holder to the output document. It is the inverse of a Content anchor, which reads content from a source document.
·         A String Serializer writes a predefined string to the output. It is the inverse of a Marker anchor, which finds a predefined string in a source document.

Standard Serializer Properties

Name:
A name that you assign to the component. Data Transformation includes the name in the event log. This will help to find an event that was caused by the particular component.

Remark:
A comment describing the component.

Disabled:
If selected, Data Transformation ignores the component. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing components.

Optional:
By default, if a component fails, its parent component fails. If we select the optional property, the parent component does not fail.

On fail:
Specifies what Data Transformation does if the component fails: it writes an entry in the user log or triggers a notification.

Notifications:
A list of Notification Handler components that the component runs on notifications triggered by nested components.

Serialization Anchor Component Reference
1. Alternative Serializers
This serialization anchor helps us to define a set of alternative, nested serialization anchors. We can define a criterion for the alternative that the serializer should accept. Only the accepted alternative affects the serializer output. The other serialization anchors, whether failed or successful, have no effect on the serializer output.

Example
The input XML might contain a Product element or a Service element, but not both. We want to serialize whichever element is in the input.

Define an Alternative Serializers serialization anchor, and set its selector property to Script Order. Within the Alternative Serializers, nest two Content Serializer serialization anchors. Configure one of them to process the Product element and the other to process Service.

Properties
1. Selector: The criterion for deciding which alternative to accept. The options are:
·         Script Order. Data Transformation tests the nested serialization anchors in the sequence that they are defined in the IntelliScript. It accepts the first one that succeeds. If all the nested serialization anchors fail, the Alternative Serializers component fails.
·         Name Switch. Data Transformation searches for the nested serialization anchor whose name property is specified in a data holder. It ignores the other nested serialization anchors. If the named serialization anchor fails, the Alternative Serializers component fails.
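The two selector options can be modeled in a few lines of Python. This is a conceptual sketch, not IntelliScript: each nested serialization anchor is represented by a hypothetical function that returns its output text, or None if it fails. Returning None when no anchor has the requested name is an assumption, because the behavior in that case is not described above.

```python
from typing import Callable, List, Optional, Tuple

# (name, anchor) pairs; an anchor returns its output or None on failure.
Anchor = Callable[[dict], Optional[str]]
Alternatives = List[Tuple[str, Anchor]]


def content_serializer(key: str) -> Anchor:
    """Build a toy anchor that outputs one element of the input, if present."""
    def run(data: dict) -> Optional[str]:
        value = data.get(key)
        return None if value is None else str(value)
    return run


def script_order(alternatives: Alternatives, data: dict) -> Optional[str]:
    """Accept the first alternative, in script order, that succeeds."""
    for _name, run in alternatives:
        result = run(data)
        if result is not None:
            return result
    return None  # all alternatives failed, so the component fails


def name_switch(alternatives: Alternatives, data: dict, selected: str) -> Optional[str]:
    """Run only the alternative whose name matches; ignore the others."""
    for name, run in alternatives:
        if name == selected:
            return run(data)
    return None  # assumption: no matching name is treated as a failure
```

With the Product and Service example, script_order returns the Product text when both elements are present, because Product is defined first.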

2. Content Serializer
This serialization anchor writes the serialized data to the output document.

Properties
1. Opening str: A string that the anchor should write before the data holder.
2. Closing str: A string that the anchor should write after the data holder.
3. Data holder:  The data holder containing the data.

3. Delimited Sections Serializer
This serialization anchor processes sections of data. Between each section of the output, the Delimited Sections Serializer writes a separator string.
Within the Delimited Sections Serializer, nest other serialization anchors. Each nested serialization anchor is responsible for outputting a single section.

Example
The XML input contains an employee resume. We wish to write the data to an output text document in the following format:


----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Professional Experience
...
----------------------------
Education
...
Define a Delimited Sections Serializer with the line of hyphens as its separator. Because we want a line of hyphens before each section, set separator_position = before.
Within the Delimited Sections Serializer, nest three Group Serializer components. The first Group Serializer writes the Jane Palmer section, the second writes the Professional Experience section, and so forth.

Optional Sections
In the above example, suppose that the second section, Professional Experience, is missing from some input XML documents. You nonetheless want to write its separator to the output, like this:
----------------------------
Jane Palmer
Employee ID 123456
----------------------------
----------------------------
Education
...
To support this situation, configure the Delimited Sections Serializer in the following way:
·         In the second Group Serializer, select the optional property. This means that if the Group Serializer fails, it should not cause the Delimited Sections Serializer to fail.
·         In the Delimited Sections Serializer, set using_placeholders = always. This means to write the separator of an optional section, even if the section itself is missing.


Alternatively, suppose that if the Professional Experience section is missing, we do not want to write its separator:
----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Education
...

In this case, configure the Delimited Sections Serializer as follows:
·         In the second Group Serializer, select the optional property.
·         In the Delimited Sections Serializer, set using_placeholders = never. This means not to write the separator of a missing section.

Properties
1. Separator: The separator string.
2. Separator position:  Position of the separator relative to the sections. The options are before, after, between, and around.
3. Using placeholders: This property specifies whether the Delimited Sections Serializer should write the separator of an optional section that is missing from the XML input. The options are always, never, and when necessary.
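The effect of using_placeholders on the resume example can be reproduced with a short Python sketch. It models only separator_position = before and the always and never options; the function name and the use of None for a missing optional section are assumptions made for illustration.

```python
from typing import List, Optional


def delimited_sections(
    sections: List[Optional[str]],
    separator: str,
    using_placeholders: str,
) -> str:
    """Write each section preceded by the separator.

    A section of None stands for an optional section that is missing
    from the input. Only "always" and "never" are modeled here.
    """
    if using_placeholders not in ("always", "never"):
        raise ValueError("this sketch supports only 'always' and 'never'")
    lines: List[str] = []
    for section in sections:
        if section is None:
            # Missing section: write its separator only when placeholders are on.
            if using_placeholders == "always":
                lines.append(separator)
            continue
        lines.append(separator)
        lines.append(section)
    return "\n".join(lines)
```

Calling the function with the sections Jane Palmer, a missing Professional Experience, and Education yields two consecutive separator lines for always and a single one for never, matching the two outputs shown above.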

4. Embedded Serializer
This serialization anchor activates a secondary Serializer, which writes its output in the same output document. A Serializer can use an Embedded Serializer component to call itself recursively, until all levels of nesting are exhausted.

Example
The XML input is a family tree. The input contains Person elements, which are recursively nested as shown:
<Person> <!-- Parent -->
...
  <Person> <!-- Child -->
  ...
    <Person> <!-- Grandchild -->
    ...
    </Person>
  </Person>
</Person>

In the recursive example described above, Person should be connected to Person/Person. This instructs the secondary instance of the serializer to process a nested level of the input. It is sufficient to connect just the parent element (Person), and not the nested elements (Person/*s/Name, Person/*s/Birth Date, etc.), provided that the two Person elements have the same data type.

Properties
1. Serializer: The name of the secondary serializer. The serializer must be defined at the global level of the IntelliScript.
2. Schema connections: Connects the data holders that are referenced in the secondary serializer to the data holders that are referenced in the main serializer.
If all the data holders in the main and secondary serializers are identical, we can omit this property. If there are any differences between the data holders, we must connect the data holders explicitly, even the ones that are identical.

5. Group Serializer
The Group Serializer serialization anchor binds its nested serialization anchors together. We can set properties of the Group Serializer that affect the members of the group.

Properties
1. Source target: These properties are useful in situations where the serialization anchor must select specific occurrences of data holders.

6. Repeating Group Serializer
This serialization anchor writes a repetitive structure to the output document.
A Repeating Group Serializer is useful if the XML data contains a multiple-occurrence data holder. It iterates over the occurrences of the data holder and outputs the data.
Within the Repeating Group Serializer, nest serialization anchors that process and output each occurrence of the data holder. Optionally, we can define a separator that the Repeating Group Serializer writes to the output between the iterations.

Example
The XML input contains the following structure:
<Persons>
<Person>
<Name>John</Name>
<Age>35</Age>
</Person>
<Person>
<Name>Larissa</Name>
<Age>42</Age>
</Person>
...
</Persons>


A Repeating Group Serializer, using a newline character as a separator, can output this data to:
                      John 35
                      Larissa 42
We can iterate over several multiple-occurrence data holders in parallel. For example, you can iterate over a list of men and a list of women, and output a list of married couples. To do this, insert a Content Serializer within the repeating group for each data holder.

Properties
1. Separator: A serialization anchor, typically a String Serializer that outputs the separator.
2. Separator position: Position of the separator relative to the iterations. The options are before, after, between, and around.
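The Persons example can be imitated in Python to show what the iteration and the separator do. This is only an analogy to the Repeating Group Serializer: the function name is hypothetical, the space between the name and the age is an assumed closing string, and the separator is written between iterations only.

```python
import xml.etree.ElementTree as ET

PERSONS_XML = """<Persons>
  <Person><Name>John</Name><Age>35</Age></Person>
  <Person><Name>Larissa</Name><Age>42</Age></Person>
</Persons>"""


def repeating_group(root: ET.Element, separator: str = "\n") -> str:
    """Output one row per Person, with the separator between iterations."""
    rows = []
    for person in root.findall("Person"):
        # Two nested Content Serializers: Name, then Age.
        name = person.findtext("Name", default="")
        age = person.findtext("Age", default="")
        rows.append(f"{name} {age}")
    # separator_position = between: no separator before the first row
    # or after the last one.
    return separator.join(rows)


output_text = repeating_group(ET.fromstring(PERSONS_XML))
```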

7. String Serializer
This serialization anchor writes a predefined string to the output document.

Properties
1. Str: The string to write.

Using the Serializer in Informatica PowerCenter
Similar to a mapper, a serializer can be called as a service by using the Unstructured Data Transformation (UDT). The service call can be static or dynamic. You set this property in the UDT.

2 comments:

  1. How do I get spaces if a segment is missing in the XML?
    EX:

    xx
    1234

    This segment is situational, so it is present for one record and other records might not have it. Is it possible to look for the XPath in the serializer and, if the XPath is not found, assign spaces? Or is there any other workaround?
