Data Transformation Manager Processing Threads in Informatica PowerCenter
Data Transformation Manager processing threads are the session-level threads used by Informatica PowerCenter to read source data, apply mapping logic, and write data to targets. When the PowerCenter Integration Service starts a session, the Data Transformation Manager (DTM) allocates process memory, divides it into buffers, and uses different thread types to run the session pipeline.
The DTM process is also called the pmdtm process. It is created for a session task and managed by the Integration Service. Inside the DTM, the master thread controls the session run, while mapping, reader, transformation, writer, and pre-session or post-session threads perform specific parts of the work.
How the DTM process uses memory buffers and session threads
The DTM allocates process memory for the session and divides it into buffers. This memory is often called buffer memory. Reader threads place source data into buffers, transformation threads process the data in those buffers, and writer threads take processed data from buffers and load it into targets.
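The buffer handoff described above resembles a producer-consumer pipeline. The following is a minimal sketch of that idea, not PowerCenter's actual implementation: bounded `queue.Queue` objects stand in for buffer blocks, and the function names, the `buffer_size` parameter, and the upper-casing "mapping logic" are all assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of data in a buffer queue


def reader(source_rows, read_buffer):
    """Reader thread: place source rows into the read buffer."""
    for row in source_rows:
        read_buffer.put(row)
    read_buffer.put(SENTINEL)


def transformer(read_buffer, write_buffer):
    """Transformation thread: apply mapping logic to buffered rows."""
    while True:
        row = read_buffer.get()
        if row is SENTINEL:
            write_buffer.put(SENTINEL)
            break
        # Hypothetical mapping logic: upper-case the name column.
        write_buffer.put({**row, "name": row["name"].upper()})


def writer(write_buffer, target):
    """Writer thread: take transformed rows from the buffer and load them."""
    while True:
        row = write_buffer.get()
        if row is SENTINEL:
            break
        target.append(row)


def run_pipeline(source_rows, buffer_size=2):
    # Bounded queues stand in for fixed-size buffer blocks.
    read_buffer = queue.Queue(maxsize=buffer_size)
    write_buffer = queue.Queue(maxsize=buffer_size)
    target = []
    threads = [
        threading.Thread(target=reader, args=(source_rows, read_buffer)),
        threading.Thread(target=transformer, args=(read_buffer, write_buffer)),
        threading.Thread(target=writer, args=(write_buffer, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target


rows = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
loaded = run_pipeline(rows)
```

Because the queues are bounded, a fast reader blocks when the buffer is full, which is one way to picture why a slow writer can hold back the whole pipeline.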
The exact number of DTM processing threads depends on the mapping design, session properties, source type, target type, and partitioning configuration. A simple non-partitioned session may use one reader thread, one transformation thread, and one writer thread. A partitioned pipeline can use multiple sets of these threads.
Different Data Transformation Manager processing threads
The main Data Transformation Manager processing threads created for a session are listed below. Each thread type has a clear responsibility in the session lifecycle.
| DTM thread type | Main responsibility | When it is important |
|---|---|---|
| Master thread | Controls DTM execution and creates other session threads. | Session initialization, thread coordination, session completion. |
| Mapping thread | Fetches session and mapping details, compiles the mapping, and performs cleanup. | Mapping validation, metadata loading, mapping compilation. |
| Pre-session thread | Runs configured pre-session operations. | Pre-session commands, stored procedures, SQL, file preparation. |
| Reader thread | Extracts source data and places it into buffers. | Source queries, source files, source connections, partitions. |
| Transformation thread | Processes data according to mapping transformation logic. | Expressions, lookups, joins, aggregations, sorting, caches. |
| Writer thread | Loads transformed data into targets. | Target inserts, updates, commits, target files, reject rows. |
| Post-session thread | Runs configured post-session operations. | Post-session SQL, stored procedures, shell commands, email notifications. |
Master thread in the DTM process
The master thread is the main controlling thread inside the DTM process. It creates and manages the other threads required for the session. It coordinates the mapping thread, pre-session thread, reader threads, transformation threads, writer threads, and post-session thread.
The master thread does not perform all extraction, transformation, and loading work by itself. Instead, it controls the flow of execution and starts the appropriate worker threads based on the session and pipeline configuration.
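The coordinating role of the master thread can be pictured with a small sketch. This is a simplified model under stated assumptions: the phase names, the `run_phase` helper, and the strictly sequential ordering are hypothetical, and each phase body is an empty placeholder rather than real session work.

```python
import threading


def run_phase(name, work, events):
    """Run one phase on its own thread and wait until it finishes."""
    def body():
        events.append(f"{name} started")
        work()
        events.append(f"{name} finished")

    thread = threading.Thread(target=body, name=name)
    thread.start()
    thread.join()


def master_thread(events):
    """Hypothetical master: start each phase in session order."""
    phases = [
        ("mapping", lambda: None),       # fetch metadata, compile mapping
        ("pre-session", lambda: None),   # pre-session operations
        ("pipeline", lambda: None),      # reader, transformation, writer threads
        ("post-session", lambda: None),  # post-session operations
    ]
    for name, work in phases:
        run_phase(name, work, events)


events = []
master_thread(events)
```

The master function itself does no extraction or loading; it only decides which worker runs next, which mirrors the division of labor described above.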
Mapping thread for session metadata and mapping compilation
The master thread creates one mapping thread for each session. The mapping thread fetches session and mapping information, compiles the mapping, and cleans up after session execution.
This thread is active during the preparation and cleanup phases of the session. If the mapping is invalid, metadata is missing, or the repository information cannot be read correctly, the session can fail before data processing starts.
Pre-session and post-session threads for session operations
The master thread creates one pre-session thread and one post-session thread to perform pre-session and post-session operations.
A pre-session thread can run operations such as shell commands, SQL commands, and stored procedures before the mapping pipeline begins. A post-session thread can run cleanup commands, audit updates, post-session SQL, stored procedures, or notification-related tasks after the main data movement is complete.
Reader threads for relational and file sources
The master thread creates reader threads to extract source data. The number of reader threads depends on the partitioning information for each pipeline and equals the number of partitions. Relational sources use relational reader threads, and file sources use file reader threads.
The PowerCenter Integration Service creates an SQL statement for each reader thread to extract data from a relational source. For file sources, the PowerCenter Integration Service can create multiple threads to read a single source.
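To illustrate the idea of one SQL statement per reader thread, here is a minimal sketch that builds one query per partition from key ranges. The function name, the table and column names, and the half-open range convention are assumptions for illustration; the SQL that the Integration Service actually generates depends on the partition type and session settings.

```python
def build_partition_queries(table, key_column, key_ranges):
    """Return one SELECT per partition, restricted to that partition's key range.

    Each range is (start, end); None means the range is open on that side.
    Values are assumed to be trusted integers (illustration only).
    """
    queries = []
    for start, end in key_ranges:
        conditions = []
        if start is not None:
            conditions.append(f"{key_column} >= {start}")
        if end is not None:
            conditions.append(f"{key_column} < {end}")
        where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
        queries.append(f"SELECT * FROM {table}{where}")
    return queries


# Three hypothetical partitions, so three reader threads and three queries.
queries = build_partition_queries(
    "orders", "order_id", [(None, 1000), (1000, 2000), (2000, None)]
)
```

With three ranges, the sketch yields three statements, matching the rule that the number of reader threads equals the number of partitions.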
Reader thread errors usually point to source-side problems. Examples include invalid SQL, unavailable source systems, incorrect file paths, missing file permissions, source code page issues, or source connection failures.
Transformation threads for mapping logic and cache usage
The master thread creates one or more transformation threads for each partition. Transformation threads process data according to the transformation logic in the mapping.
These threads transform the data that reader threads place in buffers, move the data from transformation to transformation, and create memory caches when necessary. The number of transformation threads depends on the partitioning information for each pipeline.
Transformation threads store transformed data in a buffer drawn from the memory pool for subsequent access by the writer thread. If the pipeline contains a Rank, Joiner, Aggregator, Sorter, or cached Lookup transformation, the transformation thread uses cache memory until it reaches the configured cache size limits. If the transformation thread requires more space, it pages to local cache files to hold additional data.
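The paging behavior can be sketched with a small external sort: rows are held in memory up to a limit, full batches are written to cache files as sorted runs, and the runs are merged at the end. This is only an analogy under stated assumptions. The `max_in_memory` limit, the `.cache` file suffix, and the function names are hypothetical, and PowerCenter's real cache file format is not modeled.

```python
import heapq
import os
import tempfile


def _spill(sorted_values, cache_dir):
    """Write one sorted run to a cache file and return its path."""
    fd, path = tempfile.mkstemp(dir=cache_dir, suffix=".cache")
    with os.fdopen(fd, "w") as handle:
        handle.writelines(f"{v}\n" for v in sorted_values)
    return path


def _read_run(path):
    """Stream integers back from a cache file."""
    with open(path) as handle:
        for line in handle:
            yield int(line)


def sort_with_spill(values, max_in_memory, cache_dir):
    """Sort integers, paging sorted runs to cache files when memory is full."""
    run_files = []
    in_memory = []
    for value in values:
        in_memory.append(value)
        if len(in_memory) >= max_in_memory:
            run_files.append(_spill(sorted(in_memory), cache_dir))
            in_memory = []
    runs = [_read_run(path) for path in run_files]
    runs.append(iter(sorted(in_memory)))
    result = list(heapq.merge(*runs))
    for path in run_files:
        os.remove(path)
    return result, len(run_files)


with tempfile.TemporaryDirectory() as cache_dir:
    ordered, spilled_runs = sort_with_spill([5, 3, 9, 1, 7, 2, 8], 3, cache_dir)
```

Raising `max_in_memory` reduces the number of spilled runs, which is the same trade-off as increasing a transformation's cache size to avoid paging.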
When the PowerCenter Integration Service runs in ASCII mode, the transformation threads pass character data as single bytes. When it runs in Unicode mode, the transformation threads use two bytes for each character.
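The difference in data volume can be seen with a short sketch. It uses Python's `ascii` codec for the single-byte case and `utf-16-le` as a stand-in for a two-byte-per-character representation; the choice of `utf-16-le` is an assumption for illustration and is not a statement about the exact internal encoding PowerCenter uses.

```python
text = "DATA"

# One byte per character, as in ASCII data movement mode.
single_byte = text.encode("ascii")

# Two bytes per character for these characters (stand-in for Unicode mode).
double_byte = text.encode("utf-16-le")

single_len = len(single_byte)
double_len = len(double_byte)
```

For the same four characters, the two-byte form occupies twice the space, which is one reason buffer and cache sizing can differ between the two modes.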
Transformation thread failures are often caused by expression errors, lookup cache issues, join conditions, sorter cache limits, data conversion errors, or rejected rows caused by transformation logic.
Writer threads for target loading and commit handling
The master thread creates writer threads to load target data. The number of writer threads depends on the partitioning information for each pipeline. If the pipeline contains one partition, the master thread creates one writer thread. If the pipeline contains multiple partitions, the master thread creates multiple writer threads.
Each writer thread creates connections to the target databases to load data. If the target is a file, each writer thread creates a separate file. You can configure the session to merge these files.
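For file targets, the one-file-per-writer behavior and the optional merge can be sketched as follows. The file naming pattern, the function names, and the sequential loop are assumptions for illustration; in a real session the writer threads run in parallel and the merge is controlled by session properties.

```python
import os
import tempfile


def write_partition_files(partitions, out_dir, base_name):
    """Write one output file per partition and return the file paths."""
    paths = []
    for index, rows in enumerate(partitions, start=1):
        path = os.path.join(out_dir, f"{base_name}_part{index}.out")
        with open(path, "w") as handle:
            handle.writelines(f"{row}\n" for row in rows)
        paths.append(path)
    return paths


def merge_files(paths, merged_path):
    """Concatenate the partition files into one merged target file."""
    with open(merged_path, "w") as merged:
        for path in paths:
            with open(path) as part:
                merged.write(part.read())
    return merged_path


with tempfile.TemporaryDirectory() as out_dir:
    partitions = [["a1", "a2"], ["b1"], ["c1", "c2"]]
    part_paths = write_partition_files(partitions, out_dir, "target")
    merged_path = merge_files(part_paths, os.path.join(out_dir, "target_merged.out"))
    with open(merged_path) as handle:
        merged_lines = handle.read().splitlines()
    part_names = [os.path.basename(p) for p in part_paths]
```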
If the target is relational, the writer thread takes data from buffers and commits it to session targets. When loading targets, the writer commits data based on the commit interval in the session properties. You can configure a session to commit data based on the number of source rows read, the number of rows written to the target, or the number of rows that pass through a transformation that generates transactions, such as a Transaction Control transformation.
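A target-based commit interval can be sketched with Python's built-in `sqlite3` module. This is a simplified model, assuming a hypothetical `target` table and a commit after exactly every `commit_interval` rows written; the function name and the sample rows are made up for illustration, and real commit timing in a session may not fall on exact row counts.

```python
import sqlite3


def load_with_commit_interval(conn, rows, commit_interval):
    """Insert rows and commit after every `commit_interval` rows written."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_interval:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final partial batch at end of data.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(i, f"row{i}") for i in range(1, 8)]  # seven hypothetical rows
commit_count = load_with_commit_interval(conn, rows, commit_interval=3)
row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

A larger interval means fewer commits and usually less overhead, but more rows to roll back or reload if the writer fails partway through a batch.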
Writer thread errors usually come from the target side. Common causes include primary key violations, database constraints, insufficient target permissions, target table locks, rejected rows, unavailable target connections, and incorrect commit or transaction settings.
Reader, transformation, and writer threads in a DTM pipeline
The core data movement in a DTM session happens through a pipeline. In a basic pipeline, the reader thread reads source rows, the transformation thread applies mapping logic, and the writer thread loads the resulting rows into the target.
- The reader thread extracts data from a source and stores source rows in DTM buffers.
- The transformation thread reads buffered data and applies the transformations defined in the mapping.
- The writer thread reads transformed data from buffers and writes it to the target.
This flow is simple to understand, but the actual number of threads can increase when partitioning is configured. Each partition can have its own reader, transformation, and writer processing path, depending on the mapping and session design.
How partitioning changes the number of DTM processing threads
Partitioning affects the number of DTM processing threads. When a pipeline has one partition, the DTM usually creates one reader thread, one transformation thread, and one writer thread for that pipeline. When the pipeline has multiple partitions, the DTM can create multiple reader, transformation, and writer threads.
For example, a session with three partitions may need three reader threads, three transformation thread paths, and three writer threads for the pipeline. The final number still depends on the source type, target type, transformations, and partition points in the mapping.
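The arithmetic in this example can be written as a small helper. It is a rough estimate under stated assumptions: the function name and the `transformation_stages` parameter (the number of transformation threads per partition) are hypothetical, and real thread counts also depend on source type, target type, and partition points.

```python
def estimate_pipeline_threads(partitions, transformation_stages=1):
    """Estimate reader, transformation, and writer threads for one pipeline."""
    if partitions < 1 or transformation_stages < 1:
        raise ValueError("partitions and transformation_stages must be >= 1")
    counts = {
        "reader": partitions,
        "transformation": partitions * transformation_stages,
        "writer": partitions,
    }
    counts["total"] = sum(counts.values())
    return counts


single = estimate_pipeline_threads(1)
three_way = estimate_pipeline_threads(3)
```

Under these assumptions, one partition gives three data-movement threads and three partitions give nine, before counting the master, mapping, pre-session, and post-session threads.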
| Session pipeline setup | Typical DTM thread behavior |
|---|---|
| Single partition pipeline | One reader path, one transformation path, and one writer path for the pipeline. |
| Multiple partition pipeline | Multiple thread paths can process data in parallel. |
| Relational source with partitions | The Integration Service can create SQL for each reader thread. |
| File source with partitioning | Multiple file reader threads may be used depending on session and source configuration. |
| File target with multiple partitions | Writer threads can create separate target files, which can be merged if configured. |
DTM thread troubleshooting in Informatica session logs
When a session fails, the session log usually shows which DTM phase or thread reported the error. Instead of checking only the final session status, review the first serious warning or error and match it with the DTM thread type.
| Log symptom | Likely DTM area to inspect | What to check |
|---|---|---|
| Mapping fails before reading data | Mapping thread | Mapping validity, session metadata, repository access, reusable objects. |
| Pre-session command fails | Pre-session thread | Command path, operating system permissions, SQL command, stored procedure error. |
| Source query fails | Reader thread | SQL syntax, source connection, source permissions, source availability. |
| Lookup, join, rank, sorter, or aggregator fails | Transformation thread | Cache settings, cache directory, memory, transformation conditions, data types. |
| Rows are rejected at target | Writer thread | Target constraints, target data types, key violations, commit settings, reject file. |
| Post-session task fails after loading data | Post-session thread | Post-session SQL, shell command, stored procedure, notification configuration. |
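The "first error, then thread type" approach can be sketched as a small log scan. The pipe-delimited line format, the thread-name prefixes, and the sample messages below are all made up for illustration; real session logs use a different layout, so treat this only as a model of the triage logic.

```python
# Hypothetical thread-name prefixes mapped to the DTM area to inspect.
AREA_BY_PREFIX = {
    "MAPPING": "Mapping thread",
    "PRE-SESS": "Pre-session thread",
    "READER": "Reader thread",
    "TRANSF": "Transformation thread",
    "WRITER": "Writer thread",
    "POST-SESS": "Post-session thread",
}


def first_error_area(log_lines):
    """Return (area, message) for the first ERROR line, or None if there is none."""
    for line in log_lines:
        severity, thread, message = line.split("|", 2)
        if severity.strip() != "ERROR":
            continue
        for prefix, area in AREA_BY_PREFIX.items():
            if thread.strip().startswith(prefix):
                return area, message.strip()
        return "Unknown", message.strip()
    return None


# Invented sample lines in the assumed severity|thread|message format.
sample_log = [
    "INFO|MAPPING|session initialized",
    "INFO|READER_1|source query issued",
    "ERROR|WRITER_1|unique constraint violated",
    "ERROR|POST-SESS|command skipped",
]
result = first_error_area(sample_log)
```

The scan stops at the first error on purpose: later errors, such as the post-session line in the sample, are often consequences of the earlier failure.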
Informatica DTM thread performance checks
DTM thread performance depends on source speed, transformation complexity, target load speed, memory settings, cache usage, and partitioning. A slow session should be checked in stages rather than by changing many properties at once.
- Reader-side check: Review source SQL, indexes, source filters, file read speed, and network latency.
- Transformation-side check: Review expensive transformations such as Sorter, Aggregator, Joiner, Rank, and cached Lookup transformations.
- Writer-side check: Review target constraints, commit interval, target database load, indexes, and target connection performance.
- Cache and memory check: Review cache size, cache directory space, DTM buffer settings, and whether transformations are paging to local cache files.
- Partitioning check: Review whether partitioning improves parallelism or creates extra overhead for the available hardware and source or target systems.
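One way to apply these staged checks is to compare how busy each thread is. The sketch below assumes you already have a total run time and an idle time for each thread; the function names and the sample timings are hypothetical, and the thread with the highest busy percentage is treated as the likely bottleneck.

```python
def busy_percentage(run_seconds, idle_seconds):
    """Share of the run time that a thread spent working rather than waiting."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return 100 * (run_seconds - idle_seconds) / run_seconds


def likely_bottleneck(thread_times):
    """Return the thread name with the highest busy percentage."""
    return max(
        thread_times,
        key=lambda name: busy_percentage(*thread_times[name]),
    )


# Made-up (run_seconds, idle_seconds) pairs for illustration only.
thread_times = {
    "reader": (100, 80),
    "transformation": (100, 10),
    "writer": (100, 60),
}
bottleneck = likely_bottleneck(thread_times)
```

In this invented sample the transformation thread is busy most of the time while the reader and writer mostly wait, so the transformation-side check would be the first place to look.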
QA checklist for an Informatica DTM processing threads tutorial
- Does the tutorial explain that the DTM allocates session memory and divides it into buffers?
- Does it clearly state that the master thread creates and manages the other DTM processing threads?
- Does it separate mapping, pre-session, reader, transformation, writer, and post-session thread responsibilities?
- Does it explain how partitioning affects the number of reader, transformation, and writer threads?
- Does it mention cache usage for Rank, Joiner, Aggregator, Sorter, and cached Lookup transformations?
- Does it connect common session log errors to the correct DTM thread type?
FAQs on Data Transformation Manager processing threads
What are DTM processing threads in Informatica?
DTM processing threads are the threads created by the Data Transformation Manager to run a PowerCenter session. They include the master thread, mapping thread, pre-session thread, reader threads, transformation threads, writer threads, and post-session thread.
Which DTM threads move data from source to target?
Reader threads, transformation threads, and writer threads move data through the session pipeline. Reader threads extract source data, transformation threads apply mapping logic, and writer threads load transformed data into targets.
How does partitioning affect DTM processing threads?
Partitioning can increase the number of DTM processing threads. A partitioned pipeline can have multiple reader, transformation, and writer thread paths, allowing data to be processed in parallel depending on the mapping and session configuration.
What causes transformation thread cache files in Informatica?
Transformation thread cache files are used when transformations such as Rank, Joiner, Aggregator, Sorter, or cached Lookup need more data space than the configured memory cache allows. The DTM then pages extra data to local cache files.
Where should I check DTM thread errors in Informatica?
Check the session log first. The session log usually shows whether the error came from mapping compilation, a pre-session operation, a reader thread, a transformation thread, a writer thread, or a post-session operation.
TutorialKart.com