The ELT-first movement has real merits — but logistics source data is often dirty enough that landing it raw creates more cleanup work than transforming it first.
The data engineering community has largely shifted toward ELT (Extract, Load, Transform) as the preferred pattern for modern data pipelines. The argument is compelling: land raw data in a cloud warehouse with essentially unlimited compute, transform it there using SQL with version-controlled dbt models, keep the raw data for reprocessing, and let the warehouse handle the computational heavy lifting instead of a transformation server. For many data domains, this is genuinely the better approach. For logistics data — specifically for WMS data — it's often the wrong choice, and the teams that discover this tend to discover it the hard way.
ELT works well for logistics data sources that produce clean, well-structured output with predictable schemas, such as carrier APIs and EDI feeds.
In all these cases, the source data is clean at the field level. Warehouse transformation is then a matter of business logic — joining, aggregating, applying business rules — not of scrubbing dirty values. ELT is the appropriate pattern here.
WMS sources — particularly legacy platforms and multi-client configurations — produce data that is structurally sound but semantically problematic. The issues described throughout this blog apply here: enumeration expansions, precision truncations, timestamp inconsistencies, LPN/HU naming conflicts, and company code filtering anomalies. When you land this data raw via ELT, you land all of those problems into your warehouse, where they become the responsibility of your SQL transformation models.
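To make the field-level problems concrete, here is a minimal sketch of pre-load cleaning for one extracted WMS row. Everything in it is a hypothetical illustration — the function name, the status-code map, and the column names are assumptions, not any specific WMS schema:

```python
from datetime import datetime, timezone

# Hypothetical enumeration map: terse legacy WMS status codes expanded
# to canonical values before the row ever reaches the warehouse.
STATUS_MAP = {"A": "ALLOCATED", "P": "PICKED", "S": "SHIPPED"}

def clean_wms_row(row: dict) -> dict:
    """Normalize one extracted WMS row before loading (ETL pre-transformation)."""
    cleaned = dict(row)

    # Enumeration expansion: fail loudly on unknown codes instead of
    # landing them raw and discovering them later in a BI report.
    status = row["status"]
    if status not in STATUS_MAP:
        raise ValueError(f"unknown WMS status code: {status!r}")
    cleaned["status"] = STATUS_MAP[status]

    # Precision: quantities are re-cast and rounded explicitly so
    # downstream models never see silently truncated values.
    cleaned["qty"] = round(float(row["qty"]), 3)

    # Timestamp inconsistency: naive local timestamps are pinned to UTC
    # so incremental-load comparisons are well defined.
    ts = datetime.fromisoformat(row["updated_at"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    cleaned["updated_at"] = ts.isoformat()

    return cleaned
```

In an ELT pipeline these same rules would live in dbt models instead; the point of the ETL placement is that a row failing them never reaches the raw layer at all.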
In a pure ELT architecture, when a data quality issue surfaces in a BI report, you trace it back through the transformation layers to the raw table. If the problem originated in the source system (a semantic change, a batch update that bypassed timestamps, a precision reduction), the raw table has the wrong data. You now need to re-extract from the source and re-land in the raw layer before your transformation models can produce correct output.
In an ETL architecture with transformation before loading, the same issue is caught at the transformation layer before it reaches the destination table. The source extract is re-run, the transformation is reapplied, and the destination is updated — without requiring a raw layer re-land followed by a transformation re-run.
For WMS sources with known data quality patterns, the ETL approach reduces the blast radius of source quality issues from "warehouse raw tables are dirty, all downstream models are wrong" to "transformation job failed, destination tables not updated." The latter is a significantly easier operational situation.
ELT proponents often emphasize the value of keeping raw data for reprocessing. This is genuinely useful when the raw data is a faithful representation of the source — when you want to reprocess history because your business logic changed, not because your source data was wrong.
For WMS raw data, the "faithful representation of the source" property is less certain. If a WMS batch update bypassed timestamp tracking and you've been missing those records in your incremental loads for three weeks, your raw table doesn't have those records. The raw table isn't a faithful source-of-truth — it's a partial extract. Reprocessing it doesn't recover the missing data; it just reapplies your transformation logic to an incomplete dataset.
The practical implication: for WMS sources with batch timestamp bypass patterns, a periodic full-refresh to reconcile the raw layer is necessary regardless of whether you're using ELT or ETL. The full-refresh frequency and the mechanism for detecting gaps differ, but neither architecture eliminates the need for explicit reconciliation.
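One way to make that reconciliation explicit is a primary-key set comparison between the source and the raw layer. The sketch below assumes the key sets have already been fetched (in practice via `SELECT <pk>` queries against the WMS and the warehouse); the function and field names are illustrative, not a specific tool's API:

```python
def reconcile(source_keys: set, raw_keys: set) -> dict:
    """Compare primary-key sets to quantify the gap left by
    timestamp-bypassing batch updates in the source system."""
    missing = source_keys - raw_keys    # in source, never landed in raw
    orphaned = raw_keys - source_keys   # landed, since purged/archived at source
    return {
        "missing_from_raw": sorted(missing),
        "orphaned_in_raw": sorted(orphaned),
        # Any missing key means incremental loads have a blind spot and
        # a reconciling full refresh is warranted.
        "full_refresh_needed": bool(missing),
    }
```

Running a check like this on a schedule turns "we should full-refresh periodically" into a measurable trigger, regardless of whether the pipeline is ETL or ELT.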
The most effective pattern for logistics data pipelines isn't a binary choice between ETL and ELT. It's a layered approach that applies transformations at the right stage for the right source type: pre-transformation before load for dirty WMS sources, and raw load with in-warehouse transformation for clean API and EDI sources.
This hybrid approach means you're not dogmatically committed to either pattern. The transformation stage placement decision is made per-source based on the data quality characteristics of that specific source — not based on architectural preference.
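The per-source decision can be captured as explicit pipeline configuration rather than convention. The source names, fields, and cadences below are illustrative assumptions, sketching one way to record the choice per source:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SourceConfig:
    name: str
    pattern: str                            # "etl" = transform before load; "elt" = raw load + dbt
    full_refresh_days: Optional[int] = None  # reconciliation cadence, if the source needs one

# Illustrative registry: the dirty WMS source gets ETL with a periodic
# full refresh; clean API/EDI sources get ELT and are transformed in dbt.
SOURCES = [
    SourceConfig("legacy_wms", pattern="etl", full_refresh_days=7),
    SourceConfig("carrier_api", pattern="elt"),
    SourceConfig("edi_feeds", pattern="elt"),
]

def pattern_for(name: str) -> str:
    """Look up which pipeline pattern a source is configured to use."""
    return next(s.pattern for s in SOURCES if s.name == name)
```

Keeping this in configuration makes the "per-source, not per-preference" rule reviewable: changing a source's pattern is a visible diff, not a quiet architectural drift.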
One practical consideration for teams committed to dbt for transformation management: pre-transformation before loading means the transformation step isn't in dbt, and therefore isn't in the version control and lineage tracking that dbt provides. For teams where dbt represents the entire transformation layer, ETL with pre-transformation creates a split: some transformations are in the dbt project, some are in the extraction pipeline. This requires maintaining lineage documentation outside dbt for the pre-transformation steps.
This is a real operational cost. Whether it outweighs the benefit of keeping dirty WMS data out of the warehouse raw layer depends on the size of the team and the severity of the WMS quality issues. For large teams with dedicated data platform engineers, the split is manageable. For smaller teams where the dbt project is the single source of transformation truth, the complexity of maintaining external lineage documentation may tip the decision toward ELT with more aggressive raw-layer quality checks.
The ETL vs. ELT decision for logistics pipelines is genuinely source-dependent. The modern data engineering default (ELT for everything) produces good results for clean API and EDI sources. It produces more operational complexity than it saves for WMS sources with known data quality patterns. Making this call correctly per source type — rather than applying a blanket architecture — is the difference between a data platform that runs smoothly and one that requires regular manual reconciliation to keep the numbers right.
The goal isn't to be architecturally consistent. The goal is to have accurate logistics data with predictable pipeline behavior. Those goals sometimes point toward different patterns for different sources.
MLPipeLab applies per-source extraction strategies with pre-transformation for WMS sources and configurable raw-load for clean API sources. Request a demo to see how the hybrid pattern works in practice.