Feature Engineering Primer

What it is

Feature engineering is the craft of turning raw market data - prices, volume, fundamentals - into meaningful numbers that describe the current situation. A feature is one such derived number: an input you compute so that a rule, a model, or your own judgement has something concise and comparable to work with. Raw price alone is a poor feature; "the percentage distance from the 50-day moving average" or "today's volume relative to its 20-day average" are good features, because each compresses messy data into a single interpretable quantity.

Every indicator you have met is a feature. RSI is a feature engineered from price changes. ATR is a feature engineered from the daily range. VWAP, the P/E ratio, the 200-day return - all are features, each designed to make some aspect of the market measurable and comparable across time and across instruments. Seeing indicators this way is liberating: instead of memorising dozens of named tools, you understand them as members of a few families - momentum, volatility, value, participation - each a different recipe for turning raw data into one comparable number. Once you think in features, you can build your own rather than reaching only for the off-the-shelf names.

How it works

A good feature does three jobs: it captures something real, it is comparable, and it is honest about time.

Capturing something real. The feature should describe a property that plausibly relates to future behaviour - momentum, value, volatility, trend strength - not an arbitrary squiggle.
Comparability through normalisation. Raw numbers from different instruments are not comparable. A $2 move means nothing without context: it is huge for a $20 stock and trivial for a $2,000 one. Normalisation rescales a feature so values can be compared across instruments and across time. Common approaches are expressing a move as a percentage, dividing by ATR so it is measured in volatility units, or computing a z-score (how many standard deviations a value sits from its own recent mean). A z-score of +2 means the same thing - "two standard deviations stretched" - whether the asset trades at $20 or $2,000.
Honesty about time. This is the deep one, covered next.
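
The three normalisations named above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function names, the ATR values, and the choice of population standard deviation (`pstdev`) are all assumptions made for the example.

```python
from statistics import mean, pstdev

def pct_move(move, price):
    """Express a dollar move as a fraction of the starting price."""
    return move / price

def atr_units(move, atr):
    """Express a dollar move in multiples of the instrument's ATR."""
    return move / atr

def z_score(value, history):
    """Standard deviations between `value` and the mean of `history`.
    Assumption: population standard deviation over the supplied window."""
    return (value - mean(history)) / pstdev(history)

# The same $2 move on a $20 stock and on a $2,000 stock.
cheap = pct_move(2.0, 20.0)        # 0.10  -> a 10% move
pricey = pct_move(2.0, 2000.0)     # 0.001 -> a 0.1% move

# Hypothetical ATRs: $0.50 for the cheap stock, $40 for the expensive one.
cheap_in_atr = atr_units(2.0, 0.5)     # 4.0 ATRs
pricey_in_atr = atr_units(2.0, 40.0)   # 0.05 ATRs

# A made-up window whose mean is 5 and population standard deviation is 2.
stretch = z_score(9, [2, 4, 4, 4, 5, 5, 7, 9])   # 2.0
```

In each case the dollar figure is divided by something that carries the instrument's own scale, which is why the result can be compared across instruments.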

Normalisation is what lets a single rule work across a whole universe. "Buy when the 10-day return is more than two standard deviations below its one-year mean" applies identically to a cheap stock and an expensive one, because the z-score has stripped out the units. Without normalisation, every instrument would need its own hand-tuned thresholds, and comparisons would be meaningless.
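
As a rough sketch of that rule, the code below computes the z-score of the latest N-day return against its own trailing history and compares it to a threshold. The function names are hypothetical, 252 trading days stands in for "one year", and the tiny demo uses shortened windows purely so the data fits on a few lines.

```python
from statistics import mean, pstdev

def return_zscore(closes, ret_window=10, stat_window=252):
    """z-score of the latest `ret_window`-day return against the trailing
    `stat_window` observations of that same return (latest included)."""
    rets = [closes[i] / closes[i - ret_window] - 1.0
            for i in range(ret_window, len(closes))]
    if len(rets) < stat_window:
        raise ValueError("not enough history for the requested windows")
    window = rets[-stat_window:]
    sd = pstdev(window)
    if sd == 0:
        return 0.0  # a flat history has no stretch to measure
    return (window[-1] - mean(window)) / sd

def buy_signal(closes, threshold=-2.0, **windows):
    """True when the latest return sits more than |threshold| sigmas below its mean."""
    return return_zscore(closes, **windows) < threshold

# Same made-up price path at two price levels: about $20 and about $2,000.
low_priced = [20.0, 20.2, 20.0, 20.2, 20.0, 20.2, 20.0, 20.2, 20.0, 20.2, 18.0]
high_priced = [p * 100 for p in low_priced]

z_low = return_zscore(low_priced, ret_window=1, stat_window=10)
z_high = return_zscore(high_priced, ret_window=1, stat_window=10)
signal_low = buy_signal(low_priced, ret_window=1, stat_window=10)
signal_high = buy_signal(high_priced, ret_window=1, stat_window=10)
```

Both series produce the same z-score and the same signal, because returns and z-scores carry no trace of the price level.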

There is a second, subtler reason normalisation matters: it lets you combine features that were measured in different units. You cannot meaningfully add a price distance in dollars, an RSI reading on a 0-100 scale, and a volume figure in millions of shares - they live in incompatible units. Convert each to a z-score and they suddenly share one language, "standard deviations from normal", in which they can be compared, ranked, or averaged. This is the quiet engine behind almost every multi-signal system: normalise everything to a common scale first, then reason about the normalised numbers. A feature that has not been normalised is not just hard to interpret in isolation; it is also hard to combine meaningfully with anything else.
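
A small sketch of that idea: three features in incompatible units are each converted to a z-score against their own trailing window and then averaged. The readings are invented, the names are hypothetical, and the equal-weight average is just one assumed way of combining them.

```python
from statistics import mean, pstdev

def latest_z(history):
    """z-score of the last value against its own trailing window (itself included)."""
    sd = pstdev(history)
    if sd == 0:
        return 0.0
    return (history[-1] - mean(history)) / sd

# Hypothetical trailing readings for one stock, each in its own unit.
raw = {
    "price_dist_usd": [0.5, -0.2, 0.1, 0.4, 1.8],   # dollars above the average
    "rsi":            [48, 52, 50, 55, 71],         # 0-100 scale
    "volume_m":       [2.1, 1.9, 2.0, 2.2, 4.3],    # millions of shares
}

# After conversion every feature reads in "standard deviations from normal".
normalised = {name: latest_z(history) for name, history in raw.items()}

# Assumption: a plain equal-weight average as the combined score.
composite = mean(list(normalised.values()))
```

Adding the raw latest values (1.8 + 71 + 4.3) would be dominated by whichever feature happens to use the largest numbers; the normalised versions contribute on equal footing.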

How to use it

A practical checklist for building a feature:

Define exactly what it measures, in words, before you compute it. "Distance of price above its rising 50-day average, in ATR units."
Normalise it so it is comparable across instruments - percentage, ATR units, or z-score.
Confirm it uses only information available at the moment it would be acted on. This is the rule that protects you from the single most damaging mistake in the whole field.
Keep the set small. A handful of strong, distinct features beats dozens of overlapping ones, which invite overfitting.

To see the checklist in motion, build one feature end to end. Goal: a feature that flags a stock stretched far below its recent norm - a candidate for mean reversion. Step one, define it in words: "how many standard deviations today's close sits from its 20-day average close, negative when it is below." Step two, compute it as a z-score using only the trailing 20 days up to and including today, never future days. Step three, normalise - the z-score is already unit-free, so a reading of −2 means the same stretch on a $20 stock and a $2,000 one. Step four, check the timing: at the moment you would act (say, the open of the next day), every input - the 20 closes, their mean, their standard deviation - existed before that moment, so there is no leakage. The result is one clean number, comparable across the universe, honest about time, that you could drop straight into a rule like "consider a long when the feature reads below −2 and the 200-day trend is up." Notice how the steps closed off specific failures: defining the feature in words prevented vagueness, the z-score prevented incomparability, and the timing check prevented leakage.
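
The walkthrough above might look like this in code. It is a sketch under stated assumptions: the function names are hypothetical, "the 200-day trend is up" is interpreted here as the latest close sitting above its trailing 200-day mean (the text does not define it), and the price series is synthetic.

```python
from statistics import mean, pstdev

def stretch_feature(closes, window=20):
    """z-score of the latest close against the trailing `window` closes,
    today included. Only values already in `closes` are used."""
    if len(closes) < window:
        return None
    recent = closes[-window:]
    sd = pstdev(recent)
    if sd == 0:
        return 0.0
    return (recent[-1] - mean(recent)) / sd

def trend_is_up(closes, window=200):
    """Assumed trend filter: latest close above its trailing `window`-day mean."""
    if len(closes) < window:
        return False
    return closes[-1] > mean(closes[-window:])

def consider_long(closes_through_today):
    """Evaluated after today's close, to be acted on at the next open."""
    z = stretch_feature(closes_through_today)
    return z is not None and z < -2.0 and trend_is_up(closes_through_today)

# Synthetic series: a steady climb from 100 to 209, then a sharp one-day dip to 195.
closes = [100 + 0.5 * i for i in range(219)] + [195.0]

z_today = stretch_feature(closes)        # roughly -2.7 on this series
candidate = consider_long(closes)        # dip inside a rising 200-day trend
```

Because every function takes only the closes up to and including today, there is nothing in the signature through which a future value could enter.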

Strengths & limits

The strength of good feature engineering is that it turns a chaotic market into a small set of interpretable, comparable numbers that a rule or a person can act on consistently. It is the bridge between raw data and decisions.

The limits are dominated by one catastrophic pitfall: lookahead leakage (data leakage). Leakage is when a feature accidentally includes information that would not have been available at the time you act on it - you let the future leak into the past. The classic example is using the day's closing price to decide a trade you are supposed to enter during that same day: at the moment of the decision, the close does not yet exist. Other forms are subtler - normalising a feature using the full history's mean and standard deviation (which includes future data) when backtesting, or using a fundamental figure on a date before it was actually reported.
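
One way to see the normalisation form of leakage is to change only the future and check whether a past feature value moves. The sketch below does that with made-up numbers and hypothetical function names: the full-history z-score at day 4 changes when later days are altered, while the trailing version does not.

```python
from statistics import mean, pstdev

def z_full_history(series, t):
    """LEAKY: normalises series[t] with the mean and standard deviation
    of the whole series, including values after t."""
    return (series[t] - mean(series)) / pstdev(series)

def z_trailing(series, t, window):
    """Honest: uses only the `window` values ending at t."""
    past = series[max(0, t - window + 1): t + 1]
    sd = pstdev(past)
    if sd == 0:
        return 0.0
    return (series[t] - mean(past)) / sd

t = 4
series = [10, 11, 10, 12, 11, 30, 31, 29]     # the value rises after day 4
altered = series[:t + 1] + [5, 4, 6]          # same past, different future

honest_a = z_trailing(series, t, window=5)
honest_b = z_trailing(altered, t, window=5)   # identical to honest_a

leaky_a = z_full_history(series, t)           # negative: day 4 looks low
leaky_b = z_full_history(altered, t)          # positive: day 4 looks high
```

The leaky value at day 4 flips sign depending on what happens afterwards, which is exactly the information a live trader would not have had.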

Leakage is so dangerous because it produces results that look spectacular and are completely fake. A backtest contaminated by leakage shows an effortless, smooth equity curve, because the strategy is quietly peeking at answers it could never have known in real time. It then fails with real money, where the future is genuinely unavailable. The discipline that prevents it is simple to state and hard to enforce: at every point in a backtest, a feature may use only data that existed at or before that point. Compute rolling statistics with trailing windows only, lag fundamentals to their true report date, and never let a feature touch a value from its own future. Respect that single rule and most of feature engineering's worst failures disappear.

A practical way to catch leakage is to imagine a strict referee standing at each historical moment, allowed to hand your feature only the data a real trader could have held at that instant - never tomorrow's close, never a revised figure published later, never a statistic computed over the full dataset including the future. If you cannot defend every input to that referee, the feature leaks. The reason the rule is so easy to violate is that the most convenient way to compute a feature in a spreadsheet or notebook - normalise against the whole column, join a fundamental on its fiscal-period date rather than its release date - is almost always the leaking way. Honesty about time costs effort precisely because the dishonest path is the path of least resistance, and the dishonest path is the one that produces the beautiful, doomed backtest.
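
The fundamentals case can be sketched as a lookup keyed on release date rather than fiscal-period date. Everything here is illustrative: the `Report` type, the function names, the dates, and the EPS figures are invented, and the lookup assumes day-level granularity with figures released before the decision time on their release date.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence

class Report(NamedTuple):
    period_end: date   # fiscal period the figure describes
    released: date     # day the figure became public
    eps: float

# Hypothetical filings.
reports = [
    Report(date(2023, 12, 31), date(2024, 2, 15), 1.10),
    Report(date(2024, 3, 31), date(2024, 5, 10), 1.25),
]

def eps_known_on(as_of: date, reports: Sequence[Report]) -> Optional[float]:
    """Honest: the most recently released figure as of `as_of`, or None."""
    visible = [r for r in reports if r.released <= as_of]
    return max(visible, key=lambda r: r.released).eps if visible else None

def eps_by_period_end(as_of: date, reports: Sequence[Report]) -> Optional[float]:
    """LEAKY: treats a figure as known the moment its fiscal period ends."""
    ended = [r for r in reports if r.period_end <= as_of]
    return max(ended, key=lambda r: r.period_end).eps if ended else None

mid_april = date(2024, 4, 15)
honest = eps_known_on(mid_april, reports)        # 1.10, the Q1 figure is not out yet
leaky = eps_by_period_end(mid_april, reports)    # 1.25, weeks before its release
```

The referee in the paragraph above would accept the first lookup and reject the second. Revised figures would need a further field recording when each revision was published, which this sketch omits.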

The other limit is more mundane but still costly: feature proliferation. Because features are easy to invent, it is tempting to build dozens and feed them all into a rule. Most will be noise, many will be near-duplicates of each other, and the sheer number gives you so many knobs to turn that you can fit almost any history - which is overfitting by another name. A disciplined practitioner keeps a small set of features that each measure something genuinely distinct and can each be justified in one sentence. Fewer, stronger, well-understood features beat a sprawling pile of weak ones every time.
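
A simple screen for near-duplicates is to check pairwise correlation between feature histories and flag pairs that move almost in lockstep. The sketch below is one assumed approach, not a prescribed one: the feature names and readings are made up, and the 0.9 cutoff is an arbitrary illustration.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def near_duplicates(features, threshold=0.9):
    """Pairs of feature names whose absolute correlation meets the threshold."""
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) >= threshold:
                flagged.append((a, b))
    return flagged

# Hypothetical normalised readings over six days.
features = {
    "dist_from_ma20": [0.5, -1.0, 1.5, -0.5, 2.0, -1.5],
    "dist_from_ma25": [0.6, -0.9, 1.4, -0.6, 2.1, -1.4],
    "volume_z":       [1.0, 1.2, -0.8, -0.7, 0.1, -0.2],
}

overlapping = near_duplicates(features)
# -> [("dist_from_ma20", "dist_from_ma25")]
```

A flagged pair is a prompt to keep one of the two and justify it in a sentence, not proof that either is useless.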

Key takeaway: A feature is any number you derive from raw data to describe a market; normalisation (percentage, ATR units, z-score) makes features comparable across instruments - but lookahead leakage, letting future information into a feature, is the pitfall that makes a backtest look brilliant and then fail with real money.