Book 0 · Chapter 0.4

Scalars, Vectors, and Feature Spaces

Concept page: once reality is mapped into numbers, the model can only reason inside that geometry. Feature design is the first irreversible decision.

0.4.1 Why everything becomes numbers

Machine learning models cannot operate on things — only on numbers. Users, images, sentences, transactions, sensors, documents — all must be converted into numerical form before any learning can occur.

This conversion is not incidental. It is the first irreversible design decision in any ML system.

Once reality is mapped into numbers, the model can only reason within that numerical representation.

0.4.2 Scalars: single measurements

A scalar is a single real number:

x ∈ R

Examples:

  • age = 42
  • temperature = 18.7
  • account_balance = 1520.35

Scalars are the simplest features and often the most dangerous:

  • they imply linear ordering
  • they imply magnitude comparisons
  • they invite extrapolation

Engineering implication

If a number does not actually encode magnitude or ordering (e.g., bucketed risk scores, numeric IDs), treating it as a scalar can mislead the model, as the sketch below shows.
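
A minimal sketch of the problem, assuming NumPy (the IDs are invented):

    import numpy as np

    # Numeric IDs look like scalars, but the gaps between them carry no meaning.
    user_ids = np.array([1001, 1002, 9583])

    # |1002 - 1001| = 1 and |9583 - 1002| = 8581, yet neither gap says
    # anything about how similar the users actually are.
    print(np.abs(np.diff(user_ids)))  # [   1 8581]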

0.4.3 Vectors: describing an entity with multiple attributes

A vector is an ordered collection of scalars:

x = (x_1, x_2, …, x_d) ∈ R^d

This is the canonical representation of a data point in ML.

Examples:

  • user = [age, country_code, avg_session_time, purchase_count]
  • house = [square_feet, bedrooms, distance_to_city, year_built]

Each component defines one axis in feature space.

Key insight

A data point is not “an object” — it is a location in space.
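
A minimal sketch, assuming NumPy (the attribute values are invented):

    import numpy as np

    # user = [age, country_code, avg_session_time, purchase_count]
    user_a = np.array([42.0, 3.0, 17.5, 8.0])
    user_b = np.array([39.0, 3.0, 12.0, 2.0])
    # (country_code as a raw number is itself questionable; see 0.4.6)

    # Each user is a point in R^4; the model compares locations, not people.
    print(np.linalg.norm(user_a - user_b))  # distance between the two points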

0.4.4 Feature space: the universe the model lives in

The feature space is the set of all possible vectors your model might see.

  • 3 numeric features → 3D space
  • 100 features → 100D space
  • 10,000 features → 10,000D space

Models do not see:

  • semantics
  • causality
  • meaning

They see:

  • distances
  • angles
  • projections
  • regions

Once you define the feature space, you define what the model can learn.
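
These quantities are directly computable. A minimal sketch, assuming NumPy:

    import numpy as np

    a = np.array([1.0, 0.0, 2.0])
    b = np.array([0.5, 1.0, 2.5])

    distance = np.linalg.norm(a - b)           # how far apart two points are
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between them
    projection = (a @ b) / np.linalg.norm(b)   # length of a projected onto b

    print(distance, cosine, projection)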

0.4.5 Coordinate systems and meaning

Choosing features is choosing a coordinate system for reality.

Two representations of the same object can lead to radically different learning outcomes.

Example:

  • raw timestamps vs cyclical encoding (sin/cos of hour)
  • zip code as number vs one-hot encoding
  • text as word counts vs embeddings

Mathematically, these are different spaces — even if they describe the same thing.
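
The timestamp example is easy to make concrete. A minimal sketch, assuming NumPy:

    import numpy as np

    hours = np.array([0, 6, 12, 23])

    # Raw encoding: hour 23 and hour 0 are 23 units apart.
    # Cyclical encoding places hours on the unit circle, so they become neighbors.
    angle = 2 * np.pi * hours / 24
    points = np.column_stack([np.sin(angle), np.cos(angle)])

    # Distance between 23:00 and 00:00 in the cyclical space:
    print(np.linalg.norm(points[3] - points[0]))  # ~0.26, small, as it should be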

0.4.6 Categorical data and the danger of fake geometry

Many real-world attributes are categorical:

  • country
  • device type
  • product ID

They have no natural ordering.

If you encode them as integers:

country = 1, country = 2, country = 3

you accidentally introduce:

  • distance
  • magnitude
  • ordering

One-hot encoding fixes this by expanding into a higher-dimensional space (see the sketch after this list):

  • each category becomes its own axis
  • distance reflects equality, not magnitude

Trade-off:

  • correctness vs dimensionality explosion
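
A minimal sketch of both encodings, assuming NumPy:

    import numpy as np

    # Integer encoding: country = 1, 2, 3 implies country 2 sits "between"
    # countries 1 and 3 — fake geometry.
    as_integers = np.array([1, 2, 3])

    # One-hot encoding: each category becomes its own axis.
    one_hot = np.eye(3)[as_integers - 1]

    # Every pair of distinct categories is now equally far apart:
    print(np.linalg.norm(one_hot[0] - one_hot[1]))  # sqrt(2)
    print(np.linalg.norm(one_hot[0] - one_hot[2]))  # sqrt(2)
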
0.4.7 Binary features and indicator variables

Binary features (0/1) are everywhere:

  • clicked / not clicked
  • fraud / not fraud
  • active / inactive

They:

  • carve space into regions
  • create sharp decision boundaries
  • interact strongly with linear models

In many systems, most signal comes from binary indicators, not continuous values.
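
A minimal sketch of how indicators combine in a linear model, assuming NumPy (the weights are invented):

    import numpy as np

    clicked = np.array([1, 0, 1, 1])
    is_mobile = np.array([0, 0, 1, 1])

    # For a linear model, each indicator simply adds its weight when the flag
    # is on; together the flags carve the space into rectangular regions.
    w_clicked, w_mobile = 0.8, -0.3  # hypothetical learned weights
    score = w_clicked * clicked + w_mobile * is_mobile
    print(score)  # [0.8 0.  0.5 0.5]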

0.4.8 Continuous features and scaling

Continuous features introduce geometry:

  • distance
  • direction
  • scale

If one feature ranges from 0 to 1 and another from 0 to 1,000,000:

  • the larger dominates dot products
  • gradients become ill-conditioned
  • training becomes unstable

This is why:

  • normalization
  • standardization
  • log transforms

exist. They are geometric corrections, not cosmetic preprocessing.
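
A minimal sketch of the correction, assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(0)
    income = rng.uniform(0, 1_000_000, size=100)  # feature on a huge scale
    ratio = rng.uniform(0, 1, size=100)           # feature on [0, 1]
    X = np.column_stack([income, ratio])

    # Unscaled, income dominates every distance and dot product:
    print(np.linalg.norm(X[0] - X[1]))

    # Standardize: zero mean, unit variance per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    print(np.linalg.norm(X_std[0] - X_std[1]))  # both features now contribute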

0.4.9 High-dimensional spaces and intuition failure

As dimensionality increases:

  • distances concentrate
  • nearest neighbors become less meaningful
  • volume grows exponentially

This is the curse of dimensionality.

Engineering consequences:

  • simple distance-based methods degrade
  • feature selection becomes critical
  • regularization becomes mandatory

High-dimensional space behaves nothing like 2D or 3D intuition.
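
Distance concentration is easy to observe directly. A minimal sketch, assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(0)

    for d in (2, 100, 10_000):
        X = rng.uniform(size=(1000, d))  # 1000 random points
        q = rng.uniform(size=d)          # a query point
        dists = np.linalg.norm(X - q, axis=1)
        # Relative contrast between nearest and farthest shrinks as d grows:
        print(d, (dists.max() - dists.min()) / dists.mean())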

0.4.10 Sparsity: when most dimensions are zero

Many feature spaces are sparse:

  • text (bag-of-words)
  • categorical one-hot encodings
  • recommender systems

Sparsity is not a bug — it is structure.

Benefits:

  • efficient storage
  • fast dot products
  • interpretable signals

But it also (as the sketch below shows):

  • increases dimensionality
  • complicates similarity
  • demands specialized algorithms
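
A minimal sketch of sparse structure, assuming SciPy:

    import numpy as np
    from scipy.sparse import csr_matrix

    # Toy bag-of-words: 3 documents, 6-word vocabulary, mostly zeros.
    dense = np.array([
        [2, 0, 0, 1, 0, 0],
        [0, 0, 3, 0, 0, 0],
        [0, 1, 0, 0, 0, 4],
    ])
    X = csr_matrix(dense)

    print(X.nnz, "nonzeros out of", dense.size)  # 5 out of 18
    print((X @ X.T).toarray())  # dot products touch only stored nonzeros
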
0.4.11 Embeddings: learned feature spaces

Embeddings map discrete objects into continuous vector spaces:

object → R^k

Properties:

  • similar objects are close
  • distances become meaningful
  • dimensionality is controlled

Embeddings learn the geometry instead of you defining it manually, as the sketch below illustrates.

This bridges:

  • feature engineering
  • deep learning
  • retrieval systems
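
A minimal sketch of an embedding lookup, assuming NumPy. The table here is random; in a trained system it is learned so that similar objects land close together:

    import numpy as np

    rng = np.random.default_rng(0)

    vocab = {"shoe": 0, "boot": 1, "banana": 2}  # hypothetical vocabulary
    k = 4                                        # embedding dimension
    E = rng.normal(size=(len(vocab), k))         # the embedding table

    def embed(word):
        # Lookup: discrete object -> point in R^k
        return E[vocab[word]]

    print(embed("shoe"))  # a 4-dimensional vector
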
0.4.12 Linear separability and model limits

In feature space, many models reduce to a single question: “Can I draw a surface that separates these points?”

Linear models:

  • draw flat surfaces (hyperplanes)
  • rely heavily on feature design

If data is not linearly separable in your space (see the XOR sketch below):

  • no amount of training will fix it
  • you must change representation or model family
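
XOR is the classic demonstration. A minimal sketch, assuming scikit-learn:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # XOR: no hyperplane separates the two classes in the raw 2D space.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])

    clf = LogisticRegression().fit(X, y)
    print(clf.score(X, y))  # 0.5: training cannot fix a representation problem
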
0.4.13 Feature interactions as geometry

Interactions (e.g., age × income) correspond to:

  • bending space
  • introducing new dimensions
  • changing separability

Polynomial features and neural networks both:

  • enrich feature space
  • increase expressiveness
  • increase overfitting risk

Representation power always trades off with stability.
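
Continuing the XOR sketch from the previous section (assuming scikit-learn): a single interaction feature is enough to change separability:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])

    # Append the interaction x1 * x2 as a third dimension.
    X_int = np.column_stack([X, X[:, 0] * X[:, 1]])
    clf = LogisticRegression().fit(X_int, y)
    print(clf.score(X_int, y))  # 1.0: XOR becomes separable in the enriched space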

0.4.14 Engineering mindset: features define reality

A model does not fail because it “missed something.” It fails because that information was never represented — or was represented in a misleading way.

Feature design is ontological:

  • it defines what exists in the model’s universe

0.4.15 Chapter takeaway

Machine learning models do not learn about the world. They learn about points in a feature space you designed.

If the geometry is wrong:

  • optimization will still succeed
  • metrics may look good
  • deployment will fail

Readiness Check

You should now be able to:

  • Explain why all ML inputs are vectors
  • Reason about feature spaces geometrically
  • Identify fake structure introduced by bad encoding
  • Explain why scaling affects training
  • Understand why representation often matters more than the algorithm