Machine learning models cannot operate on things — only on numbers. Users, images, sentences, transactions, sensors, documents — all must be converted into numerical form before any learning can occur.
This conversion is not incidental. It is the first irreversible design decision in any ML system.
Once reality is mapped into numbers, the model can only reason within that numerical representation.
A scalar is a single real number:
x ∈ R

Examples:
- age = 42
- temperature = 18.7
- account_balance = 1520.35
Scalars are the simplest features and often the most dangerous:
- they imply linear ordering
- they imply magnitude comparisons
- they invite extrapolation
If a number's meaning is not linear in its value (e.g., risk scores, IDs), treating it as a raw magnitude can mislead the model.
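As a minimal sketch of the problem (the product IDs below are made up), a model given raw IDs will compute distances between them that are pure artifacts of the encoding:

```python
import numpy as np

# Hypothetical product IDs fed to a model as plain scalars.
ids = np.array([104, 105, 9000])

# The encoding implies ID 104 is "similar" to 105 and "far" from 9000,
# even though IDs carry no ordering or magnitude at all.
print(abs(ids[0] - ids[1]))   # 1
print(abs(ids[0] - ids[2]))   # 8896
```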
A vector is an ordered collection of scalars:
x = (x_1, x_2, …, x_d) ∈ R^d

This is the canonical representation of a data point in ML.
Examples:
- user = [age, country_code, avg_session_time, purchase_count]
- house = [square_feet, bedrooms, distance_to_city, year_built]
Each component defines one axis in feature space.
A data point is not “an object” — it is a location in space.
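As an illustration (the values are invented, using NumPy), the house example above becomes a point in R^4, and the model can only relate it to other houses through geometry:

```python
import numpy as np

# A house as a point in 4D feature space:
# [square_feet, bedrooms, distance_to_city, year_built]
house_a = np.array([1500.0, 3.0, 12.5, 1998.0])
house_b = np.array([2200.0, 4.0, 3.0, 2015.0])

print(house_a.shape)                      # (4,): a location in R^4
print(np.linalg.norm(house_a - house_b))  # Euclidean distance between the two "houses"
```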
The feature space is the set of all possible vectors your model might see.
- 3 numeric features → 3D space
- 100 features → 100D space
- 10,000 features → 10,000D space
Models do not see:
- semantics
- causality
- meaning
They see:
- distances
- angles
- projections
- regions
Once you define the feature space, you define what the model can learn.
Choosing features is choosing a coordinate system for reality.
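A minimal sketch of what that means in practice: for two made-up 3-feature points, the quantities below are all the model can actually compute.

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])   # two made-up points in a 3D feature space
b = np.array([0.5, 1.0, 1.5])

distance   = np.linalg.norm(a - b)                            # how far apart they sit
cosine     = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between them
projection = (a @ b) / (b @ b) * b                            # component of a along b

print(distance, cosine, projection)
```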
Two representations of the same object can lead to radically different learning outcomes.
Example:
- raw timestamps vs cyclical encoding (sin/cos of hour)
- zip code as number vs one-hot encoding
- text as word counts vs embeddings
Mathematically, these are different spaces — even if they describe the same thing.
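For example, a sketch of the cyclical encoding mentioned above (the hours are made up): in the raw scalar space, 23:00 and 00:00 are maximally far apart; on the sin/cos circle they are neighbors.

```python
import numpy as np

hour = np.array([0, 6, 12, 23])   # hour of day, 0-23

# Map each hour onto a circle so that 23 and 0 end up adjacent.
hour_sin = np.sin(2 * np.pi * hour / 24)
hour_cos = np.cos(2 * np.pi * hour / 24)
cyclical = np.stack([hour_sin, hour_cos], axis=1)

print(abs(hour[3] - hour[0]))                      # 23 apart in raw space
print(np.linalg.norm(cyclical[3] - cyclical[0]))   # ~0.26 apart in cyclical space
```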
Many real-world attributes are categorical:
- country
- device type
- product ID
They have no natural ordering.
If you encode them as integers:
country = 1, country = 2, country = 3

you accidentally introduce:
- distance
- magnitude
- ordering
One-hot encoding fixes this by expanding into a higher-dimensional space:
- each category becomes its own axis
- distance reflects equality, not magnitude
Trade-off:
- correctness vs dimensionality explosion
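A minimal hand-rolled sketch (a library encoder such as scikit-learn's OneHotEncoder does the same job; the country list is hypothetical):

```python
import numpy as np

countries = ["DE", "FR", "JP"]                    # hypothetical category set
index = {c: i for i, c in enumerate(countries)}

def one_hot(country: str) -> np.ndarray:
    """Each category gets its own axis; no spurious ordering or magnitude."""
    vec = np.zeros(len(countries))
    vec[index[country]] = 1.0
    return vec

print(one_hot("DE"))   # [1. 0. 0.]
print(one_hot("JP"))   # [0. 0. 1.]

# Every pair of distinct categories is now equidistant (sqrt(2)),
# unlike the integer encoding where |1 - 3| > |1 - 2|.
print(np.linalg.norm(one_hot("DE") - one_hot("JP")))
```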
Binary features (0/1) are everywhere:
- clicked / not clicked
- fraud / not fraud
- active / inactive
They:
- carve space into regions
- create sharp decision boundaries
- interact strongly with linear models
In many systems, most signal comes from binary indicators, not continuous values.
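A sketch of that interaction with a linear model (the flags and weights below are made up): each active indicator simply adds its weight to the score, so the space is carved into the corners of a hypercube.

```python
import numpy as np

# Hypothetical binary indicators: [clicked_ad, is_new_device, used_vpn]
x = np.array([1, 0, 1])
w = np.array([0.8, -0.3, 1.5])   # made-up weights
b = -1.0

# The score of this point is just the sum of the weights of its active flags.
score = w @ x + b
print(score)   # 0.8 + 1.5 - 1.0 = 1.3
```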
Continuous features introduce geometry:
- distance
- direction
- scale
If one feature ranges from 0–1 and another from 0–1,000,000:
- the larger dominates dot products
- gradients become ill-conditioned
- training becomes unstable
This is why:
- normalization
- standardization
- log transforms
exist. They are geometric corrections, not cosmetic preprocessing.
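A minimal sketch of standardization on made-up data, showing how one large-scale feature otherwise dominates every distance:

```python
import numpy as np

# Two made-up features on wildly different scales.
X = np.array([
    [0.2, 150_000.0],
    [0.9, 320_000.0],
    [0.5,  80_000.0],
])

# Standardization: zero mean, unit variance per feature (a geometric correction).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.linalg.norm(X[0] - X[1]))         # ~170000, driven almost entirely by column 2
print(np.linalg.norm(X_std[0] - X_std[1])) # both features now contribute comparably
```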
As dimensionality increases:
- distances concentrate
- nearest neighbors become less meaningful
- volume grows exponentially
This is the curse of dimensionality.
Engineering consequences:
- simple distance-based methods degrade
- feature selection becomes critical
- regularization becomes mandatory
High-dimensional space behaves nothing like 2D or 3D intuition.
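A small experiment (uniform random points; dimensions chosen to mirror the list above) makes distance concentration visible: the gap between the nearest and farthest neighbor shrinks relative to the average distance.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (3, 100, 10_000):
    X = rng.random((200, d))                       # 200 random points in [0, 1]^d
    dists = np.linalg.norm(X[1:] - X[0], axis=1)   # distances from the first point
    contrast = (dists.max() - dists.min()) / dists.mean()
    print(d, round(contrast, 3))                   # relative contrast shrinks as d grows
```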
Many feature spaces are sparse:
- text (bag-of-words)
- categorical one-hot encodings
- recommender systems
Sparsity is not a bug — it is structure.
Benefits:
- efficient storage
- fast dot products
- interpretable signals
But it also:
- increases dimensionality
- complicates similarity
- demands specialized algorithms
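A minimal sketch using scipy.sparse (the toy bag-of-words counts are made up), showing that only the non-zero entries are stored and used:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny bag-of-words matrix: 3 documents over a 6-word vocabulary.
dense = np.array([
    [2, 0, 0, 1, 0, 0],
    [0, 0, 3, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
])
X = csr_matrix(dense)

print(X.nnz, "non-zeros out of", dense.size)   # 5 non-zeros out of 18
# Dot products skip the zeros entirely, which is what keeps sparse linear models fast.
print((X @ X.T).toarray())                     # document-by-document similarity counts
```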
Embeddings map discrete objects into continuous vector spaces:
object → R^k

Properties:
- similar objects are close
- distances become meaningful
- dimensionality is controlled
Embeddings learn the geometry instead of you defining it manually.
This bridges:
- feature engineering
- deep learning
- retrieval systems
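A sketch of the mechanics only (the embedding values below are random placeholders rather than trained, so the similarities mean nothing; the point is the lookup plus the geometry):

```python
import numpy as np

rng = np.random.default_rng(0)

# An embedding table is just a matrix: one k-dimensional row per object.
vocab = ["laptop", "notebook", "banana"]
E = rng.normal(size=(len(vocab), 8))      # 3 objects embedded in R^8 (k = 8)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# After training, similar objects end up close in this space.
print(cosine(E[0], E[1]))   # laptop vs notebook
print(cosine(E[0], E[2]))   # laptop vs banana
```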
In feature space, many models are simply: “Can I draw a surface that separates these points?”
Linear models:
- draw flat surfaces (hyperplanes)
- rely heavily on feature design
If data is not linearly separable in your space:
- no amount of training will fix it
- you must change representation or model family
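The classic XOR pattern is a concrete sketch of this (assuming scikit-learn is available): in the raw 2D space no hyperplane separates the classes, and training cannot change that.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR: the two classes cannot be split by any flat surface in this 2D space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))   # stays at chance level (~0.5); more training will not help
```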
Interactions (e.g., age × income) correspond to:
- bending space
- introducing new dimensions
- changing separability
Polynomial features and neural networks both:
- enrich feature space
- increase expressiveness
- increase overfitting risk
Representation power always trades off with stability.
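Continuing the XOR sketch, adding the interaction term x1*x2 (here via scikit-learn's PolynomialFeatures; any equivalent feature map would do) enriches the space enough to make the same points linearly separable:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Expand (x1, x2) into (x1, x2, x1*x2): the XOR classes are separable in this 3D space.
expand = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_poly = expand.fit_transform(X)

clf = LogisticRegression(C=1000).fit(X_poly, y)
print(X_poly.shape)          # (4, 3)
print(clf.score(X_poly, y))  # 1.0 on this toy data
```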
A model does not fail because it “missed something.” It fails because that information was never represented — or was represented in a misleading way.
Feature design is ontological:
- it defines what exists in the model’s universe
Machine learning models do not learn about the world. They learn about points in a feature space you designed.
If the geometry is wrong:
- optimization will still succeed
- metrics may look good
- deployment will fail
You should now be able to:
- Explain why all ML inputs are vectors
- Reason about feature spaces geometrically
- Identify fake structure introduced by bad encoding
- Explain why scaling affects training
- Understand why representation often matters more than the choice of algorithm