Why Build a Neural Network from Scratch?
Neural networks power everything from image recognition to language translation, yet many developers treat them as black boxes. Building one yourself is the fastest way to understand how deep learning actually works. In this guide, we will walk through creating a complete neural network using Python and TensorFlow, training it on the classic MNIST handwritten digit dataset, and evaluating its performance. By the end, you will have a working model that recognizes handwritten digits with over 97% accuracy.
Setting Up Your Environment
Before writing any code, you need to install TensorFlow. The recommended approach is to use a virtual environment to keep your dependencies isolated from other projects.
# Create and activate a virtual environment
# python -m venv tf_env
# source tf_env/bin/activate (Linux/Mac)
# tf_env\Scripts\activate (Windows)
# Install TensorFlow
# pip install tensorflow numpy matplotlib
Let us verify that TensorFlow is installed correctly and check the version.
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")
If you see a version number printed (2.16 or later recommended), you are ready to go. GPU support is optional for this tutorial since the MNIST dataset is small enough to train on a CPU in under a minute.
Loading and Exploring the MNIST Dataset
MNIST is a dataset of 70,000 grayscale images of handwritten digits (0 through 9), each 28×28 pixels. TensorFlow includes it as a built-in dataset, so loading it takes just one line.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Inspect the shapes
print(f"Training set: {x_train.shape}, Labels: {y_train.shape}")
print(f"Test set: {x_test.shape}, Labels: {y_test.shape}")
print(f"Pixel value range: {x_train.min()} to {x_train.max()}")
print(f"Label examples: {y_train[:10]}")
You should see 60,000 training images and 10,000 test images. Each image is a 28×28 NumPy array with pixel values ranging from 0 to 255. The labels are integers from 0 to 9.
# Visualize a few samples
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f"Label: {y_train[i]}", fontsize=12)
    ax.axis('off')
plt.suptitle("Sample MNIST Images", fontsize=14)
plt.tight_layout()
plt.savefig("mnist_samples.png", dpi=100)
plt.show()
Preprocessing the Data
Neural networks train best when input values are small and on a consistent scale, so we will scale the pixel values from [0, 255] down to [0, 1] by dividing by 255. Each 28×28 image also needs to become a 784-element vector before it reaches the dense layers, but the Flatten layer in our model takes care of that automatically.
# Normalize pixel values to [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
# Verify normalization
print(f"After normalization: min={x_train.min()}, max={x_train.max()}")
# Split off a validation set from training data
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]
print(f"Training: {x_train.shape[0]} samples")
print(f"Validation: {x_val.shape[0]} samples")
print(f"Test: {x_test.shape[0]} samples")
Building the Neural Network
We will use TensorFlow’s Keras API to build a Sequential model. This is the simplest way to stack layers one after another. Our architecture will have an input Flatten layer, two hidden Dense layers with ReLU activation, a Dropout layer for regularization, and an output Dense layer with softmax activation for the 10 digit classes.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
model = Sequential([
    # Declare the input shape, then flatten 28x28 images to 784-element vectors
    # (Keras 3 / TF 2.16+ prefers an explicit Input layer over passing
    # input_shape to the first layer, which now triggers a deprecation warning)
    Input(shape=(28, 28)),
    Flatten(),
    # First hidden layer: 128 neurons, ReLU activation
    Dense(128, activation='relu'),
    # Dropout for regularization (20% of activations randomly zeroed)
    Dropout(0.2),
    # Second hidden layer: 64 neurons, ReLU activation
    Dense(64, activation='relu'),
    # Dropout again
    Dropout(0.2),
    # Output layer: 10 neurons (one per digit), softmax for probabilities
    Dense(10, activation='softmax')
])
# Print the model architecture
model.summary()
The model summary will show about 109,000 trainable parameters. The Flatten layer reshapes each image from (28, 28) to (784,). Each Dense layer applies a linear transformation followed by an activation function. Dropout randomly sets a fraction of inputs to zero during training, which prevents overfitting.
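To demystify what a Dense layer and Dropout actually compute, here is a small NumPy sketch of a single forward pass through the first hidden layer. The weights here are random placeholders for illustration, not the trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784).astype("float32")                          # one flattened image
W = rng.standard_normal((784, 128)).astype("float32") * 0.05   # weight matrix
b = np.zeros(128, dtype="float32")                             # bias vector

# Dense layer: linear transformation followed by ReLU activation
z = x @ W + b
a = np.maximum(z, 0.0)

# Parameter count for this layer: one weight per input-output pair, plus biases
print(W.size + b.size)  # 784*128 + 128 = 100480

# Inverted dropout during training: zero ~20% of activations, rescale the rest
mask = (rng.random(a.shape) >= 0.2).astype("float32")
a_dropped = a * mask / 0.8  # rescaling keeps the expected activation unchanged
```

The 100,480 parameters of this one layer account for most of the model's total; the remaining layers add the rest of the roughly 109,000.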
Compiling and Training the Model
Before training, we need to compile the model by specifying the optimizer, loss function, and metrics. For multiclass classification, sparse categorical crossentropy is the standard loss function when labels are integers.
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the model
history = model.fit(
    x_train, y_train,
    epochs=15,
    batch_size=32,
    validation_data=(x_val, y_val),
    verbose=1
)
Training should take about 30 to 60 seconds on a modern CPU. You will see the loss decreasing and accuracy increasing with each epoch. The validation metrics help you monitor whether the model is overfitting.
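The sparse categorical crossentropy loss you see decreasing has a simple definition: the negative log of the probability the model assigns to the true class. A hand-computed sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical softmax output for one sample (10 classes) and its true label
probs = np.array([0.02, 0.01, 0.85, 0.02, 0.01, 0.02, 0.02, 0.02, 0.02, 0.01])
label = 2

# Sparse categorical crossentropy: -log(probability of the correct class)
loss = -np.log(probs[label])
print(f"{loss:.4f}")  # about 0.1625; a perfect prediction would give 0.0
```

The "sparse" variant accepts integer labels directly; plain categorical crossentropy would require one-hot encoded labels instead.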
# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.plot(history.history['loss'], label='Training Loss')
ax1.plot(history.history['val_loss'], label='Validation Loss')
ax1.set_title('Model Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend()
ax2.plot(history.history['accuracy'], label='Training Accuracy')
ax2.plot(history.history['val_accuracy'], label='Validation Accuracy')
ax2.set_title('Model Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.legend()
plt.tight_layout()
plt.savefig("training_history.png", dpi=100)
plt.show()
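One practical way to read these curves: validation loss bottoming out while training loss keeps falling is the classic overfitting signature. A sketch with hypothetical per-epoch numbers (not from a real run):

```python
import numpy as np

# Hypothetical loss curves, for illustration only
train_loss = np.array([0.60, 0.35, 0.25, 0.20, 0.16, 0.14, 0.12, 0.11])
val_loss = np.array([0.45, 0.30, 0.24, 0.21, 0.20, 0.21, 0.23, 0.25])

best_epoch = int(np.argmin(val_loss))
print(best_epoch)  # 4: after this point validation loss rises while training loss keeps falling
```

Keras can halt training automatically at that point with the EarlyStopping callback (monitoring 'val_loss'), which is worth trying once you move beyond this tutorial.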
Evaluating and Using the Model
Now let us evaluate the trained model on the test set, which it has never seen before. This gives us an unbiased estimate of real-world performance.
# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
# Make predictions on individual images
predictions = model.predict(x_test[:5])
for i in range(5):
    predicted_label = np.argmax(predictions[i])
    actual_label = y_test[i]
    confidence = predictions[i][predicted_label] * 100
    print(f"Image {i}: Predicted={predicted_label}, "
          f"Actual={actual_label}, Confidence={confidence:.1f}%")
# Save the model for later use
model.save("mnist_model.keras")
print("Model saved to mnist_model.keras")
# Load and verify
loaded_model = tf.keras.models.load_model("mnist_model.keras")
loaded_loss, loaded_acc = loaded_model.evaluate(x_test, y_test, verbose=0)
print(f"Loaded model accuracy: {loaded_acc:.4f}")
You should see a test accuracy above 97%. The model correctly identifies most handwritten digits with high confidence.
Key Takeaways
Building a neural network with TensorFlow is remarkably accessible once you understand the core workflow: load data, preprocess, define the architecture, compile, train, and evaluate. The MNIST example demonstrates all the fundamental concepts including layer types, activation functions, loss functions, and regularization with Dropout. From here, you can experiment with deeper architectures, convolutional layers for better image recognition, or apply the same workflow to entirely different datasets. The skills you learned here translate directly to more complex deep learning projects.