Beyond Backpropagation: Smarter Neural Networks for Smart Manufacturing

(Code and results of experiments related to the article submitted to ISM-2025: “International Conference on Industry of the Future and Smart Manufacturing”)

 

ABSTRACT

As neural networks (NNs) become integral to advanced applications in smart manufacturing, the demand for models that are both accurate and robust continues to grow. A persistent challenge in NN training lies in avoiding local minima, which can prevent the model from minimizing the loss function effectively, both in fitting the training data and in generalizing to unseen test data, and thus from achieving globally optimal performance. To address this, we propose an extension to traditional backpropagation that incorporates a self-adaptive mechanism encouraging exploration of underutilized regions of the optimization landscape. This method adds an auxiliary objective to the training process, complementing gradient-based exploitation with an exploration component that dynamically adjusts the network’s internal state. We provide a mathematical formulation of the algorithm and conduct comparative experiments showing that our approach achieves lower training loss and superior accuracy. We analyze its connections to existing methods such as momentum and entropy-based regularization, emphasizing its unique contributions. Finally, we discuss the implications for the industry of the future, where NNs must perform reliably under dynamic, real-world conditions. By enabling smarter, self-critical models, this approach advances the development of more reliable and adaptive NNs for smart manufacturing.

 

 

CODE AND EXPERIMENTS

 

Part I (Restricted Implementation)

(Simplified option in which the homogeneity gradient depends only on the current change, without the history term)
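
For reference, the quantities computed by the listings below can be written compactly as follows (a reconstruction from the code, with $w$ the flattened weight vector, called "status" in the code, and $a$ the running mean of past weight snapshots):

$S(w, a) = 1 - \dfrac{\sum_i |w_i - a_i|}{\varepsilon + \sum_i \left(|w_i| + |a_i|\right)}$   (Absolute Difference Similarity, calculate_similarity)

$H_t = (1 - \lambda_t)\, S(w_t, a) + \lambda_t\, H_{t-1}, \qquad H_0 = 1$   (homogeneity)

$\Delta w_t = -\eta_H \, \dfrac{\partial H_t}{\partial w}$   (delta in weights_update, with $\eta_H$ = homogeneity_learning_rate; the partial derivative is the case-wise expression implemented in calculate_partial_derivative, and the resulting step is applied through a second Adam optimizer)

In listing 1 ([A-A]), $\lambda_t = (t - 1)/T$ with $T$ = total_iterations; in listing 2 ([A-B]), $\lambda_t = (t - 1)/t$, where $t$ is the iteration counter.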

 

1. HOMOGENEITY NN TRAINING (FINAL CODE) [manual setup for the learning rate; lambda depends on the total number of iterations] –

[A-A] – MNIST DATASET

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader, random_split

from torchvision import datasets, transforms

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(28 * 28, 128)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(128, 10)

 

    def forward(self, x):

        x = x.view(-1, 28 * 28)

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity
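# The explicit conditions below enumerate sign combinations of status and average
# (including status == 0 and status == average); elements not covered by them fall
# through to the general sign-based expression at the end of the function.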

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

           

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---
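            # Flow of this block: (1) flatten all weights into 'status' and measure
            # their ADS similarity to 'average', the running mean of past snapshots;
            # (2) blend the similarity into the running 'homogeneity' value using
            # homogeneity_lambda (here (iteration_counter - 1) / total_iterations,
            # with total_iterations defined at module level further below);
            # (3) write the negated delta into param.grad and take a step with the
            # second optimizer (optimizer_homogeneity); (4) update 'average' as an
            # incremental mean over all snapshots seen so far.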

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

 #           print("Model weights before homogeneity update (first 5 values of fc1.weight):")

 #           print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)

            homogeneity_lambda = (iteration_counter - 1) / total_iterations

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

   #         print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

   #        print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

   #         print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

   #         print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

   #               f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

   #               f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# !!!!!!!!!!!! Hyperparameters !!!!!!!!!!!!!!!!!!!!!

num_epochs = 8

learning_rate = 0.1

batch_size = 420

homogeneity_learning_rate = 0.1

 

# Load and split MNIST dataset

transform = transforms.ToTensor()

full_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

 

training_samples_number = int(0.7 * len(full_dataset))

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

train_size = int(0.7 * len(full_dataset))

val_size = int(0.15 * len(full_dataset))

test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, _ = random_split(full_dataset, [train_size, val_size, test_size])

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

 

# Initialize models

model_homogeneity = Net().to(device)

model_traditional = Net().to(device)

 

average_model = Net().to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()
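# Note: 'average' is seeded with the weights of a separate, freshly initialized
# network (average_model) and is then refined as a running mean of weight
# snapshots inside train_homogeneity_driven.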

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

 

Samples of run:

-------------------------------------------------------

Sample [A-A-MNIST] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.0005

  batch_size: 140

  training_samples_number: 42000

  total_iterations: 2400

 

Training with Homogeneity-driven update:

Total training time: 82.51 seconds

 

Training with traditional backpropagation:

Total training time: 58.64 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.12%

Final Test Accuracy (Traditional): 97.05%

 

-------------------------------------------------------

Sample [A-A-MNIST] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.001

  batch_size: 280

  training_samples_number: 42000

  total_iterations: 1200

 

Training with Homogeneity-driven update:

Total training time: 67.44 seconds

 

Training with traditional backpropagation:

Total training time: 53.19 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.18%

Final Test Accuracy (Traditional): 97.18%

 

-------------------------------------------------------

Sample [A-A-MNIST] 3:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.001

  batch_size: 420

  training_samples_number: 42000

  total_iterations: 800

 

Training with Homogeneity-driven update:

Total training time: 61.46 seconds

 

Training with traditional backpropagation:

Total training time: 51.95 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.53%

Final Test Accuracy (Traditional): 97.49%

 

-------------------------------------------------------

Sample [A-A-MNIST] 4:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 1

  learning_rate: 0.01

  homogeneity_learning_rate: 0.02

  batch_size: 42000

  training_samples_number: 42000

  total_iterations: 1

 

Training with Homogeneity-driven update:

Total training time: 7.19 seconds

 

Training with traditional backpropagation:

Total training time: 6.57 seconds

 

Final Test Accuracy (Homogeneity-driven): 64.99%

Final Test Accuracy (Traditional): 58.69%

-------------------------------------------------------


 

2. HOMOGENEITY NN TRAINING (Option with manual setting of the learning rate; lambda depends on the current iteration) – [A-B] – MNIST DATASET
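
As far as this excerpt shows, the only functional difference from listing 1 ([A-A]) is the schedule for homogeneity_lambda inside the training loop. A minimal sketch of the two schedules, with names following the listings:

def lambda_schedule_A_A(iteration_counter, total_iterations):
    # [A-A] (listing 1): grows linearly toward 1 over the whole training budget
    return (iteration_counter - 1) / total_iterations

def lambda_schedule_A_B(iteration_counter):
    # [A-B] (this listing): depends only on the current iteration counter and
    # approaches 1 quickly, independent of the total budget
    return (iteration_counter - 1) / iteration_counter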

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader, random_split

from torchvision import datasets, transforms

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(28 * 28, 128)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(128, 10)

 

    def forward(self, x):

        x = x.view(-1, 28 * 28)

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

           

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

 #           print("Model weights before homogeneity update (first 5 values of fc1.weight):")

 #           print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)

            homogeneity_lambda = (iteration_counter - 1) / iteration_counter

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

   #         print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

   #        print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

   #         print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

   #         print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

   #               f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

   #               f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# !!!!!!!!!!!! Hyperparameters !!!!!!!!!!!!!!!!!!!!!

num_epochs = 8

learning_rate = 0.1

batch_size = 420

homogeneity_learning_rate = 0.1

 

# Load and split MNIST dataset

transform = transforms.ToTensor()

full_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

 

training_samples_number = int(0.7 * len(full_dataset))

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

train_size = int(0.7 * len(full_dataset))

val_size = int(0.15 * len(full_dataset))

test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, _ = random_split(full_dataset, [train_size, val_size, test_size])

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

 

# Initialize models

model_homogeneity = Net().to(device)

model_traditional = Net().to(device)

 

average_model = Net().to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

 

Samples of run:

-------------------------------------------------------

Sample [A-B-MNIST] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.1

  homogeneity_learning_rate: 0.1

  batch_size: 420

  training_samples_number: 42000

  total_iterations: 800

 

Training with Homogeneity-driven update:

Total training time: 59.27 seconds

 

Training with traditional backpropagation:

Total training time: 51.43 seconds

 

Final Test Accuracy (Homogeneity-driven): 92.24%

Final Test Accuracy (Traditional): 90.14%

 

-------------------------------------------------------

Sample [A-B-MNIST] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:
  num_epochs: 18
  learning_rate: 0.001
  homogeneity_learning_rate: 0.001
  batch_size: 4200
  training_samples_number: 42000
  total_iterations: 180
 
Training with Homogeneity-driven update:
Total training time: 119.41 seconds
 
Training with traditional backpropagation:
Total training time: 115.07 seconds
 
Final Test Accuracy (Homogeneity-driven): 93.64%
Final Test Accuracy (Traditional): 93.27%

 

-------------------------------------------------------

Sample [A-B-MNIST] 3:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.01

  homogeneity_learning_rate: 0.008

  batch_size: 21000

  training_samples_number: 42000

  total_iterations: 32

 

Training with Homogeneity-driven update:

Total training time: 104.28 seconds

 

Training with traditional backpropagation:

Total training time: 103.38 seconds

 

Final Test Accuracy (Homogeneity-driven): 94.03%

Final Test Accuracy (Traditional): 93.01%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 4:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 210

  training_samples_number: 42000

  total_iterations: 3200

 

Training with Homogeneity-driven update:

Total training time: 141.07 seconds

 

Training with traditional backpropagation:

Total training time: 108.91 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.85%

Final Test Accuracy (Traditional): 96.63%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 5:

-------------------------------------------------------

Hyperparameters and Calculated Values:
  num_epochs: 16
  learning_rate: 0.01
  homogeneity_learning_rate: 0.01
  batch_size: 840
  training_samples_number: 42000
  total_iterations: 800
 
Training with Homogeneity-driven update:
Total training time: 109.00 seconds
 
Training with traditional backpropagation:
Total training time: 100.24 seconds
 
Final Test Accuracy (Homogeneity-driven): 97.72%
Final Test Accuracy (Traditional): 97.31%
 

 
 

-------------------------------------------------------

Sample [A-B-MNIST] 6:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 18

  learning_rate: 0.001

  homogeneity_learning_rate: 0.0008

  batch_size: 2100

  training_samples_number: 42000

  total_iterations: 360

 

Training with Homogeneity-driven update:

Total training time: 119.29 seconds

 

Training with traditional backpropagation:

Epoch processing time: 5.94 seconds

Total training time: 115.52 seconds

 

Final Test Accuracy (Homogeneity-driven): 95.20%

Final Test Accuracy (Traditional): 94.99%
 
 

-------------------------------------------------------

Sample [A-B-MNIST] 7:

-------------------------------------------------------

Hyperparameters and Calculated Values:
  num_epochs: 16
  learning_rate: 0.01
  homogeneity_learning_rate: 0.005
  batch_size: 840
  training_samples_number: 42000
  total_iterations: 800
 
Training with Homogeneity-driven update:
Total training time: 109.48 seconds
 
Training with traditional backpropagation:
Total training time: 100.49 seconds
 
Final Test Accuracy (Homogeneity-driven): 97.69%
Final Test Accuracy (Traditional): 97.51%

 

-------------------------------------------------------

Sample [A-B-MNIST] 8:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 40

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 42000

  training_samples_number: 42000

  total_iterations: 40

 

Training with Homogeneity-driven update:

Total training time: 259.55 seconds

 

Training with traditional backpropagation:

Total training time: 257.16 seconds

 

Final Test Accuracy (Homogeneity-driven): 94.60%

Final Test Accuracy (Traditional): 94.24%

 

-------------------------------------------------------

Sample [A-B-MNIST] 9:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 40

  learning_rate: 0.01

  homogeneity_learning_rate: 0.001

  batch_size: 42000

  training_samples_number: 42000

  total_iterations: 40

 

Training with Homogeneity-driven update:

Total training time: 258.01 seconds

 

Training with traditional backpropagation:

Total training time: 258.25 seconds

 

Final Test Accuracy (Homogeneity-driven): 94.61%

Final Test Accuracy (Traditional): 94.25%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 10:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 40

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 4200

  training_samples_number: 42000

  total_iterations: 400

 

Training with Homogeneity-driven update:

Total training time: 260.06 seconds

 

Training with traditional backpropagation:

Total training time: 253.40 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.13%

Final Test Accuracy (Traditional): 97.02%

 

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 11:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 18

  learning_rate: 0.001

  homogeneity_learning_rate: 0.0008

  batch_size: 1050

  training_samples_number: 42000

  total_iterations: 720

 

Final Test Accuracy (Homogeneity-driven): 96.40%

Final Test Accuracy (Traditional): 96.24%

 

-------------------------------------------------------

Sample [A-B-MNIST] 12:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  learning_rate: 0.001

  homogeneity_learning_rate: 0.001

  batch_size: 672

  training_samples_number: 42000

  total_iterations: 630

 

Training with Homogeneity-driven update:

Total training time: 70.79 seconds

 

Training with traditional backpropagation:

Total training time: 63.73 seconds

 

Final Test Accuracy (Homogeneity-driven): 95.69%

Final Test Accuracy (Traditional): 95.63%

 

-------------------------------------------------------

Sample [A-B-MNIST] 13:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 40

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 420

  training_samples_number: 42000

  total_iterations: 4000

 

Training with Homogeneity-driven update:

Total training time: 304.14 seconds

 

Training with traditional backpropagation:

Total training time: 263.50 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.42%

Final Test Accuracy (Traditional): 97.46%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 14:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.001

  homogeneity_learning_rate: 0.0008

  batch_size: 672

  training_samples_number: 42000

  total_iterations: 1008

 

Training with Homogeneity-driven update:

Total training time: 116.50 seconds

 

Training with traditional backpropagation:

Total training time: 102.51 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.65%

Final Test Accuracy (Traditional): 96.60%

 

-------------------------------------------------------

Sample [A-B-MNIST] 15:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 20

  learning_rate: 0.01

  homogeneity_learning_rate: 0.003

  batch_size: 42

  training_samples_number: 42000

  total_iterations: 20000

 

Training with Homogeneity-driven update:

Total training time: 409.74 seconds

 

Training with traditional backpropagation:

Total training time: 213.06 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.65%

Final Test Accuracy (Traditional): 96.07%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 16:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 20

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 168

  training_samples_number: 42000

  total_iterations: 5000

 

Training with Homogeneity-driven update:

Total training time: 199.35 seconds

 

Training with traditional backpropagation:

Total training time: 144.97 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.07%

Final Test Accuracy (Traditional): 96.94%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 17:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 20

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 336

  training_samples_number: 42000

  total_iterations: 2500

 

Training with Homogeneity-driven update:

Total training time: 160.23 seconds

 

Training with traditional backpropagation:

Total training time: 131.83 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.34%

Final Test Accuracy (Traditional): 97.16%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 18:

-------------------------------------------------------

Hyperparameters and Calculated Values:
  num_epochs: 24
  learning_rate: 0.001
  homogeneity_learning_rate: 0.0005
  batch_size: 4200
  training_samples_number: 42000
  total_iterations: 240
 
Training with Homogeneity-driven update:
Total training time: 157.12 seconds
 
Training with traditional backpropagation:
Total training time: 153.61 seconds
 
Final Test Accuracy (Homogeneity-driven): 94.25%
Final Test Accuracy (Traditional): 94.03%

 

 

-------------------------------------------------------

Sample [A-B-MNIST] 19:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 12

  learning_rate: 0.1

  homogeneity_learning_rate: 0.001

  batch_size: 3360

  training_samples_number: 42000

  total_iterations: 156

 

Training with Homogeneity-driven update:

 

Epoch [1/12], Train Loss: 8.7319, Train Accuracy: 30.93%, Val Loss: 1.4389, Val Accuracy: 48.40%

Epoch processing time: 7.59 seconds

Epoch [2/12], Train Loss: 1.1013, Train Accuracy: 64.49%, Val Loss: 0.8079, Val Accuracy: 73.32%

Epoch processing time: 5.98 seconds

Epoch [3/12], Train Loss: 0.6478, Train Accuracy: 80.60%, Val Loss: 0.5574, Val Accuracy: 85.28%

Epoch processing time: 6.99 seconds

Epoch [4/12], Train Loss: 0.4568, Train Accuracy: 87.32%, Val Loss: 0.4280, Val Accuracy: 88.48%

Epoch processing time: 5.81 seconds

Epoch [5/12], Train Loss: 0.3534, Train Accuracy: 89.93%, Val Loss: 0.3521, Val Accuracy: 90.40%

Epoch processing time: 7.00 seconds

Epoch [6/12], Train Loss: 0.2978, Train Accuracy: 91.44%, Val Loss: 0.3133, Val Accuracy: 91.67%

Epoch processing time: 6.05 seconds

Epoch [7/12], Train Loss: 0.2655, Train Accuracy: 92.46%, Val Loss: 0.3019, Val Accuracy: 91.56%

Epoch processing time: 6.93 seconds

Epoch [8/12], Train Loss: 0.2472, Train Accuracy: 92.78%, Val Loss: 0.2807, Val Accuracy: 92.37%

Epoch processing time: 6.01 seconds

Epoch [9/12], Train Loss: 0.2282, Train Accuracy: 93.35%, Val Loss: 0.2738, Val Accuracy: 92.56%

Epoch processing time: 6.92 seconds

Epoch [10/12], Train Loss: 0.2132, Train Accuracy: 93.80%, Val Loss: 0.2575, Val Accuracy: 92.93%

Epoch processing time: 6.06 seconds

Epoch [11/12], Train Loss: 0.1981, Train Accuracy: 94.22%, Val Loss: 0.2553, Val Accuracy: 92.82%

Epoch processing time: 6.95 seconds

Epoch [12/12], Train Loss: 0.1878, Train Accuracy: 94.39%, Val Loss: 0.2489, Val Accuracy: 93.12%

 

Epoch processing time: 5.82 seconds

Total training time: 78.10 seconds

 

Training with traditional backpropagation:

Epoch [1/12], Train Loss: 6.9806, Train Accuracy: 37.27%, Val Loss: 1.5671, Val Accuracy: 49.42%

Epoch processing time: 6.84 seconds

Epoch [2/12], Train Loss: 1.1666, Train Accuracy: 60.90%, Val Loss: 0.9454, Val Accuracy: 72.51%

Epoch processing time: 5.91 seconds

Epoch [3/12], Train Loss: 0.7834, Train Accuracy: 75.20%, Val Loss: 0.6856, Val Accuracy: 78.37%

Epoch processing time: 6.80 seconds

Epoch [4/12], Train Loss: 0.5979, Train Accuracy: 81.24%, Val Loss: 0.5383, Val Accuracy: 83.37%

Epoch processing time: 5.91 seconds

Epoch [5/12], Train Loss: 0.4758, Train Accuracy: 85.61%, Val Loss: 0.4466, Val Accuracy: 86.86%

Epoch processing time: 6.86 seconds

Epoch [6/12], Train Loss: 0.3828, Train Accuracy: 88.98%, Val Loss: 0.3825, Val Accuracy: 89.56%

Epoch processing time: 5.87 seconds

Epoch [7/12], Train Loss: 0.3210, Train Accuracy: 91.02%, Val Loss: 0.3443, Val Accuracy: 90.71%

Epoch processing time: 6.86 seconds

Epoch [8/12], Train Loss: 0.2844, Train Accuracy: 92.08%, Val Loss: 0.3211, Val Accuracy: 91.11%

Epoch processing time: 5.70 seconds

Epoch [9/12], Train Loss: 0.2688, Train Accuracy: 92.48%, Val Loss: 0.3099, Val Accuracy: 91.96%

Epoch processing time: 6.86 seconds

Epoch [10/12], Train Loss: 0.2522, Train Accuracy: 93.00%, Val Loss: 0.2966, Val Accuracy: 92.21%

Epoch processing time: 5.94 seconds

Epoch [11/12], Train Loss: 0.2417, Train Accuracy: 93.25%, Val Loss: 0.2952, Val Accuracy: 91.84%

Epoch processing time: 7.90 seconds

Epoch [12/12], Train Loss: 0.2306, Train Accuracy: 93.39%, Val Loss: 0.2825, Val Accuracy: 92.67%

 

Epoch processing time: 5.94 seconds

Total training time: 77.39 seconds

 

Final Test Accuracy (Homogeneity-driven): 93.44%

Final Test Accuracy (Traditional): 92.67%

 

 

 

 

 

 


 

3. OPTION WITH DYNAMIC HOMOGENEITY LEARNING RATE [lambda depends on the total number of iterations] – [B-A] – MNIST DATASET

 

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader, random_split

from torchvision import datasets, transforms

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(28 * 28, 128)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(128, 10)

 

    def forward(self, x):

        x = x.view(-1, 28 * 28)

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

           

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

 #           print("Model weights before homogeneity update (first 5 values of fc1.weight):")

 #           print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)

            homogeneity_lambda = (iteration_counter - 1) / total_iterations

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

   #         print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

   #        print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

   #         print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")
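            # Incrementally update the running mean ('average') of the flattened weight vectors
            # observed over all iterations processed so far.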

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

   #         print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

   #               f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

   #               f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# !!!!!!!!!!!! Hyperparameters !!!!!!!!!!!!!!!!!!!!!

num_epochs = 4

learning_rate = 0.01

batch_size = 10

# removed to test dynamic value   homogeneity_learning_rate = 0.01

 

# Load and split MNIST dataset

transform = transforms.ToTensor()

full_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

 

training_samples_number = int(0.7 * len(full_dataset))

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs

 

homogeneity_learning_rate = (2* learning_rate) / (learning_rate * total_iterations + 1)
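# Note: for the defaults above (42,000 training samples, batch_size = 10, num_epochs = 4,
# learning_rate = 0.01) this gives total_iterations = 4200 * 4 = 16800 and
# homogeneity_learning_rate = 0.02 / 169 ≈ 1.18e-4, matching Sample [B-A-MNIST] 1 below.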

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

train_size = int(0.7 * len(full_dataset))

val_size = int(0.15 * len(full_dataset))

test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, _ = random_split(full_dataset, [train_size, val_size, test_size])

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

 

# Initialize models

model_homogeneity = Net().to(device)

model_traditional = Net().to(device)

 

average_model = Net().to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

Sample runs:

-------------------------------------------------------

Sample [B-A-MNIST] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 4

  learning_rate: 0.01

  homogeneity_learning_rate: 0.00011834319526627219

  batch_size: 10

  training_samples_number: 42000

  total_iterations: 16800

 

Training with Homogeneity-driven update:

Total training time: 249.94 seconds

 

Training with traditional backpropagation:

Epoch processing time: 22.37 seconds

Total training time: 90.24 seconds

 

Final Test Accuracy (Homogeneity-driven): 95.12%

Final Test Accuracy (Traditional): 94.38%

 

 

-------------------------------------------------------

Sample [B-A-MNIST] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.00011834319526627219

  batch_size: 20

  training_samples_number: 42000

  total_iterations: 16800

 

Training with Homogeneity-driven update:

Total training time: 277.78 seconds

 

Training with traditional backpropagation:

Total training time: 112.23 seconds

 

Final Test Accuracy (Homogeneity-driven): 95.91%

Final Test Accuracy (Traditional): 95.78%

 

 

-------------------------------------------------------

Sample [B-A-MNIST] 3:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  learning_rate: 0.01

  homogeneity_learning_rate: 0.006666666666666667

  batch_size: 2100

  training_samples_number: 42000

  total_iterations: 200

 

Training with Homogeneity-driven update:

Total training time: 65.42 seconds

 

Training with traditional backpropagation:

Total training time: 62.62 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.92%

Final Test Accuracy (Traditional): 96.87%

 

 

 


 

4. HOMOGENEITY NN TRAINING (Option with lambda based on current iteration and dynamic learning rate computation) – [B-B] – MNIST DATASET
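
The only functional change from the [B-A] listing above is the homogeneity_lambda schedule. A minimal standalone sketch (not part of the experiment code) contrasting the two schedules, assuming total_iterations = 16800 as in the MNIST sample runs:

# '-A' variants: lambda grows linearly with overall training progress;
# '-B' variants: lambda depends only on the current iteration index, so the homogeneity value
# effectively becomes an incremental running mean of the per-iteration similarity values.
total_iterations = 16800
for iteration_counter in (2, 100, 8400, 16800):
    lambda_a = (iteration_counter - 1) / total_iterations
    lambda_b = (iteration_counter - 1) / iteration_counter
    print(iteration_counter, round(lambda_a, 4), round(lambda_b, 4))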

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader, random_split

from torchvision import datasets, transforms

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(28 * 28, 128)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(128, 10)

 

    def forward(self, x):

        x = x.view(-1, 28 * 28)

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

           

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

 #           print("Model weights before homogeneity update (first 5 values of fc1.weight):")

 #           print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)
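            # [B-B] change: homogeneity_lambda is tied to the current iteration index only
            # (the [B-A] listing above uses (iteration_counter - 1) / total_iterations instead)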

            homogeneity_lambda = (iteration_counter - 1) / iteration_counter

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

   #         print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

   #        print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

   #         print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

   #         print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

   #               f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

   #               f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# !!!!!!!!!!!! Hyperparameters !!!!!!!!!!!!!!!!!!!!!

num_epochs = 4

learning_rate = 0.01

batch_size = 10

# removed to test dynamic value   homogeneity_learning_rate = 0.01

 

# Load and split MNIST dataset

transform = transforms.ToTensor()

full_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

 

training_samples_number = int(0.7 * len(full_dataset))

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs

 

homogeneity_learning_rate = (2* learning_rate) / (learning_rate * total_iterations + 1)

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

train_size = int(0.7 * len(full_dataset))

val_size = int(0.15 * len(full_dataset))

test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, _ = random_split(full_dataset, [train_size, val_size, test_size])

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

 

# Initialize models

model_homogeneity = Net().to(device)

model_traditional = Net().to(device)

 

average_model = Net().to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

 

# END OF THE CODE

=========================================

 

 

Sample runs:

-------------------------------------------------------

Sample [B-B-MNIST] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.0005780346820809249

  batch_size: 100

  training_samples_number: 42000

  total_iterations: 3360

 

Training with Homogeneity-driven update:

Total training time: 133.06 seconds

 

Training with traditional backpropagation:

Total training time: 58.51 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.67%

Final Test Accuracy (Traditional): 96.27%

 

-------------------------------------------------------

Sample [B-B-MNIST] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.001

  homogeneity_learning_rate: 0.00022222222222222223

  batch_size: 42

  training_samples_number: 42000

  total_iterations: 8000

 

Training with Homogeneity-driven update:

Total training time: 201.09 seconds

 

Training with traditional backpropagation:

Total training time: 71.01 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.42%

Final Test Accuracy (Traditional): 97.33%

 

 

-------------------------------------------------------

Sample [B-B-MNIST] 3:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.1

  homogeneity_learning_rate: 0.00012492192379762648

  batch_size: 21

  training_samples_number: 42000

  total_iterations: 16000

 

Training with Homogeneity-driven update:

Total training time: 316.75 seconds

 

Training with traditional backpropagation:

Total training time: 127.37 seconds

 

Final Test Accuracy (Homogeneity-driven): 49.31%

Final Test Accuracy (Traditional): 45.58%

 

 


 

SAME (basic) CODE IDEA FOR IRIS DATASET (lambda depends on the total number of iterations and dynamic learning rate) [B-A]
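
For reference, a short standalone sketch (not part of the listing) of how the dynamic homogeneity learning rate works out for the Iris defaults used below (90 training samples after the 60/20/20 split, batch_size = 32, num_epochs = 10, learning_rate = 0.01):

import math
total_iterations = math.ceil(90 / 32) * 10                              # = 3 * 10 = 30
homogeneity_learning_rate = (2 * 0.01) / (0.01 * total_iterations + 1)  # = 0.02 / 1.3 ≈ 0.01538
print(total_iterations, homogeneity_learning_rate)

This matches the values printed in Sample [B-A-IRIS] 1 below.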

 

=========================================

# BEGINNING OF THE CODE

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import Dataset, DataLoader, random_split

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model (adjusted for Iris dataset)

class Net(nn.Module):

    def __init__(self, input_size, hidden_size, output_size):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(input_size, hidden_size)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(hidden_size, output_size)

 

    def forward(self, x):

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average, total_iterations, batch_size):

    # Added batch_size as an argument

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

            # print("Model weights before homogeneity update (first 5 values of fc1.weight):")

            # print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)

            homogeneity_lambda = (iteration_counter - 1) / total_iterations

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

            # print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

            # print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

            # print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

            #       f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        #print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate, batch_size):

    # Added batch_size as an argument

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        #print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# -------------------- Hyperparameters --------------------

num_epochs = 10

learning_rate = 0.01

# removed to be replaced by dynamic option   homogeneity_learning_rate = 0.01

input_size = 4  # Number of features in Iris dataset

hidden_size = 10

output_size = 3  # Number of classes in Iris dataset

batch_size = 32 # Added batch size definition

# ---------------------------------------------------------

 

# Load the Iris dataset

iris = load_iris()

X = iris.data

y = iris.target

 

# Split data into training, validation, and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)  # 0.25 x 0.8 = 0.2

 

# Scale the data

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_val = scaler.transform(X_val)

X_test = scaler.transform(X_test)

 

# Create custom dataset class for Iris data

class IrisDataset(Dataset):

    def __init__(self, X, y):

        self.X = torch.tensor(X, dtype=torch.float32)

        self.y = torch.tensor(y, dtype=torch.long)

 

    def __len__(self):

        return len(self.X)

 

    def __getitem__(self, idx):

        return self.X[idx], self.y[idx]

 

# Create data loaders

train_dataset = IrisDataset(X_train, y_train)

val_dataset = IrisDataset(X_val, y_val)

test_dataset = IrisDataset(X_test, y_test)

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True) # Updated to use batch_size variable

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False) # Updated to use batch_size variable

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False) # Updated to use batch_size variable

 

training_samples_number = len(train_dataset)

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs # Updated to use batch_size variable

 

homogeneity_learning_rate = (2* learning_rate) / (learning_rate * total_iterations + 1)

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

# Initialize models

model_homogeneity = Net(input_size, hidden_size, output_size).to(device)

model_traditional = Net(input_size, hidden_size, output_size).to(device)

 

average_model = Net(input_size, hidden_size, output_size).to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average, total_iterations, batch_size)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate, batch_size)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

 

Sample runs:

-------------------------------------------------------

Sample [B-A-IRIS] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  learning_rate: 0.01

  homogeneity_learning_rate: 0.015384615384615384

  batch_size: 32

  training_samples_number: 90

  total_iterations: 30

 

Training with Homogeneity-driven update:

Total training time: 0.16 seconds

 

Training with traditional backpropagation:

 

Final Test Accuracy (Homogeneity-driven): 86.67%

Final Test Accuracy (Traditional): 80.00%

 

 

-------------------------------------------------------

Sample [B-A-IRIS] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 2

  learning_rate: 0.01

  homogeneity_learning_rate: 0.0196078431372549

  batch_size: 90

  training_samples_number: 90

  total_iterations: 2

  hidden_size: 16

 

Training with Homogeneity-driven update:

Total training time: 0.02 seconds

 

Training with traditional backpropagation:

Total training time: 0.01 seconds

 

Final Test Accuracy (Homogeneity-driven): 63.33%

Final Test Accuracy (Traditional): 46.67%

 

 

-------------------------------------------------------

Sample [B-A-IRIS] 3:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 2

  learning_rate: 0.01

  homogeneity_learning_rate: 0.0196078431372549

  batch_size: 90

  training_samples_number: 90

  total_iterations: 2

  hidden_size: 4

 

Training with Homogeneity-driven update:

Total training time: 0.02 seconds

 

Training with traditional backpropagation:

Total training time: 0.01 seconds

 

Final Test Accuracy (Homogeneity-driven): 40.00%

Final Test Accuracy (Traditional): 16.67%

 

-------------------------------------------------------

Sample [B-A-IRIS] 4:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 2

  hidden_size: 2

  learning_rate: 0.01

  homogeneity_learning_rate: 0.0196078431372549

  batch_size: 90

  training_samples_number: 90

  total_iterations: 2

 

Training with Homogeneity-driven update:

Total training time: 0.02 seconds

 

Training with traditional backpropagation:

Total training time: 0.01 seconds

 

Final Test Accuracy (Homogeneity-driven): 43.33%

Final Test Accuracy (Traditional): 36.67%

 

 

-------------------------------------------------------

Sample [B-A-IRIS] 5:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 2

  hidden_size: 24

  learning_rate: 0.01

  homogeneity_learning_rate: 0.0196078431372549

  batch_size: 90

  training_samples_number: 90

  total_iterations: 2

 

Training with Homogeneity-driven update:

Total training time: 0.06 seconds

 

Training with traditional backpropagation:

Total training time: 0.03 seconds

 

Final Test Accuracy (Homogeneity-driven): 70.00%

Final Test Accuracy (Traditional): 43.33%

 

 

-------------------------------------------------------

Sample [B-A-IRIS] 6:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  hidden_size: 3

  learning_rate: 0.01

  homogeneity_learning_rate: 0.015384615384615384

  batch_size: 30

  training_samples_number: 90

  total_iterations: 30

 

Training with Homogeneity-driven update:

Total training time: 0.20 seconds

 

Training with traditional backpropagation:

Total training time: 0.14 seconds

 

Final Test Accuracy (Homogeneity-driven): 90.00%

Final Test Accuracy (Traditional): 70.00%

 

 

 

 


 

SAME (basic) CODE IDEA FOR IRIS DATASET (lambda depends on current iteration, manual setting of learning rate) [A-B]
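
In this variant the homogeneity learning rate is set manually as a hyperparameter instead of being derived from total_iterations, while homogeneity_lambda again follows the current-iteration schedule. A minimal sketch of the manual setting (the concrete value is illustrative, taken from the manual value noted as removed in the dynamic listings above):

homogeneity_learning_rate = 0.01  # manual setting; not computed from total_iterations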

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import Dataset, DataLoader, random_split

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model (adjusted for Iris dataset)

class Net(nn.Module):

    def __init__(self, input_size, hidden_size, output_size):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(input_size, hidden_size)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(hidden_size, output_size)

 

    def forward(self, x):

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average, total_iterations, batch_size):

    # Added batch_size as an argument

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

            # print("Model weights before homogeneity update (first 5 values of fc1.weight):")

            # print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)
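
            # With lambda = (t - 1) / t, the update below keeps homogeneity equal to the
            # running mean of the similarity values (seeded with the initial value 1.0)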

            homogeneity_lambda = (iteration_counter - 1) / iteration_counter

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)
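
            # Write the homogeneity-driven delta into .grad so that the dedicated
            # Adam optimizer (optimizer_homogeneity) applies it as a parameter update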

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

            # print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

            # print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

            # print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

            #       f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        #print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate, batch_size):

    # Added batch_size as an argument

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        #print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# -------------------- Hyperparameters --------------------

num_epochs = 10

learning_rate = 0.01

homogeneity_learning_rate = 0.01

input_size = 4  # Number of features in Iris dataset

hidden_size = 10

output_size = 3  # Number of classes in Iris dataset

batch_size = 32 # Added batch size definition

 

# ---------------------------------------------------------

 

# Load the Iris dataset

iris = load_iris()

X = iris.data

y = iris.target

 

# Split data into training, validation, and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)  # 0.25 x 0.8 = 0.2

 

# Scale the data

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_val = scaler.transform(X_val)

X_test = scaler.transform(X_test)

 

# Create custom dataset class for Iris data

class IrisDataset(Dataset):

    def __init__(self, X, y):

        self.X = torch.tensor(X, dtype=torch.float32)

        self.y = torch.tensor(y, dtype=torch.long)

 

    def __len__(self):

        return len(self.X)

 

    def __getitem__(self, idx):

        return self.X[idx], self.y[idx]

 

# Create data loaders

train_dataset = IrisDataset(X_train, y_train)

val_dataset = IrisDataset(X_val, y_val)

test_dataset = IrisDataset(X_test, y_test)

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True) # Updated to use batch_size variable

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False) # Updated to use batch_size variable

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False) # Updated to use batch_size variable

 

training_samples_number = len(train_dataset)

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs # Updated to use batch_size variable

 

# homogeneity_learning_rate = (2* learning_rate) / (learning_rate * total_iterations + 1)

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

# Initialize models

model_homogeneity = Net(input_size, hidden_size, output_size).to(device)

model_traditional = Net(input_size, hidden_size, output_size).to(device)

 

average_model = Net(input_size, hidden_size, output_size).to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average, total_iterations, batch_size)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate, batch_size)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

 

Sample runs:

-------------------------------------------------------

Sample [A-B-IRIS] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 32

  training_samples_number: 90

  total_iterations: 30

 

Training with Homogeneity-driven update:

Total training time: 0.14 seconds

 

Training with traditional backpropagation:

Total training time: 0.09 seconds

 

Final Test Accuracy (Homogeneity-driven): 90.00%

Final Test Accuracy (Traditional): 83.33%

 

 

-------------------------------------------------------

Sample [A-B-IRIS] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 64

  training_samples_number: 90

  total_iterations: 20

 

Training with Homogeneity-driven update:

Total training time: 0.12 seconds

 

Training with traditional backpropagation:

Total training time: 0.06 seconds

 

Final Test Accuracy (Homogeneity-driven): 90.00%

Final Test Accuracy (Traditional): 80.00%


EXPERIMENTS WITH HYBRID DYNAMIC LAMBDAS – MNIST DATASET
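
Before the full script, a minimal sketch of the hybrid schedule (illustrative only; the helper name hybrid_lambda is not part of the script below). The dynamic lambda (t - 1) / t is scaled by a factor alpha that decays linearly from 1 down to beta over the run, so lambda approaches beta instead of saturating at 1; beta = 1 recovers the purely dynamic schedule.

def hybrid_lambda(t, total_iterations, beta=0.999):
    # alpha goes from 1 (t = 1) down to beta (t = total_iterations)
    alpha = (total_iterations - t * (1 - beta) - beta) / (total_iterations - 1)
    # scale the dynamic schedule (t - 1) / t by alpha
    return alpha * (t - 1) / t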

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader, random_split

from torchvision import datasets, transforms

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(28 * 28, 128)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(128, 10)

 

    def forward(self, x):

        x = x.view(-1, 28 * 28)

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to calculate partial derivative of Homogeneity

def calculate_partial_derivative(status, average, homogeneity_lambda, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average > 0)

    condition4 = torch.logical_and(status == 0, average < 0)

    condition5 = status == average

 

    partial_derivative = torch.zeros_like(status)

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) * (1 / D**2) * (D - N)

    partial_derivative[condition2] = (1 - homogeneity_lambda) * (1 / D**2) * (D + N)

    partial_derivative[condition3] = -(1 - homogeneity_lambda) / D

    partial_derivative[condition4] = (1 - homogeneity_lambda) / D

    partial_derivative[condition5] = 0

 

    remaining_indices = torch.logical_not(torch.logical_or(torch.logical_or(torch.logical_or(condition1, condition2),

                                                                           torch.logical_or(condition3, condition4)),

                                                           condition5))

    partial_derivative[remaining_indices] = (1 - homogeneity_lambda) * (1 / D**2) * (

        D * torch.sign(status[remaining_indices] - average[remaining_indices]) -

        N * torch.sign(status[remaining_indices])

    )

 

    return partial_derivative

 

# Function to perform Homogeneity-driven weight update

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, epsilon=1e-8):

    partial_derivative = calculate_partial_derivative(status, average, homogeneity_lambda, epsilon)

    delta = -homogeneity_learning_rate * partial_derivative

    return delta

 

# Function to train the model with Homogeneity-driven update

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average):
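    # Note: this variant reads total_iterations from module scope (it is computed
    # below, before training starts)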

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    homogeneity = 1.0

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # --- Backpropagation Update ---

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

           

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            weights_before_update = status.clone()

 

 #           print("Model weights before homogeneity update (first 5 values of fc1.weight):")

 #           print(model.fc1.weight.data[:5])

 

            similarity = calculate_similarity(status, average)

           

 

           

            beta = 0.999
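
            # Hybrid schedule: alpha decays approximately linearly from 1 at the first
            # iteration towards beta at the last one, so lambda approaches beta rather
            # than saturating at 1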

           

            alpha = (total_iterations - iteration_counter * (1 - beta) - beta) / (total_iterations - 1)

            homogeneity_lambda = alpha * ((iteration_counter - 1) / iteration_counter)

           

           

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

           

           

           

            homogeneity_values.append(homogeneity.item())

 

            delta = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average)

 

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.grad = -update_value

                current_index += param_size

 

            optimizer_homogeneity.step()

            optimizer_homogeneity.zero_grad()

 

   #         print("\nModel weights after homogeneity update (first 5 values of fc1.weight):")

   #        print(model.fc1.weight.data[:5])

 

            weights_after_update = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

            distance = torch.norm(weights_before_update - weights_after_update)

   #         print(f"Distance between weights before and after homogeneity update: {distance.item()}\n")

 

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (

                        epoch * len(train_loader) + batch_idx + 1)

 

            optimizer.zero_grad()

 

   #         print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

   #               f"Backpropagation Update: {total_bp_update:.4f}, Homogeneity Update: {torch.sum(torch.abs(delta)).item():.4f}, "

   #               f"homogeneity_lambda: {homogeneity_lambda:.4f}, similarity: {similarity:.4f}, homogeneity: {homogeneity:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1  # Initialize iteration_counter here

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # print(f"Epoch [{epoch + 1}/{num_epochs}], Iteration [{batch_idx + 1}/{len(train_loader)}], "

            #       f"Backpropagation Update: {total_bp_update:.4f}")

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for images, labels in data_loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted = torch.max(outputs.data, 1)

 

            total += labels.size(0)

            correct += (predicted == labels).sum().item()

 

    return 100 * correct / total

 

# -------------------- Hyperparameters --------------------

num_epochs = 10

learning_rate = 0.1

batch_size = 420

homogeneity_learning_rate = 0.1

 

# Load and split MNIST dataset

transform = transforms.ToTensor()

full_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

 

training_samples_number = int(0.7 * len(full_dataset))

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print("\n")

 

train_size = int(0.7 * len(full_dataset))

val_size = int(0.15 * len(full_dataset))

test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, _ = random_split(full_dataset, [train_size, val_size, test_size])

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

 

# Initialize models

model_homogeneity = Net().to(device)

model_traditional = Net().to(device)

 

average_model = Net().to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

 

Sample runs:

-------------------------------------------------------

Samples [HYBRID-MNIST] 1-10:

-------------------------------------------------------

Final Test Accuracy (Traditional): 92.00%

beta = 0.500  Final Test Accuracy (Homogeneity-driven): 61.28%

beta = 0.800  Final Test Accuracy (Homogeneity-driven): 78.19%

beta = 0.900  Final Test Accuracy (Homogeneity-driven): 86.76%

beta = 0.950  Final Test Accuracy (Homogeneity-driven): 90.48%

beta = 0.980  Final Test Accuracy (Homogeneity-driven): 91.71%

beta = 0.990  Final Test Accuracy (Homogeneity-driven): 91.78%

beta = 0.995  Final Test Accuracy (Homogeneity-driven): 92.06%

beta = 0.999  Final Test Accuracy (Homogeneity-driven): 92.30% 

beta = 1.000   Final Test Accuracy (Homogeneity-driven): 91.66% 

beta = 1.010   Final Test Accuracy (Homogeneity-driven): 91.02%

 

 


 

Part II (Full Implementation)

(Complete option: the homogeneity gradient depends on both the current change and the history part; all lambda options are included in a single script)
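
A minimal sketch of the history-aware gradient used here (illustrative only; the helper name blended_gradient and its arguments are not part of the script below): each per-parameter derivative blends the current similarity term with the gradient kept from the previous iteration, and the blended value is stored as the gradient history for the next iteration.

def blended_gradient(current_term, previous_gradient, homogeneity_lambda):
    # (1 - lambda) weights the current similarity-based term, lambda weights the history
    return (1 - homogeneity_lambda) * current_term + homogeneity_lambda * previous_gradient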

 

 

=========================================

# BEGINNING OF THE CODE

 

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import Dataset, DataLoader, random_split

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

import numpy as np

import matplotlib.pyplot as plt

import time

import math

 

# Import transforms from torchvision

from torchvision import transforms, datasets

 

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

 

# Define the neural network model

class Net(nn.Module):

    def __init__(self, input_size, hidden_size, output_size):

        super(Net, self).__init__()

        self.fc1 = nn.Linear(input_size, hidden_size)

        self.relu = nn.ReLU()

        self.fc2 = nn.Linear(hidden_size, output_size)

 

    def forward(self, x):

        x = self.fc1(x)

        x = self.relu(x)

        x = self.fc2(x)

        return x

 

# Function to calculate Absolute Difference Similarity (ADS)

def calculate_similarity(status, average, epsilon=1e-8):

    num = torch.sum(torch.abs(status - average))

    den = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

    similarity = 1 - (num / den)

    return similarity

 

# Function to perform Homogeneity-driven weight update (Modified)
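
# In this full implementation the per-parameter derivative mixes the current
# similarity term (weight 1 - homogeneity_lambda) with the gradient carried over
# from the previous iteration (weight homogeneity_lambda); the blended result is
# returned as the new 'gradients' history for the next call.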

def weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, gradients, epsilon=1e-8):

    N = torch.sum(torch.abs(status - average))

    D = epsilon + torch.sum(torch.abs(status) + torch.abs(average))

 

    # Calculate partial derivative based on conditions from Table 1

    partial_derivative = torch.zeros_like(status)

 

    condition1 = torch.logical_or(torch.logical_and(status > average, average > 0),

                                  torch.logical_and(status < average, average < 0))

    condition2 = torch.logical_or(torch.logical_and(status > 0, average < 0),

                                  torch.logical_and(status < 0, average > 0))

    condition3 = torch.logical_and(status == 0, average < 0)

    condition4 = torch.logical_and(status == 0, average > 0)

    condition5 = status == average

 

    partial_derivative[condition1] = (1 - homogeneity_lambda) / (D ** 2) * (D - N) + homogeneity_lambda * gradients[condition1]

    partial_derivative[condition2] = (1 - homogeneity_lambda) / (D ** 2) * (D + N) + homogeneity_lambda * gradients[condition2]

    partial_derivative[condition3] = (1 - homogeneity_lambda) / D + homogeneity_lambda * gradients[condition3]

    partial_derivative[condition4] = -(1 - homogeneity_lambda) / D + homogeneity_lambda * gradients[condition4]

    partial_derivative[condition5] = homogeneity_lambda * gradients[condition5]

 

    # Update gradients for the next iteration

    gradients = partial_derivative.clone()

 

    delta = -homogeneity_learning_rate * partial_derivative

    return delta, gradients

 

# Function to train the model with Homogeneity-driven update (Modified)

def train_homogeneity_driven(model, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average, total_iterations, batch_size, lambda_type, lambda_value):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    optimizer_homogeneity = optim.Adam(model.parameters(), lr=homogeneity_learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

    homogeneity_values = []

 

    # Initialization (t=0)

    homogeneity = 1.0

    gradients = torch.cat([torch.zeros_like(p.data.flatten()) for p in model.parameters()])

 

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

            data, target = data.to(device), target.to(device)

 

            # 1. Compute Cross-Entropy Loss

            optimizer.zero_grad()

           

            # Flatten MNIST data if needed

            data = data.view(data.size(0), -1)  # Always flatten for MNIST

 

            output = model(data)

            loss = criterion(output, target)

 

            # 2. Update Parameters via Backpropagation

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for monitoring

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            # --- Homogeneity-driven Update ---

            status = torch.cat([p.data.flatten() for p in model.parameters()]).detach()

 

            # Calculate lambda based on lambda_type
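
            #   'fixed'   - lambda is held constant at lambda_value
            #   'linear'  - lambda grows linearly with the iteration count, reaching ~1 at total_iterations
            #   'dynamic' - lambda = (t - 1) / t, rising quickly towards 1 as training proceeds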

            if lambda_type == 'fixed':

                homogeneity_lambda = lambda_value

            elif lambda_type == 'linear':

                homogeneity_lambda = (iteration_counter - 1) / total_iterations

            elif lambda_type == 'dynamic':

                homogeneity_lambda = (iteration_counter - 1) / iteration_counter if iteration_counter > 1 else 0

            else:

                raise ValueError("Invalid lambda_type. Choose from 'fixed', 'linear', or 'dynamic'.")

 

            # 3. Compute Updated Homogeneity (Formula 1 and 3)

            similarity = calculate_similarity(status, average)

            homogeneity = (1 - homogeneity_lambda) * similarity + homogeneity_lambda * homogeneity

            homogeneity_values.append(homogeneity.item())

 

            # 4. Update NN Parameters Using Homogeneity Gradients (Formula 4 and 5)

            delta, gradients = weights_update(status, homogeneity, homogeneity_learning_rate, homogeneity_lambda, average, gradients)

            current_index = 0

            for param in model.parameters():

                param_size = param.nelement()

                update_value = delta[current_index: current_index + param_size].view_as(param.data)

                param.data.add_(-homogeneity_learning_rate * update_value)  # Equation 5

                current_index += param_size

 

            # 5. Update Average Parameter Vector (Formula 2)

            average = (status + (epoch * len(train_loader) + batch_idx) * average) / (epoch * len(train_loader) + batch_idx + 1)

 

            # Clear accumulated gradients after the homogeneity update

            optimizer.zero_grad()

 

            # --- End of Homogeneity-driven Update ---

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

           

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

 

                # Flatten MNIST data if needed

                data = data.view(data.size(0), -1) # Always flatten for MNIST

 

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies, homogeneity_values

 

# Function to train the model with traditional backpropagation

def train_traditional(model, train_loader, val_loader, num_epochs, learning_rate):

    criterion = nn.CrossEntropyLoss()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

 

    train_losses = []

    val_losses = []

    train_accuracies = []

    val_accuracies = []

 

    total_start_time = time.time()

    iteration_counter = 1

 

    for epoch in range(num_epochs):

        epoch_start_time = time.time()

        model.train()

        epoch_train_loss = 0

        epoch_train_correct = 0

        epoch_train_total = 0

 

        for batch_idx, (data, target) in enumerate(train_loader):

            iteration_counter += 1

 

            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()

           

            # Flatten MNIST data if needed

            data = data.view(data.size(0), -1)  # Always flatten for MNIST

 

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

 

            # Calculate total_bp_update for each iteration

            total_bp_update = 0

            for param in model.parameters():

                if param.grad is not None:

                    total_bp_update += torch.sum(torch.abs(param.grad)).item()

 

            epoch_train_loss += loss.item()

            _, predicted = torch.max(output.data, 1)

            epoch_train_total += target.size(0)

            epoch_train_correct += (predicted == target).sum().item()

 

        epoch_train_loss /= len(train_loader)

        epoch_train_accuracy = 100 * epoch_train_correct / epoch_train_total

        train_losses.append(epoch_train_loss)

        train_accuracies.append(epoch_train_accuracy)

 

        model.eval()

        epoch_val_loss = 0

        epoch_val_correct = 0

        epoch_val_total = 0

 

        with torch.no_grad():

            for data, target in val_loader:

                data, target = data.to(device), target.to(device)

               

                # Flatten MNIST data if needed

                data = data.view(data.size(0), -1)  # Always flatten for MNIST

               

                output = model(data)

                loss = criterion(output, target)

                epoch_val_loss += loss.item()

 

                _, predicted = torch.max(output.data, 1)

                epoch_val_total += target.size(0)

                epoch_val_correct += (predicted == target).sum().item()

 

        epoch_val_loss /= len(val_loader)

        epoch_val_accuracy = 100 * epoch_val_correct / epoch_val_total

        val_losses.append(epoch_val_loss)

        val_accuracies.append(epoch_val_accuracy)

 

        epoch_end_time = time.time()

        epoch_time = epoch_end_time - epoch_start_time

 

        print(f"Epoch [{epoch + 1}/{num_epochs}], "

              f"Train Loss: {epoch_train_loss:.4f}, "

              f"Train Accuracy: {epoch_train_accuracy:.2f}%, "

              f"Val Loss: {epoch_val_loss:.4f}, "

              f"Val Accuracy: {epoch_val_accuracy:.2f}%")

 

        print(f"Epoch processing time: {epoch_time:.2f} seconds")

 

    total_end_time = time.time()

    total_training_time = total_end_time - total_start_time

    print(f"Total training time: {total_training_time:.2f} seconds")

 

    return train_losses, val_losses, train_accuracies, val_accuracies

 

# Function to calculate accuracy (Modified)

def calculate_accuracy(model, data_loader):

    model.eval()

    correct = 0

    total = 0

 

    with torch.no_grad():

        for data, target in data_loader:

            data, target = data.to(device), target.to(device)

           

            # Flatten MNIST data if needed

            data = data.view(data.size(0), -1)  # Always flatten for MNIST

           

            outputs = model(data)

            _, predicted = torch.max(outputs.data, 1)

 

            total += target.size(0)

            correct += (predicted == target).sum().item()

 

    return 100 * correct / total

 

# Hyperparameters

num_epochs = 16

learning_rate = 0.01

batch_size = 210

homogeneity_learning_rate = 0.005

input_size = 28 * 28  # For MNIST

hidden_size = 128

output_size = 10

 

# Homogeneity Hyperparameters

lambda_type = 'dynamic'  # Options: 'fixed', 'linear', 'dynamic'

lambda_value = 0.9  # Default value for lambda if lambda_type is 'fixed'

 

# Load and split MNIST dataset

transform = transforms.ToTensor()

full_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

 

training_samples_number = int(0.7 * len(full_dataset))

total_iterations = math.ceil(training_samples_number / batch_size) * num_epochs

 

print("Hyperparameters and Calculated Values:")

print(f"  num_epochs: {num_epochs}")

print(f"  learning_rate: {learning_rate}")

print(f"  homogeneity_learning_rate: {homogeneity_learning_rate}")

print(f"  batch_size: {batch_size}")

print(f"  training_samples_number: {training_samples_number}")

print(f"  total_iterations: {total_iterations}")

print(f"  lambda_type: {lambda_type}")

print(f"  lambda_value: {lambda_value}")  # Only relevant if lambda_type is 'fixed'

print("\n")

 

train_size = int(0.7 * len(full_dataset))

val_size = int(0.15 * len(full_dataset))

test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, _ = random_split(full_dataset, [train_size, val_size, test_size])

 

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

 

# Initialize models

model_homogeneity = Net(input_size, hidden_size, output_size).to(device)

model_traditional = Net(input_size, hidden_size, output_size).to(device)

 

average_model = Net(input_size, hidden_size, output_size).to(device)

average = torch.cat([p.data.flatten() for p in average_model.parameters()]).detach()

 

# Train the models

print("Training with Homogeneity-driven update:")

results_homogeneity = train_homogeneity_driven(model_homogeneity, train_loader, val_loader, num_epochs, learning_rate, homogeneity_learning_rate, average, total_iterations, batch_size, lambda_type, lambda_value)

 

print("\nTraining with traditional backpropagation:")

results_traditional = train_traditional(model_traditional, train_loader, val_loader, num_epochs, learning_rate)

 

# Evaluate and plot results

train_losses_h, val_losses_h, train_accuracies_h, val_accuracies_h, homogeneity_values_h = results_homogeneity

train_losses_t, val_losses_t, train_accuracies_t, val_accuracies_t = results_traditional

 

print(f"\nFinal Test Accuracy (Homogeneity-driven): {calculate_accuracy(model_homogeneity, test_loader):.2f}%")

print(f"Final Test Accuracy (Traditional): {calculate_accuracy(model_traditional, test_loader):.2f}%")

 

plt.figure(figsize=(10, 5))

plt.plot(train_losses_h, label='Homogeneity-Driven Train Loss')

plt.plot(val_losses_h, label='Homogeneity-Driven Validation Loss')

plt.plot(train_losses_t, label='Traditional Train Loss')

plt.plot(val_losses_t, label='Traditional Validation Loss')

plt.title('Training and Validation Loss')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(train_accuracies_h, label='Homogeneity-Driven Train Accuracy')

plt.plot(val_accuracies_h, label='Homogeneity-Driven Validation Accuracy')

plt.plot(train_accuracies_t, label='Traditional Train Accuracy')

plt.plot(val_accuracies_t, label='Traditional Validation Accuracy')

plt.title('Training and Validation Accuracy')

plt.xlabel('Epoch')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

 

plt.figure(figsize=(10, 5))

plt.plot(homogeneity_values_h, label='Homogeneity')

plt.title('Homogeneity Values over Iterations')

plt.xlabel('Iteration')

plt.ylabel('Homogeneity')

plt.legend()

plt.show()

 

# END OF THE CODE

=========================================

 

Sample runs:

-------------------------------------------------------

Samples [COMPLETE-MNIST] 1:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 10

  learning_rate: 0.01

  homogeneity_learning_rate: 0.001

  batch_size: 210

  training_samples_number: 42000

  total_iterations: 2000

  lambda_type: dynamic

 

Training with Homogeneity-driven update:

Total training time: 90.20 seconds

 

Training with traditional backpropagation:

Total training time: 69.26 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.87%

Final Test Accuracy (Traditional): 96.77%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 2:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 210

  training_samples_number: 42000

  total_iterations: 3200

  lambda_type: dynamic

 

Training with Homogeneity-driven update:

Total training time: 142.09 seconds

 

Training with traditional backpropagation:

Total training time: 112.01 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.07%

Final Test Accuracy (Traditional): 96.66%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 3:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 105

  training_samples_number: 42000

  total_iterations: 6400

  lambda_type: dynamic

 

Training with Homogeneity-driven update:

Total training time: 184.55 seconds

 

Training with traditional backpropagation:

Total training time: 123.41 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.18%

Final Test Accuracy (Traditional): 96.60%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 4:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 105

  training_samples_number: 42000

  total_iterations: 3200

  lambda_type: linear

 

Training with Homogeneity-driven update:

Total training time: 91.41 seconds

 

Training with traditional backpropagation:

Total training time: 61.28 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.60%

Final Test Accuracy (Traditional): 96.38%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 5:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 1050

  training_samples_number: 42000

  total_iterations: 320

  lambda_type: linear

 

Training with Homogeneity-driven update:

Total training time: 53.81 seconds

 

Training with traditional backpropagation:

Total training time: 50.97 seconds

 

Final Test Accuracy (Homogeneity-driven): 97.27%

Final Test Accuracy (Traditional): 97.00%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 6:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 2100

  training_samples_number: 42000

  total_iterations: 160

  lambda_type: linear

 

Training with Homogeneity-driven update:

Total training time: 53.00 seconds

 

Training with traditional backpropagation:

Total training time: 51.18 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.76%

Final Test Accuracy (Traditional): 96.32%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 7:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 2100

  training_samples_number: 42000

  total_iterations: 160

  lambda_type: fixed    lambda_value: 0.9

 

Training with Homogeneity-driven update:

Total training time: 59.20 seconds

 

Training with traditional backpropagation:

Total training time: 55.72 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.67%

Final Test Accuracy (Traditional): 96.52%

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 8:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 2100

  training_samples_number: 42000

  total_iterations: 160

  lambda_type: fixed      lambda_value: 0.5

 

Training with Homogeneity-driven update:

Total training time: 52.22 seconds

 

Training with traditional backpropagation:

Total training time: 51.21 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.80%

Final Test Accuracy (Traditional): 96.73%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 9:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 2100

  training_samples_number: 42000

  total_iterations: 160

  lambda_type: fixed      lambda_value: 0.1

 

Training with Homogeneity-driven update:

Total training time: 53.95 seconds

 

Training with traditional backpropagation:

Total training time: 51.14 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.81%

Final Test Accuracy (Traditional): 96.72%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 10:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 8

  learning_rate: 0.01

  homogeneity_learning_rate: 0.001

  batch_size: 105

  training_samples_number: 42000

  total_iterations: 3200

  lambda_type: fixed     lambda_value: 0.9

 

Training with Homogeneity-driven update:

Total training time: 94.03 seconds

 

Training with traditional backpropagation:

Epoch processing time: 7.23 seconds

Total training time: 61.98 seconds

 

Final Test Accuracy (Homogeneity-driven): 96.57%

Final Test Accuracy (Traditional): 96.57%

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 11:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 21000

  training_samples_number: 42000

  total_iterations: 32

  lambda_type: fixed      lambda_value: 0.9

 

Training with Homogeneity-driven update:

Total training time: 105.42 seconds

 

Training with traditional backpropagation:

Total training time: 106.84 seconds

 

Final Test Accuracy (Homogeneity-driven): 93.55%

Final Test Accuracy (Traditional): 93.02%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 12:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 16

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 5

  training_samples_number: 42000

  total_iterations: 134400

  lambda_type: fixed       lambda_value: 0.9

 

Training with Homogeneity-driven update:

Total training time: 2068.69 seconds

 

Training with traditional backpropagation:

Total training time: 785.92 seconds

 

Final Test Accuracy (Homogeneity-driven): 94.54%

Final Test Accuracy (Traditional): 94.47%

 

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 13:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 4

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 2

  training_samples_number: 42000

  total_iterations: 84000

  lambda_type: fixed      lambda_value: 0.9

 

Training with Homogeneity-driven update:

Total training time: 1199.32 seconds

 

Training with traditional backpropagation:

Total training time: 377.90 seconds

 

Final Test Accuracy (Homogeneity-driven): 92.46%

Final Test Accuracy (Traditional): 90.46%

 

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 14:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 20

  learning_rate: 0.01

  homogeneity_learning_rate: 0.005

  batch_size: 42000

  training_samples_number: 42000

  total_iterations: 20

  lambda_type: fixed       lambda_value: 0.9

 

Training with Homogeneity-driven update:

Total training time: 131.89 seconds

 

Training with traditional backpropagation:

Total training time: 132.32 seconds

 

Final Test Accuracy (Homogeneity-driven): 91.40%

Final Test Accuracy (Traditional): 91.30%

 

 

-------------------------------------------------------

Samples [COMPLETE-MNIST] 15:

-------------------------------------------------------

Hyperparameters and Calculated Values:

  num_epochs: 4

  learning_rate: 0.01

  homogeneity_learning_rate: 0.01

  batch_size: 1

  training_samples_number: 42000

  total_iterations: 168000

  lambda_type: linear

 

Training with Homogeneity-driven update:

Total training time: 2556.08 seconds

 

Training with traditional backpropagation:

Total training time: 833.99 seconds

 

Final Test Accuracy (Homogeneity-driven): 89.85%

Final Test Accuracy (Traditional): 87.74%