Machine learning offers powerful tools for detecting anomalies in network traffic. This post includes technical details, sample logs, and Linux-based scripts to help you get started with anomaly detection using machine learning techniques.


1. Supervised Learning with Decision Trees

Objective: Detect DDoS attacks by classifying normal vs. abnormal traffic using a decision tree model.

Sample Log Format:

{
"source_ip": "192.168.1.10",
"dest_ip": "203.0.113.1",
"packet_size": 512,
"timestamp": "2024-10-10T12:30:22Z",
"protocol": "TCP",
"duration": 0.5
}
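
Records in this shape have to be reduced to numeric features before a model can use them. The following is a minimal sketch, assuming the logs are stored as one JSON object per line (the post does not specify the file layout); `record_to_features` and `SAMPLE_LINES` are hypothetical names used only for illustration.

```python
import json

# Assumption: logs are stored one JSON object per line (JSON Lines).
SAMPLE_LINES = [
    '{"source_ip": "192.168.1.10", "dest_ip": "203.0.113.1", "packet_size": 512, '
    '"timestamp": "2024-10-10T12:30:22Z", "protocol": "TCP", "duration": 0.5}',
]


def record_to_features(line):
    """Hypothetical helper: turn one JSON log line into [packet_size, duration]."""
    record = json.loads(line)
    return [float(record["packet_size"]), float(record["duration"])]


features = [record_to_features(line) for line in SAMPLE_LINES]
print(features)
```

Fields such as the IP addresses and the protocol are left out here because they are not numeric; they would need their own encoding before being added as features.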

Steps:

  • Collect logs using tcpdump:
tcpdump -i eth0 -w traffic_log.pcap
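  • Parse logs for machine learning (tshark writes tab-separated fields with no header by default, so request comma separators and a header row):
tshark -r traffic_log.pcap -T fields -E header=y -E separator=, -e ip.src -e ip.dst -e frame.len > traffic_data.csv
  • Use Python and scikit-learn to train the decision tree model. The script expects packet_size, duration, and anomaly_flag columns, which the tshark export does not produce on its own: frame.len corresponds to packet_size, while duration and the labels have to be added during preprocessing.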
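import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Load the labeled dataset. Expected columns: packet_size, duration,
# and anomaly_flag (0 = normal, 1 = anomaly).
data = pd.read_csv('traffic_data.csv')
X = data[['packet_size', 'duration']]
y = data['anomaly_flag']

# Train the model; a fixed random_state makes runs repeatable
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Predict on new samples, using the same column names as the training data
new_samples = pd.DataFrame([[800, 1.2], [512, 0.5]], columns=['packet_size', 'duration'])
predictions = clf.predict(new_samples)
print(predictions)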
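
Predicting on two hand-picked samples does not show how well the classifier generalizes. A common check is to hold out part of the labeled data and score the model on it. The sketch below uses synthetic data as a stand-in for traffic_data.csv; the distributions are assumptions chosen only to make the example runnable, not measurements of real traffic.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): normal traffic has mid-sized packets and
# durations around one second; attack traffic has small packets and tiny durations.
normal = np.column_stack([rng.normal(600, 100, 500), rng.normal(1.0, 0.2, 500)])
attack = np.column_stack([rng.normal(60, 10, 500), rng.normal(0.01, 0.005, 500)])
X = np.vstack([normal, attack])
y = np.array([0] * 500 + [1] * 500)  # 0 = normal, 1 = anomaly

# Hold out 25% of the rows, keeping the class balance the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
print(classification_report(y_test, clf.predict(X_test), target_names=["normal", "anomaly"]))
```

On real captures the classes are rarely this cleanly separated or this balanced, so the per-class precision and recall in the report are usually more informative than the overall accuracy.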

Testing the Detection:

  • Simulate a DDoS attack:
hping3 -S --flood -V -p 80 203.0.113.1

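Capture the flood traffic and run it through the trained model to see how it is classified. A SYN flood from hping3 consists of many small packets sent at a very high rate, so the model can only separate it from normal traffic if its features reflect that; packet size and duration alone carry no information about packet frequency.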
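
Packet frequency is not one of the features in the training script, so it has to be derived from timestamps. Here is a small sketch that counts packets per source IP per second; the `packets` list is hypothetical sample data and `packets_per_second` is an illustrative helper name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, source_ip) pairs; in practice these come from the capture.
packets = [
    ("2024-10-10T12:30:22Z", "192.168.1.10"),
    ("2024-10-10T12:30:22Z", "192.168.1.10"),
    ("2024-10-10T12:30:22Z", "192.168.1.10"),
    ("2024-10-10T12:30:22Z", "192.168.1.20"),
    ("2024-10-10T12:30:23Z", "192.168.1.10"),
]


def packets_per_second(rows):
    """Count packets for each (source_ip, one-second bucket) pair."""
    counts = Counter()
    for ts, src in rows:
        second = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        counts[(src, second)] += 1
    return counts


rates = packets_per_second(packets)
peak = max(rates.values())
print(peak)
```

The resulting per-second count could be added as a third feature column next to packet_size and duration.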


2. Unsupervised Learning with K-Means Clustering

Objective: Identify unknown anomalies using clustering of network behavior.

Sample Log Format:

{
"source_ip": "198.51.100.10",
"dest_ip": "192.0.2.2",
"connection_count": 120,
"timestamp": "2024-10-12T09:40:12Z",
"port": 22
}

Steps:

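  • Extract features from logs using Zeek (formerly Bro):
zeek -r traffic_log.pcap

Zeek writes connection records to conn.log. The file is tab-separated and begins with '#' header lines, so use zeek-cut to select fields by name, then count connections per source IP and destination port:

cat conn.log | zeek-cut id.orig_h id.resp_p | sort | uniq -c | awk 'BEGIN {print "connection_count,source_ip,port"} {print $1 "," $2 "," $3}' > connection_data.csv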
  • Run K-Means in Python:
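import pandas as pd
from sklearn.cluster import KMeans

# Load connection data (expects connection_count and port columns)
data = pd.read_csv('connection_data.csv')
X = data[['connection_count', 'port']]

# Apply K-means with two clusters. K-means only groups similar rows; it does
# not know which cluster is "anomalous", so inspect both clusters afterwards.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

# Check cluster assignment for new observations
new_points = pd.DataFrame([[200, 22], [10, 443]], columns=['connection_count', 'port'])
clusters = kmeans.predict(new_points)
print(clusters)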
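
K-Means assigns every row to a cluster but does not say which rows are unusual. One way to turn the clustering into a detector is to fit it on a baseline that is assumed to be mostly normal, then score new rows by their distance to the nearest cluster center. The sketch below is a variation on the script, not the post's exact method: the baseline is synthetic, the features are standardized first so that port numbers do not dominate the distance, and the 99th-percentile cutoff is an arbitrary starting point.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic baseline (assumption): ordinary connection counts to ports 22 and 443.
baseline = np.column_stack([
    rng.normal(20, 5, 400),           # connection_count
    rng.choice([22, 443], size=400),  # port
])

scaler = StandardScaler().fit(baseline)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(baseline))


def nearest_centroid_distance(points):
    """Distance from each point to its closest cluster center, in scaled units."""
    return kmeans.transform(scaler.transform(points)).min(axis=1)


# Threshold: the 99th percentile of the distances seen in the baseline itself
threshold = np.percentile(nearest_centroid_distance(baseline), 99)

new_points = np.array([[400, 22], [18, 443]])
is_anomaly = nearest_centroid_distance(new_points) > threshold
print(is_anomaly)
```

The first new point has a connection count far outside the baseline and ends up well beyond the threshold, while the second sits close to a cluster center.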

Testing the Detection:

  • Inject unusual traffic patterns:
nmap -p 22 --max-retries 1 192.0.2.2

Observe which cluster the scan traffic lands in. K-Means does not flag anything by itself; treat rows that fall into a small cluster, or far from their cluster center, as potential anomalies.


3. Anomaly Detection with LSTM Neural Networks

Objective: Detect unusual network traffic using Long Short-Term Memory (LSTM) networks for temporal data.

Sample Log Format:

{
"source_ip": "203.0.113.5",
"dest_ip": "192.168.0.10",
"packet_size": 600,
"timestamp": "2024-10-14T16:12:08Z",
"duration": 2.5,
"protocol": "UDP"
}

Steps:

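  • Create sequential data from logs, sorted by connection start time. Here orig_bytes from conn.log stands in for packet_size, and rows where Zeek reports an unset duration or byte count ("-") are skipped:
cat conn.log | zeek-cut ts id.orig_h id.resp_h duration orig_bytes | sort -n | awk 'BEGIN {print "timestamp,source_ip,dest_ip,duration,packet_size"} $4 != "-" && $5 != "-" {print $1 "," $2 "," $3 "," $4 "," $5}' > seq_data.csv
  • Train an LSTM model in Python using Keras: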
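import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

# Prepare data for LSTM (expects packet_size and duration columns, in time order)
data = pd.read_csv('seq_data.csv')
features = data[['packet_size', 'duration']].to_numpy(dtype='float32')

# Target is an assumption: predict the next record's packet_size
# from the current record
X = features[:-1]
y = features[1:, 0]

# Reshape for LSTM: (samples, timesteps, features), one timestep per sample
X = X.reshape((X.shape[0], 1, X.shape[1]))

# Build LSTM model
model = Sequential()
model.add(Input(shape=(1, 2)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# Train
model.fit(X, y, epochs=200, verbose=0)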
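
The reshape in the script gives each sample a single timestep, so the LSTM never sees more than one record at a time. A common alternative is to feed it sliding windows of consecutive records and ask it to predict the record that follows each window. The sketch below shows only the windowing step with NumPy; `make_windows` is a hypothetical helper and the toy values are made up for illustration.

```python
import numpy as np


def make_windows(series, window):
    """Hypothetical helper: split a (n_steps, n_features) array into overlapping windows.

    Returns X with shape (n_samples, window, n_features) and y with shape
    (n_samples, n_features), where y[i] is the step right after window X[i].
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window:]
    return X, y


# Toy sequence of [packet_size, duration] rows, ordered by time
toy = np.array([[500, 0.5], [520, 0.6], [510, 0.5], [530, 0.7], [4000, 2.5]])
X, y = make_windows(toy, window=3)
print(X.shape, y.shape)
```

With windows like these, the model's input shape becomes (window, 2) instead of (1, 2).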

Testing the Detection:

  • Simulate an exfiltration attempt by sending large UDP packets over time:
dd if=/dev/zero bs=1024 count=100 | nc -u 203.0.113.5 12345

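Feed the captured traffic through the trained model and compare its predictions with what was actually observed. The model itself only produces predictions; a sudden spike in traffic volume shows up as an unusually large prediction error, and that error is what gets treated as an anomaly.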
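
Deciding what counts as a spike requires a threshold on the prediction error. A simple sketch, assuming the errors on known-normal traffic are available as a baseline: flag anything more than k standard deviations above the baseline mean. The numbers below are made up for illustration, `error_threshold` is a hypothetical helper, and k=3.0 is an arbitrary starting point rather than a tuned value.

```python
import numpy as np


def error_threshold(baseline_errors, k=3.0):
    """Mean plus k standard deviations of the errors seen on normal traffic."""
    baseline_errors = np.asarray(baseline_errors, dtype=float)
    return baseline_errors.mean() + k * baseline_errors.std()


# Hypothetical absolute prediction errors (in bytes) on normal traffic
baseline_errors = np.array([8.0, 12.0, 10.0, 9.0, 11.0, 10.0])
threshold = error_threshold(baseline_errors)

# Hypothetical observed vs. predicted packet sizes during the test
new_actual = np.array([505.0, 515.0, 4000.0])
new_predicted = np.array([510.0, 510.0, 520.0])
new_errors = np.abs(new_actual - new_predicted)

flags = new_errors > threshold
print(flags)
```

Only the last record, where the observed size is far from the prediction, exceeds the threshold.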


Conclusion

By combining machine learning models such as decision trees, K-Means clustering, and LSTM neural networks, you can automate the detection of both known and previously unseen anomalies in network traffic: supervised models catch the attack patterns they were trained on, while clustering and sequence models highlight traffic that deviates from the baseline. Automated detection can shorten response times and improve overall security posture. Experiment with the provided examples in your own environment to see how they behave on your traffic.

