(base) amin $>time aws s3 sync s3://amrootaws/share/ .
download: s3://amrootaws/share/Iris-NewDataAI.mp4 to ./Iris-NewDataAI.mp4
aws s3 sync s3://amrootaws/share/ . 1.08s user 1.28s system 26% cpu 8.873 total
1.08s user
This is the amount of CPU time spent in user mode (executing user-space operations).
It includes time spent running the AWS CLI commands and processing data in memory.
1.28s system
This is the amount of CPU time spent in kernel mode (system-level operations).
It includes time spent on I/O operations, such as reading from and writing to disk/network.
26% cpu
This indicates that the process used 26% of a single CPU core on average.
Since the total elapsed time (8.873s) is much longer than user + system time (1.08 + 1.28 = 2.36s), the command was I/O-bound rather than CPU-intensive.
The CPU was waiting on network or disk operations, rather than fully utilized.
8.873 total
This is the real (wall clock) time taken for the command to complete.
It includes the time spent waiting for data to transfer from S3, network latency, and disk writing speed.
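The 26% cpu figure follows directly from the other three numbers: cpu ≈ (user + system) / total. A quick check, assuming bc is available:
$> echo "scale=4; (1.08 + 1.28) / 8.873" | bc
This prints roughly .2659, i.e. the ~26% reported by time; the remaining ~74% of the wall-clock time was spent waiting on network/disk I/O.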
(base) amin $>aws configure set default.s3.preferred_transfer_client crt
(base) amin $>time aws s3 sync s3://amrootaws/share/ .
download: s3://amrootaws/share/Iris-NewDataAI.mp4 to ./Iris-NewDataAI.mp4
aws s3 sync s3://amrootaws/share/ . 0.92s user 1.15s system 23% cpu 8.807 total
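For reference, the aws configure set command above just writes the setting into ~/.aws/config; the same selection can be made by editing the file directly (a sketch, assuming AWS CLI v2 and the default profile):
[default]
s3 =
  preferred_transfer_client = crt
To revert: aws configure set default.s3.preferred_transfer_client classic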
Here’s the before and after comparison based on your timing results:
Metric | Default S3 Transfer Client | CRT Transfer Client | Improvement
User Time (CPU in User Mode) | 1.08s | 0.92s | 🔽 14.8% less CPU usage
System Time (CPU in Kernel Mode) | 1.28s | 1.15s | 🔽 10.2% less system CPU usage
CPU Utilization (%) | 26% | 23% | 🔽 Lower CPU load
Total Time (Wall Clock) | 8.873s | 8.807s | ⚡ ~0.7% faster execution
Slightly Faster Execution (~0.7% Reduction in Total Time)
The CRT transfer client does not significantly reduce total execution time in this case.
However, improvements might be more noticeable with large files, high-latency networks, or multi-threaded transfers.
Lower CPU Usage (~14.8% Less User CPU, 10.2% Less System CPU)
CRT uses optimized network calls, memory-efficient processing, and multi-threading.
It is especially useful in high-performance scenarios like parallel uploads/downloads.
Lower CPU Utilization (26% → 23%)
The CRT client offloads work more efficiently, allowing the CPU to remain less occupied.
✅ Larger Files (100MB+): CRT parallelizes multi-part uploads better, reducing total time.
✅ High-Latency Networks: If you're far from the S3 region, CRT's optimized networking reduces lag.
✅ Multiple Parallel Transfers: CRT handles multiple threads better than the default client.
✅ High Throughput Environments: Works best in 10 Gbps+ networks.
🚫 For Small Files / Low-Latency Networks: Performance improvements are minimal, as seen in your test.
CRT is more CPU-efficient, reducing load on your machine.
Total execution time improved slightly (~0.7%), but not significantly in this test.
If working with large files or high-latency connections, CRT will provide bigger gains.
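To see whether CRT actually pays off on this link, a larger test object makes the gap easier to measure. A rough sketch, reusing the same bucket (test-1g.bin is just a placeholder name):
$> dd if=/dev/urandom of=test-1g.bin bs=1M count=1024   # create a 1 GB test file
$> time aws s3 cp test-1g.bin s3://amrootaws/share/test-1g.bin
$> time aws s3 cp s3://amrootaws/share/test-1g.bin ./test-1g-copy.bin
Run the same pair of commands with preferred_transfer_client set to classic and then to crt and compare the totals.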
scenario
ec2-1 is a client with an outbound rule allowing TCP 3306
ec2-2 is the DB server with an inbound rule allowing TCP from 0.0.0.0/0 on 3306
Change bind-address to 0.0.0.0 in my.cnf on ec2-2 and make sure the port is 3306
telnet still fails to connect
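To reproduce the failure from ec2-1 (the private IP below is a placeholder for ec2-2's address):
$> telnet 10.0.0.25 3306
# or, if telnet is not installed:
$> nc -vz 10.0.0.25 3306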
test
netstat -lnptu
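On ec2-2, check specifically that mysqld is listening on 0.0.0.0:3306 rather than 127.0.0.1 (service name assumed to be mysqld):
$> sudo netstat -lnptu | grep 3306
$> sudo ss -lntp | grep 3306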
Add iptables entry for accepting connections on port 3306:
iptables -I INPUT -i eth0 -p tcp -m tcp --dport 3306 -j ACCEPT
test
iptables -L
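The rule above is lost on reboot; how to persist it depends on the distro. A sketch for Amazon Linux / RHEL-style systems (assumes the iptables-services package):
$> sudo iptables-save | sudo tee /etc/sysconfig/iptables
# or
$> sudo service iptables save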
test if services and agents are running
sudo systemctl status amazon-ssm-agent
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent
systemctl list-units --type=service
systemctl list-units --type=service | grep httpd
remove old host entry for ssh
ssh-keygen -R server.ip.addre.ss
from EC2-user home /home/ec2-user/
$> cd .ssh
$> ls
authorized_keys
$>cat authorized_keys
$>curl http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
if multiple keys are set up on the EC2 instance, 0 is the first key
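On instances where IMDSv2 is enforced, the plain curl above is refused (401); the token-based form is:
$>TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
$>curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key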
Add new public keys to SSH to EC2
$>ssh-keygen -t rsa
: newkeyname
:passphrase or not
$>ls
shows all generated key files (newkeyname and newkeyname.pub)
$>cd ~ // go to EC2 user home where we have .ssh dir
// append the contents of the new pub key file to the authorized_keys file of the instance
$> cat newkeyname.pub >> .ssh/authorized_keys
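Before closing the current session, fix permissions and test a login with the new private key from the client machine (same placeholder host as earlier; adjust the key path to wherever the key was generated):
$> chmod 600 .ssh/authorized_keys
$> ssh -i newkeyname ec2-user@server.ip.addre.ss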
---
more on tuning TCP parameters (window size, scaling) for this particular use case
Model: for this specific use case of transferring data between an Azure VM and AWS S3 with ~40ms round-trip latency:
Calculating Optimal TCP Window Size:
Bandwidth-Delay Product (BDP):
BDP = Network Bandwidth × Round Trip Time (RTT)
Target Bandwidth = 10 Gbps (theoretical max)
RTT = 40ms
BDP = 10 Gbps × 0.04s = 400 Mbits = 50 MB
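The same arithmetic as a one-liner (assuming bc is installed): 10 Gbit/s × 0.04 s = 400 Mbit, and dividing by 8 converts bits to bytes.
$> echo "10 * 10^9 * 0.04 / 8 / 10^6" | bc
# prints 50, i.e. ~50 MB must be in flight to keep a 10 Gbps, 40 ms path full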
Current Window Size Issues:
Default TCP window sizes (64KB-256KB) are too small for this BDP
Results in TCP window being full before ACKs return
Causes stop-and-wait behavior, underutilizing bandwidth
Recommended TCP Parameters:
TCP Window Size:
net.core.rmem_max = 67108864 # 64MB max TCP receive window
net.core.wmem_max = 67108864 # 64MB max TCP send window
net.ipv4.tcp_rmem = 4096 87380 67108864 # min/default/max
net.ipv4.tcp_wmem = 4096 87380 67108864
TCP Window Scaling:
net.ipv4.tcp_window_scaling = 1 # Enable window scaling
net.ipv4.tcp_timestamps = 1 # Enable timestamps
TCP Memory Allocation:
net.ipv4.tcp_mem = 67108864 67108864 67108864
(note: tcp_mem is measured in memory pages, typically 4 KB each, not bytes, so do not simply copy the byte values used for rmem/wmem)
TCP Congestion Control:
# Check available algorithms:
sysctl net.ipv4.tcp_available_congestion_control
# Recommend using BBR or CUBIC:
net.ipv4.tcp_congestion_control = bbr
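bbr only shows up in the available list if the tcp_bbr module is loaded (kernel 4.9+ assumed):
$> sudo modprobe tcp_bbr
$> lsmod | grep bbr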
TCP Fast Open:
net.ipv4.tcp_fastopen = 3
Implementation Steps:
Check current settings:
sysctl -a | grep net.core.[rw]mem
sysctl -a | grep net.ipv4.tcp_[rw]mem
Apply temporary changes:
sudo sysctl -w net.core.rmem_max=67108864
sudo sysctl -w net.core.wmem_max=67108864
# etc...
Make permanent changes in /etc/sysctl.conf:
sudo vi /etc/sysctl.conf
# Add all parameters
sudo sysctl -p
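For reference, the values from this section collected in one block, ready to paste into /etc/sysctl.conf (or a file under /etc/sysctl.d/) before running sysctl -p:
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 87380 67108864
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_slow_start_after_idle = 0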
Verify changes:
sysctl net.ipv4.tcp_window_scaling
ss -info # Check actual window sizes in use
Monitoring and Validation:
Monitor TCP metrics:
# Watch for retransmissions
netstat -s | grep -i retrans   # -i retrans catches the retransmission counters regardless of spelling
# Monitor window size usage
ss -info | grep -i window
Check for TCP zero window events:
netstat -s | grep 'window'
Monitor throughput:
sar -n TCP 1
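Raw throughput over the Azure→AWS path can also be validated independently of S3 with iperf3 (assuming it is installed on both ends and the chosen port is open; the address below is a placeholder):
# on the receiving side
$> iperf3 -s
# on the sending side, 4 parallel streams for 30 seconds
$> iperf3 -c 203.0.113.10 -P 4 -t 30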
Additional Considerations:
Application Buffer Sizes:
Ensure application socket buffer sizes match system settings
Example in Python:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 67108864)  # capped by net.core.rmem_max
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 67108864)  # capped by net.core.wmem_max
TCP Selective Acknowledgments (SACK):
net.ipv4.tcp_sack = 1
net.ipv4.tcp_dsack = 1
TCP Initial Window:
net.ipv4.tcp_slow_start_after_idle = 0
(note: there is no net.ipv4.tcp_initial_cwnd sysctl on Linux; the initial congestion window is set per route, as shown below)
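A sketch of raising the initial congestion and receive windows on the default route (gateway and interface below are placeholders; check the current route first):
$> ip route show default
$> sudo ip route change default via 10.0.0.1 dev eth0 initcwnd 10 initrwnd 10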
Expected Improvements:
Better bandwidth utilization:
Window size matches BDP
Reduced wait time for ACKs
More efficient use of available bandwidth
Reduced latency impact:
Larger windows allow more data in flight (especially useful over a dedicated/private connection)
Better handling of packet loss
More efficient congestion control
Better throughput stability:
Less stop-and-wait behavior
More consistent transfer rates
Better handling of network fluctuations
Monitor these changes carefully and adjust based on actual performance metrics. Consider testing different congestion control algorithms (CUBIC vs BBR) to find the optimal configuration for your specific network path.