2019-04-20 20:00:00 LAX1 192.0.2.200 GET d111111abcdef8.cloudfront.net /images/pic.jpg 200 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
This line represents a single request made to a CloudFront distribution. Here is a breakdown of the example line:
2019-04-20 20:00:00: This is the date and time of the request.
LAX1: This is the edge location that served the request.
192.0.2.200: This is the IP address of the client that made the request.
GET: This is the HTTP method of the request.
d111111abcdef8.cloudfront.net: This is the host header. It represents the domain name of the CloudFront distribution.
/images/pic.jpg: This is the "cs-uri-stem" field (in the Athena table later in this answer it appears as "uri"). It represents the portion of the URI that identifies the requested object. In this case, the client has requested the object with the key "images/pic.jpg".
200: This is the HTTP status code returned to the client.
-: This is the referer field. In this case, there is no referer.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36: This is the User-Agent header. It represents the client software that made the request.
So, if you want to find all requests for the object "images/pic.jpg", you could filter your log entries for lines where the "cs-uri-stem" field is "/images/pic.jpg". You could do this using a script, or a log analysis tool, depending on your needs and the volume of logs you are dealing with.
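As a minimal sketch, such a filter could look like the following Python snippet. The field layout here is an assumption based on the standard tab-separated CloudFront access log format, where the two header lines start with "#" and "cs-uri-stem" is the 8th field:

```python
# Sketch: filter CloudFront access log lines for a single object.
# Assumes the standard tab-separated log format, where header lines
# start with '#' and cs-uri-stem is the 8th field (index 7).

def requests_for_object(lines, uri_stem):
    """Yield log lines whose cs-uri-stem matches uri_stem exactly."""
    for line in lines:
        if line.startswith("#"):  # skip the #Version / #Fields headers
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 7 and fields[7] == uri_stem:
            yield line

# Example with two made-up log lines:
logs = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem",
    "2019-04-20\t20:00:00\tLAX1\t5120\t192.0.2.200\tGET\td111111abcdef8.cloudfront.net\t/images/pic.jpg",
    "2019-04-20\t20:00:05\tLAX1\t2048\t192.0.2.201\tGET\td111111abcdef8.cloudfront.net\t/images/other.jpg",
]
matches = list(requests_for_object(logs, "/images/pic.jpg"))
print(len(matches))  # 1
```

In practice the log files are gzip-compressed in S3, so you would read them with gzip.open before applying a filter like this.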
In a scenario where one S3 bucket holds data for multiple clients, it is better to include a client identifier in the object key; that helps in calculating consumption for each client.
For example, a key like images/clientID1/pic.jpg would produce CloudFront log lines like this:
2019-04-20 20:00:00 LAX1 192.0.2.200 GET d111111abcdef8.cloudfront.net /images/clientID1/pic.jpg 200 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
Including a client identifier in the object key or path ("cs-uri-stem") is beneficial for tracking and analyzing usage per client. It allows you to filter the logs on that identifier, giving you the ability to monitor and analyze data transfer per client.
Using the above example, if objects were stored with paths like "/images/clientID1/pic.jpg", "/images/clientID2/pic2.jpg", and so on, you could write a script or use a log analysis tool to filter the logs on "/images/clientID1/", "/images/clientID2/", etc., for each client. This way, you can find all requests for a specific client, not just a specific object.
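As a sketch, per-client request counts could be tallied from the "cs-uri-stem" values like this (the /images/&lt;clientID&gt;/... path convention is the hypothetical one from the example above):

```python
import re
from collections import Counter

# Assumes the hypothetical key convention "/images/<clientID>/<object>".
CLIENT_RE = re.compile(r"^/images/([^/]+)/")

def requests_per_client(uri_stems):
    """Count requests per client, keyed by the path segment after /images/."""
    counts = Counter()
    for uri in uri_stems:
        m = CLIENT_RE.match(uri)
        if m:
            counts[m.group(1)] += 1
    return counts

uris = [
    "/images/clientID1/pic.jpg",
    "/images/clientID1/pic2.jpg",
    "/images/clientID2/pic.jpg",
]
print(requests_per_client(uris))  # Counter({'clientID1': 2, 'clientID2': 1})
```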
This is a common strategy in multi-tenant architectures where multiple clients or users are served from a single S3 bucket through CloudFront. By including the client identifier in the object key, you can easily segregate and track the data usage per client.
Remember to design the naming convention for your objects carefully, considering not only the current requirements but also potential future needs, to minimize the need for renaming or moving objects later. Also, keep in mind that CloudFront caches objects at its edge locations until the object's TTL (Time to Live) expires, so updates to an object in S3 may not be reflected immediately unless you invalidate the cache. (Amazon S3 itself now provides strong read-after-write consistency, so the older eventual-consistency caveat for overwrite PUTs and DELETEs no longer applies.)
Sure! To execute an Athena query to get the sum of bytes transferred for a particular client, you first need to set up Athena to read your CloudFront logs. This requires creating a table that maps to the format of the logs.
Here's an example of how you could create such a table:
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
request_ip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
user_agent STRING,
query_string STRING,
cookie STRING,
result_type STRING,
request_id STRING,
host_header STRING,
request_protocol STRING,
request_bytes BIGINT,
time_taken FLOAT,
xforwarded_for STRING,
ssl_protocol STRING,
ssl_cipher STRING,
response_result_type STRING,
http_version STRING,
fle_status STRING,
fle_encrypted_fields INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://your_log_bucket/prefix/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
This statement creates a table with columns that correspond to the CloudFront access log format. Replace `'s3://your_log_bucket/prefix/'` with the path to your logs in S3. The `'skip.header.line.count'='2'` property tells Athena to ignore the two header lines (the #Version and #Fields lines) at the top of each CloudFront log file.
Now you can run a query to get the sum of bytes transferred for "clientID1". Here's an example:
SELECT SUM(bytes) as total_bytes
FROM cloudfront_logs
WHERE uri LIKE '/images/clientID1/%'
This query sums up the "bytes" column for all rows where the "uri" column starts with "/images/clientID1/", which represents requests for the client with the identifier "clientID1".
Please note that this is a simplified example and the actual query may need to be adjusted based on your specific log format and requirements. Make sure to test and adjust the query as needed to ensure it works correctly with your logs.
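As a sanity check, the same aggregation can be sketched locally in Python. The field indices here are an assumption based on the tab-separated column order of the table above (bytes is the 4th field, uri the 8th):

```python
def total_bytes_for_client(lines, prefix="/images/clientID1/"):
    """Sum the bytes field for log lines whose uri starts with prefix.
    Assumes the tab-separated field order of the Athena table above:
    bytes at index 3, uri at index 7."""
    total = 0
    for line in lines:
        if line.startswith("#"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 7 and fields[7].startswith(prefix):
            total += int(fields[3])
    return total

logs = [
    "2019-04-20\t20:00:00\tLAX1\t5120\t192.0.2.200\tGET\td111111abcdef8.cloudfront.net\t/images/clientID1/pic.jpg",
    "2019-04-20\t20:00:05\tLAX1\t2048\t192.0.2.201\tGET\td111111abcdef8.cloudfront.net\t/images/clientID2/pic.jpg",
]
print(total_bytes_for_client(logs))  # 5120
```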
To count the total number of GET requests made for the same client, you can run a query like this:
SELECT COUNT(*) as total_get_requests
FROM cloudfront_logs
WHERE uri LIKE '/images/clientID1/%' AND method = 'GET'
In this query:
COUNT(*) as total_get_requests counts the number of rows returned by the query.
uri LIKE '/images/clientID1/%' filters the rows to include only those where the uri column starts with /images/clientID1/, which represents requests for clientID1.
method = 'GET' further filters the rows to include only those where the method column is GET, which represents GET requests.
This will give you the total number of GET requests made to objects for clientID1.
Please note that the path (/images/clientID1/) in this query is based on your previous example and should be replaced with the actual path pattern that you're using to store objects for each client. Similarly, you'd replace clientID1 with the actual client ID that you're interested in.
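The COUNT(*) query above can also be mirrored locally as a quick sketch, again assuming the tab-separated field order of the Athena table (method at index 5, uri at index 7):

```python
def count_get_requests(lines, prefix="/images/clientID1/"):
    """Count GET requests whose uri starts with prefix.
    Assumes the tab-separated field order of the Athena table above."""
    count = 0
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 7 and fields[5] == "GET" and fields[7].startswith(prefix):
            count += 1
    return count

logs = [
    "2019-04-20\t20:00:00\tLAX1\t5120\t192.0.2.200\tGET\td111111abcdef8.cloudfront.net\t/images/clientID1/pic.jpg",
    "2019-04-20\t20:00:05\tLAX1\t512\t192.0.2.200\tHEAD\td111111abcdef8.cloudfront.net\t/images/clientID1/pic.jpg",
]
print(count_get_requests(logs))  # 1
```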
Remember that CloudFront charges not only for data transfer but also for the total number of HTTP or HTTPS requests made. Therefore, when you're calculating the total cost of using CloudFront, be sure to take both of these factors into account.