Challenge Information
- Advent of Cyber Day 7
- THM link here
Explanation
Today’s challenge focuses on analyzing log files with the help of bash commands such as `cut`, `uniq`, and many more. As the file `access.log` is too big, I will just provide the commands to solve it.
```
head access.log
[2023/10/25:15:42:02] 10.10.120.75 sway.com:443 CONNECT - 200 0 "-"
[2023/10/25:15:42:02] 10.10.120.75 sway.com:443 GET / 301 492 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
[2023/10/25:15:42:02] 10.10.120.75 sway.office.com:443 CONNECT - 200 0 "-"
[2023/10/25:15:42:02] 10.10.120.75 sway.office.com:443 GET / 200 20947 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
[2023/10/25:15:42:02] 10.10.120.75 protection.office.com:443 CONNECT - 200 0 "-"
[2023/10/25:15:42:02] 10.10.120.75 protection.office.com:443 GET / 302 2227 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
[2023/10/25:15:42:02] 10.10.120.75 login.microsoftonline.com:443 CONNECT - 200 0 "-"
[2023/10/25:15:42:03] 10.10.120.75 login.microsoftonline.com:443 GET /common/oauth2/authorize?client_id=80ccca67-54bd-44ab-8625-4b79c4dc7775&response_type=code%20id_token&scope=openid%20profile&state=OpenIdConnect.AuthenticationProperties%3Drrp8k1-pe98O6kX0s3AGDUUsWpVF8cppjCcRhEzoRPgztlutMh1KC9tcj5DJSvCu63hacn8k7570qdoYvGY8YmemM-A2YCfVJJFPSk1z1O9R6IZ8ONdjftRL8c0o5twJzRl_7_xMawX2O86Ko_so6w&response_mode=form_post&nonce=638338453227659455.MTYyY2RmZmItOTQ3MS00NGYwLThlNGItMWM4MTM3MjIxNWIzM2IxNTY1ZjItN2JjNi00NGNiLTg5ZTktYTcwOTlkY2VlMjFj&client-request-id=3f0994aa-3c3d-4cbb-9162-57e1f618aae9&redirect_uri=https%3A%2F%2Fprotection.office.com%2F&x-client-SKU=ID_NET461&x-client-ver=6.22.1.0 200 10410 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
[2023/10/25:15:42:03] 10.10.120.75 platform.linkedin.com:443 CONNECT - 200 0 "-"
[2023/10/25:15:42:03] 10.10.120.75 platform.linkedin.com:443 GET / 404 6199 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
```
Question 1: How many unique IP addresses are connected to the proxy server?
```
cat access.log | cut -d ' ' -f 2 | sort | uniq | wc -l
9
```
The first thing is to identify the IP addresses inside the `access.log` file. Since the log file is separated by spaces, we can use `cut` to get the specific column, which is `2` in this case. After getting all the IP addresses, we need to use `sort` and `uniq` to remove the duplicates. With the duplicates gone, all that remains is to count the unique IP addresses with the `wc -l` command.
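As a side note, `sort -u` is equivalent to `sort | uniq`, so the pipeline can be shortened by one stage. A minimal sketch on a fabricated two-IP sample (the entries below are made up, not taken from the real `access.log`):

```shell
# Build a tiny fabricated log with the same space-separated layout as access.log
printf '%s\n' \
  '[2023/10/25:15:42:02] 10.10.120.75 sway.com:443 CONNECT - 200 0 "-"' \
  '[2023/10/25:15:42:03] 10.10.120.75 sway.com:443 GET / 301 492 "-"' \
  '[2023/10/25:15:42:04] 10.10.140.13 example.com:443 CONNECT - 200 0 "-"' > sample.log

# Field 2 is the source IP; sort -u collapses duplicates in one step
cut -d ' ' -f 2 sample.log | sort -u | wc -l
```

Here the count is 2; against the real `access.log`, the same pipeline still returns 9.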
Question 2: How many unique domains were accessed by all workstations?
```
cat access.log | cut -d ' ' -f 3 | cut -d ':' -f 1 | sort | uniq | wc -l
111
```
The first thing is to identify the domains inside the `access.log` file. Since the log file is separated by spaces, we can use `cut` to get the specific column, which is `3` in this case. Each domain is still joined to its port number, which would inflate the count, so a second `cut` on `:` strips the port. After getting all the domains without port numbers, we need to use `sort` and `uniq` to remove the duplicates, and then count the unique domains with the `wc -l` command.
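The two `cut` calls can also be collapsed into a single `awk` pass that splits the host:port field itself. A sketch on a fabricated sample (the hosts below are invented; note the same domain seen on two different ports still counts once):

```shell
# Fabricated sample: one domain appears on two different ports
printf '%s\n' \
  '[t] 10.0.0.1 sway.com:443 CONNECT - 200 0 "-"' \
  '[t] 10.0.0.1 sway.com:80 GET / 200 10 "-"' \
  '[t] 10.0.0.2 login.example.com:443 CONNECT - 200 0 "-"' > sample_domains.log

# split() breaks field 3 on ':', discarding the port in the same pass
awk '{split($3, host, ":"); print host[1]}' sample_domains.log | sort -u | wc -l
```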
Question 3: What status code is generated by the HTTP requests to the least accessed domain?
```
cat access.log | cut -d ' ' -f 3 | cut -d ':' -f 1 | sort | uniq -c | sort -n | head
78 partnerservices.getmicrosoftkey.com
```
Since the question asks for the least accessed domain, we can continue from our previous command by changing `uniq` to `uniq -c` to count each domain, adding `sort -n` for ascending order, and using `head` to show the first few results.
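With no argument, `head` prints ten lines; `head -n 1` narrows the output to just the least accessed domain. A sketch with made-up domains accessed 3, 2 and 1 times:

```shell
# Three fabricated domains with access counts 3, 2 and 1
printf '%s\n' a.thm a.thm a.thm b.thm b.thm c.thm > domains.txt

# uniq -c prefixes each domain with its count; sort -n puts the rarest first
sort domains.txt | uniq -c | sort -n | head -n 1 | awk '{print $1, $2}'
```

The trailing `awk` only trims the whitespace padding that `uniq -c` adds; here the result is `1 c.thm`.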
```
cat access.log | grep "partnerservices.getmicrosoftkey.com" | cut -d ' ' -f 4-6 | uniq
GET / 503
```
After finding the least accessed domain, we can use `grep` to collect all of its log lines and `cut` to extract the specific fields we need, which make up the HTTP request. A final `uniq` confirms there is only one distinct request, giving us the status code.
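If only the status code itself is wanted, it sits in field 6 of the space-separated layout, so `awk` can pull it out directly (the domain and entries below are fabricated for illustration):

```shell
# Fabricated entries for a rarely visited domain returning 503
printf '%s\n' \
  '[t] 10.0.0.1 rare.thm:443 GET / 503 120 "-"' \
  '[t] 10.0.0.1 rare.thm:443 GET / 503 120 "-"' > sample_status.log

# /pattern/ filters lines like grep; $6 is the status code column
awk '/rare\.thm/ {print $6}' sample_status.log | sort -u
```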
Question 4: Based on the high count of connection attempts, what is the name of the suspicious domain?
```
cat access.log | cut -d ' ' -f 3 | cut -d ':' -f 1 | sort | uniq -c | sort -nr | head
4992 www.office.com
4695 login.microsoftonline.com
1860 www.globalsign.com
1581 frostlings.bigbadstash.thm
1554 learn.microsoft.com
878 outlook.office365.com
850 c.bing.com
680 admin.microsoft.com
622 smtp.office365.com
606 docs.microsoft.com
```
This question is similar to the previous one, but instead of the least accessed domain, we are looking for a high count of connection attempts. We can simply change `sort -n` to `sort -nr` to view the results in descending order. Among the top 10 results, one domain stands out as suspicious: `frostlings.bigbadstash.thm`.
Question 5: What is the source IP of the workstation that accessed the malicious domain?
```
cat access.log | grep "frostlings.bigbadstash.thm" | cut -d ' ' -f 2 | uniq
10.10.185.225
```
Knowing the malicious domain, we can use `grep` to keep only its log lines and `cut` to extract just the source IP address.
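The `grep` and `cut` steps can again be merged into one `awk` invocation that prints field 2 only for matching lines. A sketch on a fabricated log (the `evil.thm` domain and `10.0.0.9` workstation are invented):

```shell
# Fabricated log: one workstation talks to the suspicious domain
printf '%s\n' \
  '[t] 10.0.0.1 good.thm:443 GET / 200 10 "-"' \
  '[t] 10.0.0.9 evil.thm:443 GET /exfil 200 10 "-"' \
  '[t] 10.0.0.9 evil.thm:443 GET /exfil 200 10 "-"' > sample_mal.log

# Match and extract in one pass; sort -u also catches non-adjacent duplicates
awk '/evil\.thm/ {print $2}' sample_mal.log | sort -u
```

Note that a bare `uniq` only removes *adjacent* duplicates, so `sort -u` is the safer choice if several workstations were interleaved in the log.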
Question 6: How many requests were made on the malicious domain in total?
```
cat access.log | grep "frostlings.bigbadstash.thm" | cut -d ' ' -f 2 | uniq -c
1581 10.10.185.225
```
We can easily get the number of requests by extending the previous command, as all the information needed is already there. We just need to change `uniq` to `uniq -c` to count how many times these requests were made.
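Since each log line is one request, `grep -c` also counts the matching lines directly, with no `cut` or `uniq` needed. A sketch on a fabricated log (domains and IPs invented):

```shell
# Fabricated log with two requests to the suspicious domain
printf '%s\n' \
  '[t] 10.0.0.1 good.thm:443 GET / 200 10 "-"' \
  '[t] 10.0.0.9 evil.thm:443 GET /exfil 200 10 "-"' \
  '[t] 10.0.0.9 evil.thm:443 GET /exfil 200 10 "-"' > sample_count.log

# -c prints the number of matching lines instead of the lines themselves
grep -c "evil.thm" sample_count.log
```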
Question 7: Having retrieved the exfiltrated data, what is the hidden flag?
```
cat access.log | grep "frostlings.bigbadstash.thm" | cut -d ' ' -f 5 | cut -d '=' -f 2 | base64 -d | grep -oE '{.*}'
72703959c91cb18edbefedc692c45204,SOC Analyst,THM{a_gift_for_you_awesome_analyst!}
```
To retrieve the exfiltrated data, the first step is to extract it from the request path, which we can easily do with `cut`. The extracted data is Base64-encoded, so we decode it with `base64 -d`. Since there is a lot of decoded data, we can use `grep` to narrow it down to just what we need. As the final result is a flag with a specific format wrapped in `{}`, we can make good use of that with a regex.
Things I learned from the challenge
- Using bash one-liners (`cut`, `sort`, `uniq`, `grep`, `wc`) to analyze a log file