Date
August 2023
Author
Sergio Salinas Monroy
The main objective of this project is to identify which machine learning (ML) models best detect encrypted data associated with ransomware attacks, primarily by observing file fragments. Existing approaches identify encrypted files by measuring their entropy. This can be done with a small computational overhead and produces virtually no false negatives. Unfortunately, many other file types, such as compressed files, also have high entropy, so measuring a file’s entropy alone results in a high number of false positives. To tackle this challenge, the Wichita State University (WSU) team proposes two research problems to be completed over a 1-year period. The first problem investigates ML classification models that can distinguish encrypted files from other high-entropy files. The studied models will use only information present in the file fragments, such as byte values, the order of the bytes, and other statistics about the fragment contents. Although ML classification models can work well when sufficient labeled training samples are available for all classes, labeling enough user files can be complex and time-consuming. In the second problem, the WSU team will therefore use ML clustering algorithms that can detect encrypted files from unlabeled file fragments. The deliverables include ML model designs, trained ML models, performance evaluation results, and documentation.
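The false-positive problem with entropy-only detection can be illustrated with a minimal sketch. It computes the Shannon entropy of a fragment's byte values and compares three stand-in fragments. The helper name `shannon_entropy`, the 4096-byte fragment size, the made-up vocabulary, and the stand-in data (zlib-compressed text as a proxy for a compressed file, `os.urandom` output as a proxy for ciphertext) are all assumptions chosen for illustration; they are not the project's method or data.

```python
import math
import os
import random
import zlib
from collections import Counter

FRAGMENT_SIZE = 4096  # assumed fragment size, for illustration only


def shannon_entropy(fragment: bytes) -> float:
    """Return the Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    # H = sum over observed byte values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical stand-in data: pseudo-text built from a small vocabulary.
rng = random.Random(0)
words = ["invoice", "report", "meeting", "budget", "quarter", "client",
         "update", "review", "project", "deadline", "contract", "payment",
         "schedule", "summary", "draft", "final"]
text = " ".join(rng.choices(words, k=20000)).encode("ascii")

plain = text[:FRAGMENT_SIZE]                      # low-entropy fragment
compressed = zlib.compress(text)[:FRAGMENT_SIZE]  # proxy for a compressed file
cipher_like = os.urandom(FRAGMENT_SIZE)           # proxy for encrypted data

for name, frag in [("plain text", plain),
                   ("compressed", compressed),
                   ("cipher-like", cipher_like)]:
    print(f"{name:12s} {shannon_entropy(frag):.3f} bits/byte")
```

In this sketch the compressed and cipher-like fragments typically both score close to 8 bits per byte, so an entropy threshold alone cannot separate them. That is the gap the additional fragment features (byte order and other statistics) are meant to address.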