Good practices when defining the scope of a code scan

In this post, we have compiled a few good practices to keep in mind when scanning a code base with CAST Highlight, so that the software analytics you get are as consistent as possible for your use case (software health, open source detection for license compliance or vulnerability checks, etc.).

Because Highlight analyzes code at the file level and does not specifically take into account the logical links or dependencies between files, every file in the scope is treated equally, as part of the application. To get accurate and consistent results, especially for Software Composition Analysis, take a few minutes to prepare your code scan scope using the file/folder exclusion features of the Code Reader.

Code Scan Best Practices

In some specific contexts (Tech Due Diligence, M&A…), it is useful to refine the code scan scope to get more focused insights from CAST Highlight. Download our Code Scan Best Practices deck, which lists the elements to consider when refining the scope for your desired focus area: Software Health, Cloud Readiness, Software Composition Analysis, or all of them.

Third Parties

If you want to identify open source or COTS packages, make sure they're included in the folders you scan (external libraries are generally grouped in a sub-folder named "third-party" or something similar, while the main code is often located under "src/main"). Also, include third-party binaries (e.g. JARs, DLLs) if you want Highlight to report the related license, vulnerability and obsolescence information. For better Software Composition Analysis results, we strongly recommend scanning the build/deployed output in addition to the source code. Refer to our Code Scan Best Practices deck for further details.
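Before uploading, a quick check like the one below could confirm that third-party binaries actually made it into the scan scope. This is a minimal sketch, not a Highlight feature: the function name `find_third_party_binaries` and the extension set are hypothetical, and binaries are recognized by file extension only.

```python
from pathlib import PurePosixPath

# Hypothetical set of packaged-binary extensions to look for (case-insensitive).
BINARY_EXTENSIONS = {".jar", ".war", ".dll"}


def find_third_party_binaries(paths):
    """Return, sorted, the paths whose extension marks them as packaged binaries."""
    return sorted(
        p for p in paths
        if PurePosixPath(p).suffix.lower() in BINARY_EXTENSIONS
    )
```

If the result is empty while the application is known to ship JARs or DLLs, the build/deployed output folder is probably missing from the scope.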

Test & Sample files

Test classes should be excluded unless you specifically want to analyze them; measuring the software resiliency of your test files, for instance, is of little interest. Test and sample files can also cause OSS components to be misidentified during Software Composition Analysis, as they're not really part of the application you're scanning.
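To spot test and sample files before defining exclusions, a simple name-based heuristic may help. This is a minimal sketch assuming common naming conventions (folders such as "test" or "samples", classes ending in "Test", Python files starting with "test_"); the function name and folder list are hypothetical and should be adapted to your code base.

```python
from pathlib import PurePosixPath

# Hypothetical folder names commonly used for tests and samples.
TEST_DIRS = {"test", "tests", "__tests__", "sample", "samples", "example", "examples"}


def is_test_or_sample(path):
    """Heuristically flag test and sample files by folder or file name."""
    p = PurePosixPath(path)
    # Any parent folder named like a test/sample folder (e.g. Maven's src/test).
    if any(part.lower() in TEST_DIRS for part in p.parts[:-1]):
        return True
    stem = p.stem  # e.g. "OrderServiceTest" for "OrderServiceTest.java"
    return stem.endswith(("Test", "Tests")) or stem.startswith("test_")
```

A heuristic like this can produce false positives, so review the flagged files before excluding them.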

Generated code

Generated code (e.g. *.d.ts, *.flow.js) should be excluded as well: it is produced automatically, so the development team has little control over the software health of that part of the code.
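Generated files are usually recognizable by a file-name pattern, so glob matching is often enough to list them. This minimal sketch uses Python's `fnmatch` with example patterns (TypeScript declaration files and *.flow.js); the function name and pattern list are hypothetical and should reflect the generators your own toolchain uses.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Example patterns for generated files; adapt them to your own toolchain.
GENERATED_PATTERNS = ["*.d.ts", "*.flow.js"]


def is_generated(path):
    """Return True if the file name matches one of the generated-code patterns."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in GENERATED_PATTERNS)
```

Note that `fnmatch` follows the operating system's case rules (case-insensitive on Windows), which is usually what you want for file names.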

Environment-specific files and folders

For more consistent results, SCM, build and deployment folders (e.g. .git, .svn, gradle, .circleci, .scannerwork, .azure, .vscode, etc.) and files (e.g. .yaml, .gitignore, .gitmodules, Makefile, .npmignore, .checkstyle, build.xml, gradlew… this list is not exhaustive) shouldn't be part of the scope. For a more complete list of the files and folders you should typically exclude from the scan scope, refer to this GitHub repository, which lists these exclusions by technology stack.
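The exclusions above can be expressed as a simple path filter, for example to preview which files would remain in scope. This is a minimal Python sketch, not a CAST Highlight feature: the function name `in_scope` and the exclusion sets are hypothetical and only reuse the examples listed in this section, so treat them as a starting point rather than a complete list.

```python
from pathlib import PurePosixPath

# Partial exclusion lists taken from the examples in this section.
EXCLUDED_DIRS = {".git", ".svn", "gradle", ".circleci", ".scannerwork", ".azure", ".vscode"}
EXCLUDED_FILES = {".gitignore", ".gitmodules", "Makefile", ".npmignore",
                  ".checkstyle", "build.xml", "gradlew"}
EXCLUDED_SUFFIXES = {".yaml"}


def in_scope(path):
    """Return False for SCM/build/deployment files or anything under such folders."""
    p = PurePosixPath(path)
    # Drop anything located under an excluded folder, at any depth.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    # Drop excluded file names and extensions.
    return p.name not in EXCLUDED_FILES and p.suffix not in EXCLUDED_SUFFIXES
```

You could apply it to a file listing with `[p for p in paths if in_scope(p)]`, then mirror the resulting exclusions in the Code Reader's file/folder exclusion settings.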

Dependency files

If you want insights such as CVEs (security vulnerabilities) on frameworks and dependencies whose physical files are not part of the folder you're scanning, make sure the dependency files (e.g. pom.xml, build.gradle, package.json, .csproj, etc.) are included in the scope too.
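A quick way to verify this is to list the dependency manifests present in the scope. This minimal sketch only matches the file names cited above (pom.xml, build.gradle, package.json); the function name and name set are hypothetical, so extend them for the other stacks you use.

```python
from pathlib import PurePosixPath

# Example manifest names from this section; extend for your own stacks.
MANIFEST_NAMES = {"pom.xml", "build.gradle", "package.json"}


def find_dependency_files(paths):
    """Return, in input order, the paths that look like dependency manifests."""
    return [p for p in paths if PurePosixPath(p).name in MANIFEST_NAMES]
```

An empty result suggests the scope contains no manifest describing the dependencies whose files are not being scanned.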

At the opposite extreme, if you scan your C:\ drive with all the folders and files it contains, Highlight will systematically scan every file matching the 40+ technologies it supports and will try to consolidate the different insights (software health, cloud readiness, open source origin, security vulnerabilities…) from there. The few minutes you spend defining your application scope will pay off later, when you consume the analytics.