Semgrep step configuration
You can scan your code repositories using Semgrep and ingest the results into STO.
For a quick introduction, go to the SAST code scans using Semgrep tutorial.
Important notes for running Semgrep scans in STO
- This integration uses the Semgrep Engine, which is open source and licensed under LGPL 2.1. To run scans using a licensed version of Semgrep Code, add your Semgrep token in the Access token field.
- STO Semgrep steps include the following rulesets by default:
  Some rulesets include Pro rules that are available only with a paid version of Semgrep. For more information, go to the Semgrep Registry.
- If you want to add trusted certificates to your scan images at runtime, you need to run the scan step with root access. You can set up your STO scan images and pipelines to run scans as non-root and establish trust for your proxies using custom certificates. For more information, go to Configure STO to Download Images from a Private Registry.
- The following topics contain useful information for setting up scanner integrations in STO:
Set-up workflows
Add a built-in SAST scanner (easiest)
Orchestration scans
Ingestion scans
Semgrep step configuration
The recommended workflow is to add a Semgrep step to a Security Tests or CI Build stage and then configure it as described below.
Scan
Scan Mode
- Orchestration: Configure the step to run a scan and then ingest, normalize, and deduplicate the results.
- Ingestion: Configure the step to read scan results from a data file and then ingest, normalize, and deduplicate the data.
Scan Configuration
You can use this setting to select the set of Semgrep rulesets to include in your scan:
- Default: Include the following rulesets:
- No default CLI flags: Run the semgrep scanner with no additional CLI flags. This setting is useful if you want to specify a custom set of rulesets in Additional CLI flags.
- p/default: Run the scan with the default ruleset configured for the Semgrep scanner.
- Auto only: Run the scan with the recommended rulesets specific to your project.
- Auto and Ported security tools: Include the following rulesets:
- Auto and Ported security tools except p/gitleaks
Target
Type
- Repository: Scan a codebase repo. In most cases, you specify the codebase using a code repo connector that connects to the Git account or repository where your code is stored. For information, go to Configure codebase.
Target and variant detection
When Auto is enabled for code repositories, the step detects these values using git:
- To detect the target, the step runs git config --get remote.origin.url.
- To detect the variant, the step runs git rev-parse --abbrev-ref HEAD. The default assumption is that the HEAD branch is the one you want to scan.
Note the following:
- Auto is not available when the Scan Mode is Ingestion.
- Auto is the default selection for new pipelines. Manual is the default for old pipelines, but you might find that neither radio button is selected in the UI.
Name
The identifier for the target, such as codebaseAlpha or jsmith/myalphaservice. Descriptive target names make it much easier to navigate your scan data in the STO UI.
It is good practice to specify a baseline for every target.
Variant
The identifier for the specific variant to scan. This is usually the branch name, image tag, or product version. Harness maintains a historical trend for each variant.
Workspace
The workspace path on the pod running the scan step. The workspace path is /harness by default.
You can override this if you want to scan only a subset of the workspace. For example, suppose the pipeline publishes artifacts to a subfolder /tmp/artifacts and you want to scan these artifacts only. In this case, you can specify the workspace path as /harness/tmp/artifacts.
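As a rough illustration, a manually specified target with a narrowed workspace might look like the following in the step's YAML. The target and type keys appear in the pipeline example in this topic. The detection: manual value and the name, variant, and workspace keys are assumptions inferred from the UI field names, and all values are placeholders.

```yaml
# Sketch only: key names other than "type" and "detection" are assumptions.
target:
  type: repository
  detection: manual                  # assumed counterpart of "detection: auto"
  name: jsmith/myalphaservice        # placeholder target name
  variant: main                      # placeholder variant, usually the branch name
  workspace: /harness/tmp/artifacts  # scan only this subfolder of the workspace
```

Check the YAML that the visual editor generates for your own step before relying on these key names.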
Ingestion File
The path to your scan results when running an Ingestion scan, for example /shared/scan_results/myscan.latest.sarif.
- The data file must be in a supported format for the scanner.
- The data file must be accessible to the scan step. It's good practice to save your results files to a shared path in your stage. In the visual editor, go to the stage where you're running the scan. Then go to Overview > Shared Paths. You can also add the path to the YAML stage definition like this:
- stage:
    spec:
      sharedPaths:
        - /shared/scan_results
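To show how the pieces fit together, here is a minimal sketch of a Semgrep step configured for ingestion. The type, mode, config, and target keys mirror the pipeline example in this topic. The ingestion.file key, the detection: manual value, and the step name and identifier are assumptions, and the target values are placeholders. The target is set manually because Auto is not available in Ingestion mode.

```yaml
# Sketch only: "ingestion.file" and "detection: manual" are assumed key names and values.
- step:
    type: Semgrep
    name: Semgrep_ingest             # hypothetical step name
    identifier: Semgrep_ingest       # hypothetical identifier
    spec:
      mode: ingestion
      config: default
      target:
        type: repository
        detection: manual            # Auto is not available for ingestion scans
        name: jsmith/myalphaservice  # placeholder target name
        variant: main                # placeholder variant
      ingestion:
        file: /shared/scan_results/myscan.latest.sarif  # file saved under the shared path
```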
Access Token
The access token to log in to the scanner. This is usually a password or an API key.
You should create a Harness text secret with your encrypted token and reference the secret using the format <+secrets.getValue("my-access-token")>. For more information, go to Add and Reference Text Secrets.
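In the step's YAML, the secret reference might appear as in the sketch below. The auth and access_token key names are assumptions based on the field label. The secret identifier my-access-token is the placeholder used above.

```yaml
# Sketch only: the "auth.access_token" key path is an assumption.
auth:
  access_token: <+secrets.getValue("my-access-token")>  # reference to a Harness text secret
```

Referencing the secret this way keeps the token value out of the pipeline definition.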
Log Level
The minimum severity of the messages you want to include in your scan logs. You can specify one of the following:
- DEBUG
- INFO
- WARNING
- ERROR
Additional CLI flags
Use this field to run the semgrep scanner with flags such as:
--severity=ERROR --use-git-ignore
With these flags, semgrep considers only ERROR severity rules and ignores files listed in .gitignore.
Passing additional CLI flags is an advanced feature. Harness recommends the following best practices:
- Test your flags and arguments thoroughly before you use them in your Harness pipelines. Some flags might not work in the context of STO.
- Don't add flags that are already used in the default configuration of the scan step.
  To check the default configuration, go to a pipeline execution where the scan step ran with no additional flags. Check the log output for the scan step. You should see a line like this:
  Command [ scancmd -f json -o /tmp/output.json ]
  In this case, don't add -f or -o to Additional CLI flags.
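For reference, the flags from the example above might be expressed in the step's YAML as shown in this sketch. The advanced key appears in the pipeline example in this topic. The args.cli key path is an assumption, so confirm it against the YAML that the visual editor generates.

```yaml
# Sketch only: the "args.cli" key path is an assumption.
advanced:
  args:
    cli: "--severity=ERROR --use-git-ignore"  # passed to the semgrep scanner as-is
```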
Fail on Severity
Every STO scan step has a Fail on Severity setting. If the scan finds any vulnerability with the specified severity level or higher, the pipeline fails automatically. You can specify one of the following:
- CRITICAL
- HIGH
- MEDIUM
- LOW
- INFO
- NONE: Do not fail on severity
The YAML definition looks like this:
fail_on_severity: critical # | high | medium | low | info | none
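As a sketch of where this setting might sit in the step definition, the fragment below places it next to the log level from the pipeline example in this topic. Nesting fail_on_severity under advanced is an assumption, and high is only an example value.

```yaml
# Sketch only: the placement of "fail_on_severity" under "advanced" is an assumption.
advanced:
  log:
    level: info
  fail_on_severity: high  # fail the pipeline on HIGH or CRITICAL findings
```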
Settings
You can use this field to specify environment variables for your scanner.
Additional Configuration
In the Additional Configuration settings, you can use the following options:
Advanced settings
In the Advanced settings, you can use the following options:
YAML pipeline example
The following pipeline example illustrates an orchestration workflow. It consists of a Semgrep step that scans a code repository and then ingests, normalizes, and deduplicates the results.
pipeline:
  name: semgrep-orch-test
  identifier: semgreporchtest
  projectIdentifier: default
  orgIdentifier: default
  tags: {}
  properties:
    ci:
      codebase:
        connectorRef: YOUR_GIT_CONNECTOR_ID
        repoName: YOUR_GIT_REPO_NAME
        build: <+input>
  stages:
    - stage:
        name: semgrep-orch
        identifier: semgreporch
        description: ""
        type: SecurityTests
        spec:
          cloneCodebase: true
          platform:
            os: Linux
            arch: Amd64
          runtime:
            type: Cloud
            spec: {}
          execution:
            steps:
              - step:
                  type: Semgrep
                  name: Semgrep_1
                  identifier: Semgrep_1
                  spec:
                    mode: orchestration
                    config: default
                    target:
                      type: repository
                      detection: auto
                    advanced:
                      log:
                        level: info