How to use Security Advisor with Skydive
by Kalman Meth, 31/07/2019
The Security Advisor filters the flow data obtained from Skydive, performs a data transformation, and saves the information to an object store in GZIP-compressed JSON format. This data may then be used to perform various kinds of analyses for security, accounting, or other purposes. The Security Advisor is built on top of the Skydive Flow Exporter. Developer instructions and the high-level architecture of the Flow Exporter can be found in the Flow Exporter Overview. Here we present extended instructions for users to deploy Skydive and the Security Advisor.
We use the term pipeline to describe the sequence of operations performed on the data: classify, filter, transform, encode, store.
Deploy Skydive
You can either download a statically linked version of Skydive or you can build it on your own.
To download a pre-compiled statically linked version of Skydive, run the following command.
curl -Lo skydive https://github.com/skydive-project/skydive-binaries/raw/jenkins-builds/skydive-latest && chmod +x skydive && sudo mv skydive /usr/local/bin/
To build your own version of Skydive, follow the instructions in the Build Documentation to install the prerequisites and prepare your machine to build the Skydive code. Then enter the Skydive root directory and build the code.
cd $GOPATH/src/github.com/skydive-project/skydive
make
sudo mkdir -p /etc/skydive
sudo cp etc/skydive.yml.default /etc/skydive/skydive.yml
# adjust settings in skydive.yml, if desired
The skydive binary is created in $GOPATH/bin/skydive.
All-in-one version
sudo skydive allinone -c /etc/skydive/skydive.yml
Multi-node Deployment
Alternatively, run a Skydive analyzer on one host and a Skydive agent on each of your hosts.
On one machine:
sudo skydive analyzer -c /etc/skydive/skydive.yml
On each machine:
sudo skydive agent -c /etc/skydive/skydive.yml
Be sure to set the analyzers field in each agent's skydive.yml to point to the analyzer.
Create Security Advisor (Pipeline) Configuration File
Start from the secadvisor.yml.default file in the secadvisor directory.
sudo cp secadvisor.yml.default /etc/skydive/secadvisor.yml
The secadvisor.yml file is passed as a parameter when running secadvisor from the command line.
The default secadvisor.yml has the following fields:
host_id: ""
analyzers:
  - 127.0.0.1:8082
pipeline:
  analyzer:
    subscriber_url: ws://127.0.0.1:8082/ws/subscriber/flow
    subscriber_username:
    subscriber_password:
  classify:
    # cluster_net_masks:
    #   - 10.0.0.0/8
    #   - 172.16.0.0/12
    #   - 192.168.0.0/16
  filter:
    # excluded_tags:
    #   - internal
    #   - other
    #   - ingress
    #   - egress
  transform:
    sa:
      # exclude_started_flows: true
  store:
    type: s3
    s3:
      # -- client params --
      endpoint: http://127.0.0.1:9000
      region: local
      bucket: bucket
      access_key: user
      secret_key: password
      # api_key: key
      # iam_endpoint: https://iam.cloud.ibm.com/identity/token
      object_prefix: logs
      # -- bulk store params --
      max_flows_per_object: 6000
      max_seconds_per_object: 60
      max_seconds_per_stream: 86400
      max_flow_array_size: 100000
Description of fields in the secadvisor.yml file
Entries that begin with # are comments and are ignored. Be sure to uncomment (include) or comment out (exclude) the appropriate configuration parameters.
Each instance of secadvisor should use either a distinct host_id or "" (the empty host ID).
If a host_id other than "" is in use by one instance of the Security Advisor (pipeline), that host_id may not be used by another instance of the Security Advisor (pipeline).
Set endpoint to the object store endpoint URL, e.g. https://s3.us.cloud-object-storage.appdomain.cloud.
Set the credentials used for authenticating to the object store service.
You can either use AWS-style HMAC keys by setting access_key and secret_key, or use IBM IAM OAuth by setting api_key.
When using IBM IAM OAuth (i.e. when api_key is set) and no iam_endpoint is specified, the default IBM IAM authentication endpoint https://iam.ng.bluemix.net/oidc/token is used.
You can use a different endpoint by setting iam_endpoint to an alternative URL string.
Set bucket to the name of the destination bucket where flows will be dumped.
If your object store service requires a region specification, set region appropriately. This setting is not necessary when using IBM COS.
The subnets specified in cluster_net_masks are used to determine whether a flow is internal, ingress, egress, or other.
List all of the subnets that you consider to be inside your security domain.
The excluded_tags values that are recognized are: internal, ingress, egress, other.
The internal tag refers to flows that begin and end inside your domain (as defined by cluster_net_masks).
The ingress tag refers to flows whose source is outside your domain and whose target is inside your domain.
The egress tag refers to flows whose source is inside your domain and whose target is outside your domain.
The other tag refers to flows whose source and target are both outside your domain.
For example, if you want to capture only those flows that go between your domain and the outside (ingress and egress), then list internal and other under excluded_tags.
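To make the tag definitions concrete, here is a minimal Go sketch of how a flow could be classified from its source and destination addresses and then checked against excluded_tags. The functions classify and excluded are hypothetical; this illustrates the rules described above and is not the Security Advisor's actual classifier. It assumes addresses are plain IPv4/IPv6 strings, like those in the Network field of a flow.

```go
package main

import (
	"fmt"
	"net"
)

// classify returns "internal", "ingress", "egress" or "other" for a flow from
// src to dst, given the subnets listed in cluster_net_masks (illustrative only).
func classify(src, dst string, masks []string) (string, error) {
	var nets []*net.IPNet
	for _, m := range masks {
		_, n, err := net.ParseCIDR(m)
		if err != nil {
			return "", fmt.Errorf("bad mask %q: %w", m, err)
		}
		nets = append(nets, n)
	}
	// inside reports whether addr belongs to one of the cluster subnets.
	inside := func(addr string) bool {
		ip := net.ParseIP(addr)
		if ip == nil {
			return false
		}
		for _, n := range nets {
			if n.Contains(ip) {
				return true
			}
		}
		return false
	}
	srcIn, dstIn := inside(src), inside(dst)
	switch {
	case srcIn && dstIn:
		return "internal", nil
	case !srcIn && dstIn:
		return "ingress", nil
	case srcIn && !dstIn:
		return "egress", nil
	default:
		return "other", nil
	}
}

// excluded reports whether tag appears in the excluded_tags list.
func excluded(tag string, excludedTags []string) bool {
	for _, t := range excludedTags {
		if t == tag {
			return true
		}
	}
	return false
}

func main() {
	masks := []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}
	excludedTags := []string{"internal", "other"} // keep only ingress and egress
	pairs := [][2]string{
		{"10.1.2.3", "192.168.0.7"},         // both inside
		{"169.45.67.210", "10.0.0.5"},       // outside -> inside
		{"10.0.0.5", "169.44.184.135"},      // inside -> outside
		{"169.45.67.210", "169.44.184.135"}, // both outside
	}
	for _, p := range pairs {
		tag, err := classify(p[0], p[1], masks)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s: %s (kept=%v)\n", p[0], p[1], tag, !excluded(tag, excludedTags))
	}
}
```

With excluded_tags set to internal and other, only the ingress and egress flows are reported as kept.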
Be sure to set max_flow_array_size and the other bulk store parameters to reasonable values.
The max_flow_array_size parameter specifies the maximum number of flows that will be stored in each iteration.
If max_flow_array_size is set to 0 (or not set at all), then no flows can be stored, and hence no useful information will be saved in the object store.
This will result in an error.
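The following Go sketch models the effect of max_flow_array_size: a per-iteration array that refuses a capacity of 0 and counts flows that do not fit as discarded (the overflow case mentioned under troubleshooting). The Flow type, flowBuffer, newFlowBuffer, and add are hypothetical names used only for illustration; the Security Advisor's internal data structures may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Flow is a hypothetical placeholder for one flow record.
type Flow struct{ UUID string }

// flowBuffer models a per-iteration array bounded by max_flow_array_size.
type flowBuffer struct {
	flows     []Flow
	max       int
	discarded int
}

// newFlowBuffer rejects a capacity of 0, since nothing could ever be stored.
func newFlowBuffer(maxFlowArraySize int) (*flowBuffer, error) {
	if maxFlowArraySize <= 0 {
		return nil, errors.New("max_flow_array_size must be a positive number")
	}
	return &flowBuffer{max: maxFlowArraySize}, nil
}

// add stores f if there is room; otherwise it counts f as discarded (overflow).
func (b *flowBuffer) add(f Flow) bool {
	if len(b.flows) >= b.max {
		b.discarded++
		return false
	}
	b.flows = append(b.flows, f)
	return true
}

func main() {
	if _, err := newFlowBuffer(0); err != nil {
		fmt.Println("error:", err)
	}
	b, err := newFlowBuffer(3)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 5; i++ {
		b.add(Flow{UUID: fmt.Sprintf("flow-%d", i)})
	}
	fmt.Printf("stored=%d discarded=%d\n", len(b.flows), b.discarded)
}
```

Here, five flows arriving at a buffer of size 3 leave three stored and two discarded, which is why max_flow_array_size should be at least the number of flows you expect per iteration.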
The max_flows_per_object parameter specifies the number of flows that may be stored in a single object. If more flows are found in a single iteration, then several objects are created, each with up to max_flows_per_object flows in them. To have all the flows of an iteration stored in a single object, be sure that this parameter is set sufficiently large.
If max_flows_per_object is set to 0 (or not set at all), then each object can hold 0 flows; i.e. no flows can be stored in any object.
This will result in an error.
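A minimal Go sketch of the max_flows_per_object rule: the flows found in one iteration are split into consecutive objects of at most that many flows, and a value of 0 is rejected. The function splitIntoObjects is hypothetical and represents flows as plain strings for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// splitIntoObjects groups the flows of one iteration into objects holding at
// most maxFlowsPerObject flows each.
func splitIntoObjects(flows []string, maxFlowsPerObject int) ([][]string, error) {
	if maxFlowsPerObject <= 0 {
		return nil, errors.New("max_flows_per_object must be a positive number")
	}
	var objects [][]string
	for start := 0; start < len(flows); start += maxFlowsPerObject {
		end := start + maxFlowsPerObject
		if end > len(flows) {
			end = len(flows)
		}
		objects = append(objects, flows[start:end])
	}
	return objects, nil
}

func main() {
	flows := make([]string, 14000)
	for i := range flows {
		flows[i] = fmt.Sprintf("flow-%d", i)
	}
	objects, err := splitIntoObjects(flows, 6000) // default max_flows_per_object
	if err != nil {
		panic(err)
	}
	for i, o := range objects {
		fmt.Printf("object %d: %d flows\n", i, len(o))
	}
}
```

With 14000 flows and the default max_flows_per_object of 6000, this yields three objects holding 6000, 6000, and 2000 flows.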
The flow information is collected in groups of objects (called streams), determined by the max_seconds_per_stream parameter.
All the objects generated within the number of seconds specified by max_seconds_per_stream are collected under the same heading (stream) in the object URL path.
If max_seconds_per_stream is set to a very large number, then all of the flows captured in the current run of the Security Advisor will be included in the same collection (stream).
Furthermore, a single object may contain multiple sets of flow statistics: all those collected within max_seconds_per_object seconds. Thus, if max_seconds_per_object is set to 200 and the flow statistics are captured once a minute, then each object will contain 3 or 4 sets of statistics, whatever was collected within 200 seconds.
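The "3 or 4" figure follows from simple arithmetic: 200 / 60 is about 3.33, so a 200-second window contains either 3 or 4 once-a-minute snapshots depending on where it starts. The Go sketch below counts snapshots per object under a simplified, assumed model in which snapshots are taken at t = 0, 60, 120, ... seconds and object windows are aligned at t = 0; the Security Advisor's real flush timing may not follow these exact boundaries. The function statsPerObject is hypothetical.

```go
package main

import "fmt"

// statsPerObject counts how many periodic flow-statistics snapshots (one every
// periodSec seconds, starting at t=0) fall into each object window of
// maxSecondsPerObject seconds, using half-open windows [k*w, (k+1)*w).
func statsPerObject(periodSec, maxSecondsPerObject, totalSec int) []int {
	counts := make([]int, (totalSec+maxSecondsPerObject-1)/maxSecondsPerObject)
	for t := 0; t < totalSec; t += periodSec {
		counts[t/maxSecondsPerObject]++
	}
	return counts
}

func main() {
	// Snapshots once a minute, max_seconds_per_object = 200, over 20 minutes.
	fmt.Println(statsPerObject(60, 200, 1200))
	// prints [4 3 3 4 3 3]
}
```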
Set Up Object Store
The Security Advisor saves data to an S3-type object store.
The parameters to access the object store must be provided in the secadvisor.yml configuration file.
If you do not already have an object store, the following sections show how to create one for testing purposes.
You can either use AWS-style HMAC keys by setting access_key and secret_key, or use IBM IAM OAuth by setting api_key.
Set Up IBM COS
In IBM Cloud, create an Object Store resource.
In the Object Store, create a bucket to hold the Skydive flow information. For simple testing, you can choose Single Site resiliency.
Look under Bucket Configuration to find the public endpoint (URL) through which the bucket is accessed. This endpoint needs to go into the endpoint field of secadvisor.yml.
Go to the Service Credentials panel and create a new credential. Click View Credential to see the details of the credential. The apikey value needs to go into the api_key field of secadvisor.yml.
In the secadvisor.yml file, uncomment the iam_endpoint field and set it to https://iam.cloud.ibm.com/identity/token.
Set Up Minio
For running tests on a local machine, it is possible to set up a local Minio object store.
For details, see the instructions in the Flow Exporter Overview.
Deploy the Security Advisor pipeline
Build and run the pipeline in the secadvisor directory: skydive-flow-exporter/secadvisor
cd skydive-flow-exporter/secadvisor
make static
The secadvisor binary is created in the go/bin directory.
Activate the Security Advisor
secadvisor /etc/skydive/secadvisor.yml
Note that it is possible to run several instances of the Security Advisor at the same time, each with a different configuration file.
This is useful for capturing different flows for different purposes and for applying different filtering (e.g. based on the specified cluster_net_masks and filter parameters).
Generate and Capture Flows
Capture Flows
Use the Skydive WebUI to set up captures and generate traffic; this should result in the secadvisor pipeline sending flows to the object store.
To connect to the GUI, open a web browser at the address of the Skydive analyzer, port 8082. If the browser runs on the same machine as the analyzer, connect to localhost:8082. You should see the topology of your network, perhaps something like the following image.
Make sure you are on the Captures view and press Create.
Fill in the Targets fields by placing the cursor in the first Interface field and then using the mouse to point at the network endpoint that you want to capture.
To capture a particular flow end-to-end, fill in the second Interface field in the same manner with the other network endpoint.
Then press the Start button in the GUI.
Alternatively, you can start the capture of flows from the command line with a command like the following:
skydive client capture create --gremlin "G.V().Has('Type', 'device', 'Name', 'eth0')" --type pcap
The command captures flows on all nodes that match the gremlin expression: in this case, all devices named eth0.
Generate Flows
Generate some network traffic on the interfaces you specified to capture. For example, run some iperf traffic between entities and capture the flow via the Skydive GUI.
To install iperf3 (on Ubuntu/Debian):
sudo apt-get install iperf3
Run iperf server:
iperf3 -s
Either on the same machine or on another machine, run the iperf client:
iperf3 -c <address-of-iperf-server> -t 1000
It is also possible to generate network traffic using the inject feature of Skydive, either from the command line or through the Skydive GUI.
Go to the Generator page and specify the source and destination nodes between which to generate the network traffic, as well as the characteristics of the flow you want to generate.
Observe Security Advisor objects being created
Objects are created in the Object Store only if there is flow information to be saved.
The output log of secadvisor should show that an object is being sent to the object store about once per minute. Check your bucket in your object store to verify that you see a new object about once per minute. Stop the secadvisor or stop captures in the GUI to stop the creation of objects in the object store.
Content of Security Advisor objects
The Security Advisor saves flow information to an object store in GZIP compressed JSON encoding format.
Each object created by the Security Advisor contains the flows that were captured and filtered according to the parameters set in the secadvisor.yml configuration file.
An example of entries in an unzipped object created by Security Advisor might be the following:
{
  "UUID": "7521e85422cbc042",
  "LayersPath": "Ethernet/IPv4/TCP",
  "Version": "1.0.8",
  "Status": "UPDATED",
  "Network": {
    "Protocol": "IPV4",
    "A": "169.45.67.210",
    "B": "169.44.184.135"
  },
  "Transport": {
    "Protocol": "TCP",
    "A": "32396",
    "B": "4723"
  },
  "LastUpdateMetric": {
    "ABPackets": 0,
    "ABBytes": 112,
    "BAPackets": 0,
    "BABytes": 178,
    "Start": 1551199625889,
    "Last": 1551199655889
  },
  "Metric": {
    "ABPackets": 2,
    "ABBytes": 178,
    "BAPackets": 3,
    "BABytes": 244,
    "Start": 0,
    "Last": 0
  },
  "Start": 1551199609804,
  "Last": 1551199631044,
  "updateCount": 1,
  "NodeType": "device"
}
For each flow we have the following fields and types, all collected in JSON encoding format.
UUID string
LayersPath string
Version string
Status string
FinishType string
Network *SecurityAdvisorFlowLayer
Transport *SecurityAdvisorFlowLayer
LastUpdateMetric *flow.FlowMetric
Metric *flow.FlowMetric
Start int64
Last int64
UpdateCount int64
NodeType string
Extend map[string]interface{}
UUID - unique ID of the flow.
LayersPath - represents the layers of the network stack; typical values are Ethernet/IPv4/TCP, Ethernet/IPv4/UDP/DNS, and Ethernet/IPv4/ICMPv4.
Version - version of the Security Advisor pipeline being run.
Status - status of the flow; may be STARTED, ENDED, or UPDATED.
FinishType - indicates how a flow ended; may be SYN_FIN, SYN_RST, Timeout, OVERFLOW, or empty if the flow is still ongoing.
Network - provides network layer information; contains the following fields:
- Protocol - e.g. IPV4 or IPV6
- A - source address
- B - destination address
Transport - provides transport layer information; contains the following fields:
- Protocol - e.g. TCP or UDP
- A - source port
- B - destination port
LastUpdateMetric - represents delta information for the most recent update only, between source (A) and destination (B).
It contains the following fields pertaining to the most recent update interval:
- ABPackets - number of data packets sent from A to B
- ABBytes - number of bytes sent from A to B
- BAPackets - number of data packets sent from B to A
- BABytes - number of bytes sent from B to A
- Start - start time of this measurement interval
- Last - end time of this measurement interval
Metric - represents cumulative information for this flow between source (A) and destination (B).
Start - time when this flow was first detected.
Last - time when last packet for this flow was detected.
UpdateCount - number of updates in this report.
NodeType - type of Skydive endpoint, such as device, switch, tun, bridge, etc.
Extend - miscellaneous fields defined by the user, as described in the next section.
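Because LastUpdateMetric covers a single measurement interval, it can be turned into an average byte rate per direction. The Go sketch below does this for the LastUpdateMetric of the example object above, assuming (as the example values suggest) that Start and Last are Unix timestamps in milliseconds. The FlowMetric type and bytesPerSecond function are hypothetical. In the example, Metric has Start and Last set to 0, so the function reports an error for it rather than dividing by zero.

```go
package main

import (
	"errors"
	"fmt"
)

// FlowMetric holds the LastUpdateMetric fields described above.
type FlowMetric struct {
	ABPackets, ABBytes, BAPackets, BABytes int64
	Start, Last                            int64 // assumed Unix epoch milliseconds
}

// bytesPerSecond returns the average A->B and B->A byte rates over the
// measurement interval [Start, Last].
func bytesPerSecond(m FlowMetric) (ab, ba float64, err error) {
	durMs := m.Last - m.Start
	if durMs <= 0 {
		return 0, 0, errors.New("empty or invalid measurement interval")
	}
	secs := float64(durMs) / 1000
	return float64(m.ABBytes) / secs, float64(m.BABytes) / secs, nil
}

func main() {
	// LastUpdateMetric values from the example object (a 30-second interval).
	m := FlowMetric{ABBytes: 112, BABytes: 178, Start: 1551199625889, Last: 1551199655889}
	ab, ba, err := bytesPerSecond(m)
	if err != nil {
		panic(err)
	}
	fmt.Printf("A->B %.2f B/s, B->A %.2f B/s\n", ab, ba)
	// prints A->B 3.73 B/s, B->A 5.93 B/s
}
```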
Adding miscellaneous fields to Security Advisor output via configuration
We added to the secadvisor pipeline the capability to extend the output data: you specify, in the yml file, gremlin expressions with substitution strings, and each one generates an additional field in the output flow information. The basic syntax in the yml file looks like the following:
transform:
  type: secadvisor
  secadvisor:
    exclude_started_flows: false
    extend:
      - VAR_NAME1=<gremlin expression with substitution strings>
      - VAR_NAME2=<gremlin expression with substitution strings>
The gremlin expression uses the golang template feature, where template expressions are enclosed in {{ }}.
The fields enclosed in {{ }} are taken as the names of fields in the flow information provided by the transform (described in the previous section) and are replaced with their actual values. The gremlin expression is then evaluated, and the result is placed in a new field that is added to the flow information.
For example, the gremlin expression may look like this:
- AA_Name=G.V().Has('RoutingTables.Src','{{.Network.A}}').Values('Host')
- BB_Name=G.V().Has('RoutingTables.Src','{{.Network.B}}').Values('Host')
The result is that the value of Network.A (169.45.67.210 in the example above) is inserted into the gremlin expression, the gremlin expression is then evaluated (or its result obtained from a cache), and the resulting value (the name of the host holding the network interface) is placed in the field AA_Name under the Extend field of the flow information.
The substitution string refers to fields that already exist in the data provided by the particular transform. If needed, be sure to put quotes around the substitution results. It is recommended to use only single quotes in the gremlin expression.
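The substitution step relies on Go's standard text/template package. The Go sketch below shows only that step: it splits each extend entry at the first = into a field name and a template, and renders the template against a flow. The Layer and Flow types and the expandExtend function are hypothetical stand-ins; evaluating the resulting gremlin query against Skydive, and caching its result, are not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// Minimal hypothetical stand-ins for the flow fields referenced by templates.
type Layer struct{ Protocol, A, B string }
type Flow struct{ Network Layer }

// expandExtend renders each "NAME=<template>" extend entry against a flow and
// returns a map from NAME to the rendered gremlin expression.
func expandExtend(entries []string, f Flow) (map[string]string, error) {
	out := make(map[string]string)
	for _, e := range entries {
		parts := strings.SplitN(e, "=", 2) // split at the first '=' only
		if len(parts) != 2 {
			return nil, fmt.Errorf("entry %q is not NAME=expression", e)
		}
		tmpl, err := template.New(parts[0]).Parse(parts[1])
		if err != nil {
			return nil, err
		}
		var buf bytes.Buffer
		if err := tmpl.Execute(&buf, f); err != nil {
			return nil, err
		}
		out[parts[0]] = buf.String()
	}
	return out, nil
}

func main() {
	extend := []string{
		"AA_Name=G.V().Has('RoutingTables.Src','{{.Network.A}}').Values('Host')",
		"BB_Name=G.V().Has('RoutingTables.Src','{{.Network.B}}').Values('Host')",
	}
	f := Flow{Network: Layer{Protocol: "IPV4", A: "169.45.67.210", B: "169.44.184.135"}}
	exprs, err := expandExtend(extend, f)
	if err != nil {
		panic(err)
	}
	fmt.Println(exprs["AA_Name"])
	// prints G.V().Has('RoutingTables.Src','169.45.67.210').Values('Host')
}
```

Since text/template performs no escaping, the single quotes in the expression pass through unchanged, which is one reason single quotes are the safer choice inside these expressions.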
Multiple pipelines
It is possible to run multiple pipelines simultaneously. Prepare a separate configuration file for each pipeline and run a separate secadvisor instance with each one.
Troubleshooting common problems
If the secadvisor log shows a connection refused error, verify that the proper address of the Skydive analyzer is specified under analyzers.
If the secadvisor log shows a credentials error, verify that the credential fields (access_key and secret_key for Minio; api_key and iam_endpoint for IBM COS) are properly set in the secadvisor.yml file.
If the secadvisor log shows an overflow and states that flows were discarded, check that max_flow_array_size is set to a reasonable positive number, at least the number of flows you expect to capture.
Verify that host_id is "" (or a host_id not in use by another Security Advisor instance).