Tutorial #1: Write your first PxL script

This tutorial series demonstrates how to write a PxL script to analyze the volume of traffic coming in and out of each pod in your cluster (total bytes received vs total bytes sent).

In Part 1 of this tutorial, we will write a very basic PxL script which simply queries a table of traced network connection data provided by Pixie's no-instrumentation monitoring platform.

The most basic PxL script

Create a new PxL file called my_first_script.pxl:

touch my_first_script.pxl

Open this file in your favorite editor and add the following lines.

# Import Pixie's module for querying data
import px

# Load the last 30 seconds of Pixie's `conn_stats` table into a DataFrame.
df = px.DataFrame(table='conn_stats', start_time='-30s')

# Display the DataFrame with table formatting
px.display(df)

On line 2 we import Pixie's px module. This is Pixie's main library for querying data.

Pixie's scripts are written using the Pixie Language (PxL), a DSL that follows the API of the popular Python data processing library Pandas. Pandas uses DataFrames to represent tables of data.

On line 5 we load the last 30 seconds of data from the conn_stats table into a DataFrame.

The conn_stats table contains high-level statistics about the connections (i.e. client-server pairs) that Pixie has traced in your cluster.

On line 8 we display the table using px.display().

Run this script using Pixie's Live CLI:

px live -f my_first_script.pxl

If you aren't familiar with Pixie's CLI tool, check out the Using the CLI guide.

Your CLI should output something similar to the following table:

Output of my_first_script.pxl in the Live CLI.

This PxL script outputs a table of data representing the last 30 seconds of the traced client-server connections in your cluster. Columns include:

time_: Timestamp when the data record was collected.
upid: An opaque numeric ID that globally identifies a running process inside the cluster.
remote_addr: IP address of the remote endpoint.
remote_port: Port of the remote endpoint.
addr_family: The socket address family of the connection.
protocol: The protocol of the traffic on the connections.
role: The role of the process that owns the connection (client=1 or server=2).
conn_open: The number of connections opened since the beginning of tracing.
conn_close: The number of connections closed since the beginning of tracing.
conn_active: The number of active connections.
bytes_sent: The number of bytes sent to the remote endpoint(s).
bytes_recv: The number of bytes received from the remote endpoint(s).
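
To make the column semantics concrete, here is a minimal plain-Python sketch (not PxL, and not something Pixie runs) that models two `conn_stats`-style rows as dictionaries and decodes the `role` column. The sample values and the `role_name` helper are invented for illustration. Only the column names and the client=1 / server=2 encoding come from the list above.

```python
# Hypothetical sample rows: the values are made up, the column names follow the list above.
rows = [
    {"remote_addr": "10.0.0.5", "remote_port": 8080, "role": 1,
     "bytes_sent": 1200, "bytes_recv": 5400},
    {"remote_addr": "10.0.0.9", "remote_port": 41234, "role": 2,
     "bytes_sent": 900, "bytes_recv": 300},
]

# Encoding taken from the `role` column description (client=1, server=2).
ROLE_NAMES = {1: "client", 2: "server"}


def role_name(code):
    """Translate the numeric role code into a readable label (illustrative helper)."""
    return ROLE_NAMES.get(code, "unknown")


for row in rows:
    # Print who owns the connection and how much traffic moved in each direction.
    print(role_name(row["role"]), row["remote_addr"], row["bytes_sent"], row["bytes_recv"])
```

Any code outside the two documented values falls back to "unknown" in this sketch. That fallback is a choice made here, not a statement about Pixie's behavior.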

If your output table is empty, try increasing the `start_time` value on line 5. If you modify the `start_time`, you'll need to save the script, exit the Live CLI using `ctrl+c`, and re-run the `px live -f my_first_script.pxl` command.

(Optional) Running px/schemas

You can find the conn_stats column descriptions as well as descriptions for all of the data tables provided by Pixie in the data table reference docs or by running the pre-built px/schemas script:

Exit the Live CLI using ctrl+c
Run the px/schemas script:

px live px/schemas

Use the keyboard arrows to scroll down through the output table until you reach conn_stats in the table_name column. You should see all of the columns available in the conn_stats table listed with their descriptions.

conn_stats table schema from the px/schemas script.

(Optional) More fun with DataFrames

DataFrame initialization supports end_time for queries requiring more precise time periods. If an end_time isn't provided, the DataFrame will return all events up to the current time.

import px

df = px.DataFrame(table='conn_stats', start_time='-60s', end_time='-30s')

px.display(df)
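
The relative `start_time` / `end_time` window described above can be pictured with a small plain-Python sketch. This is not how Pixie parses time strings. The helpers `parse_relative` and `window` are hypothetical, and the set of unit suffixes handled (`s`, `m`, `h`) is an assumption of the sketch.

```python
from datetime import datetime, timedelta

# Unit suffixes this sketch understands (an assumption, not Pixie's full grammar).
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_relative(spec):
    """Turn a string such as '-30s' into a negative timedelta (hypothetical helper)."""
    value, unit = int(spec[:-1]), spec[-1]
    return timedelta(**{_UNITS[unit]: value})


def window(now, start_time, end_time=None):
    """Return (start, end); a missing end_time means 'up to the current time'."""
    start = now + parse_relative(start_time)
    end = now if end_time is None else now + parse_relative(end_time)
    return start, end


now = datetime(2024, 1, 1, 12, 0, 0)
start, end = window(now, "-60s", "-30s")
print(start.time(), end.time())
```

With `start_time='-60s'` and `end_time='-30s'`, the window covers the 30 seconds that ended 30 seconds ago. Omitting `end_time` stretches the window to the current time.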

You can drop columns using the df.drop() command.

import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Drop select columns
df = df.drop(['conn_open', 'conn_close', 'bytes_sent', 'bytes_recv'])

px.display(df)

Alternatively, you can use the keep operation, written as df[[...]] with a list of column names, to return a DataFrame with only the specified columns. This can also be used to reorder the columns in the output.

import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Keep only the select columns
df = df[['remote_addr', 'conn_open', 'conn_close']]

px.display(df)
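
The difference between dropping and keeping columns can be sketched in plain Python on a single row. This is an illustration of the idea only, not PxL. The `drop` and `keep` functions below are hypothetical stand-ins that operate on a dictionary, and the row values are invented.

```python
def drop(row, columns):
    """Return a copy of the row without the named columns; remaining order is unchanged."""
    return {name: value for name, value in row.items() if name not in columns}


def keep(row, columns):
    """Return only the named columns, in the order they were requested."""
    return {name: row[name] for name in columns}


# Invented sample row using column names from the conn_stats table.
row = {"remote_addr": "10.0.0.5", "conn_open": 7, "conn_close": 5, "bytes_sent": 1200}

print(list(drop(row, ["bytes_sent"])))
print(list(keep(row, ["conn_close", "remote_addr"])))
```

In this sketch, `drop` names what to remove and leaves the rest in place, while `keep` names what to retain and lets the caller choose the column order.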

If you only need a few columns from a table, use the DataFrame's select argument instead.

import px

# Populate the DataFrame with only the select columns from the `conn_stats` table
df = px.DataFrame(table='conn_stats', select=['remote_addr', 'conn_open', 'conn_close'], start_time='-30s')

px.display(df)

To filter the rows in the DataFrame by the role column:

import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Filter the results to only include rows whose `role` value equals 1 (connections traced on the client-side)
df = df[df.role == 1]

px.display(df)

If you want to see a small sample of data, you can limit the number of rows in the returned DataFrame to the first n rows (line 6).

import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Limit the number of rows in the DataFrame to 100
df = df.head(100)

px.display(df)
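
As a rough mental model of the filter and row-limit steps above, the following plain-Python sketch applies both to a list of dictionaries. It is not PxL. The `filter_role` and `head` helpers and the sample rows are hypothetical and exist only to show the order of operations.

```python
from itertools import islice

# Invented rows: odd-numbered addresses are clients (role 1), even ones are servers (role 2).
rows = [{"remote_addr": f"10.0.0.{i}", "role": 1 if i % 2 else 2} for i in range(10)]


def filter_role(rows, role):
    """Lazily yield only the rows whose `role` equals the requested code."""
    return (row for row in rows if row["role"] == role)


def head(rows, n):
    """Return at most the first n rows."""
    return list(islice(rows, n))


# Filter to client-side rows first, then take a small sample of the result.
clients = head(filter_role(rows, 1), 3)
print([row["remote_addr"] for row in clients])
```

Filtering before limiting means the sample contains only matching rows. Limiting first could leave fewer matches than requested.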

Conclusion

Congratulations, you built your first script!

In Tutorial #2, we will expand this PxL script to produce a table that summarizes the total amount of traffic coming in and out of each of the pods in your cluster.

This video summarizes the content in part 1 and part 2 of this tutorial:
