This tutorial series demonstrates how to write a PxL script to analyze the volume of traffic coming in and out of each pod in your cluster (total bytes received vs total bytes sent).
In Part 1 of this tutorial, we will write a very basic PxL script which simply queries a table of traced network connection data provided by Pixie's no-instrumentation monitoring platform.
The most basic PxL script
Create a new PxL file called my_first_script.pxl:
touch my_first_script.pxl
Open this file in your favorite editor and add the following lines.
# Import Pixie's module for querying data
import px

# Load the last 30 seconds of Pixie's `conn_stats` table into a DataFrame.
df = px.DataFrame(table='conn_stats', start_time='-30s')

# Display the DataFrame.
px.display(df)
On line 2 we import Pixie's px module. This is Pixie's main library for querying data.
Pixie's scripts are written using the Pixie Language (PxL), a DSL that follows the API of the popular Python data processing library Pandas. Pandas uses DataFrames to represent tables of data.
On line 5 we load the last 30 seconds of data from the conn_stats table into a DataFrame.
The conn_stats table contains high-level statistics about the connections (i.e. client-server pairs) that Pixie has traced in your cluster.
On line 8 we display the table using px.display().
Run this script using Pixie's Live CLI:
px live -f my_first_script.pxl
If you aren't familiar with Pixie's CLI tool, check out the "Using the CLI" guide.
Your CLI should output something similar to the following table:
Output of my_first_script.pxl in the Live CLI.
This PxL script outputs a table of data representing the last 30 seconds of the traced client-server connections in your cluster. Columns include:
time_: Timestamp when the data record was collected.
upid: An opaque numeric ID that globally identifies a running process inside the cluster.
remote_addr: IP address of the remote endpoint.
remote_port: Port of the remote endpoint.
addr_family: The socket address family of the connection.
protocol: The protocol of the traffic on the connections.
role: The role of the process that owns the connection (client=1 or server=2).
conn_open: The number of connections opened since the beginning of tracing.
conn_close: The number of connections closed since the beginning of tracing.
conn_active: The number of active connections.
bytes_sent: The number of bytes sent to the remote endpoint(s).
bytes_recv: The number of bytes received from the remote endpoint(s).
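To make the column semantics concrete, here is a minimal plain-Python sketch (not PxL) that decodes the `role` column and formats the remote endpoint for a couple of records. The sample rows and the helper names `role_name` and `describe` are hypothetical and exist only for illustration; only the column names and the client=1 / server=2 encoding come from the list above.

```python
# Hypothetical sample records shaped like rows of the conn_stats table.
# The values are made up for illustration; only the column names and the
# role encoding (client=1, server=2) come from the column list above.
ROLE_NAMES = {1: "client", 2: "server"}

sample_rows = [
    {"remote_addr": "10.0.0.12", "remote_port": 8080, "role": 1,
     "bytes_sent": 2048, "bytes_recv": 512},
    {"remote_addr": "10.0.0.7", "remote_port": 443, "role": 2,
     "bytes_sent": 128, "bytes_recv": 4096},
]


def role_name(role):
    """Map the numeric role to a readable label; other values become 'unknown'."""
    return ROLE_NAMES.get(role, "unknown")


def describe(row):
    """Summarize one record as '<role> <addr>:<port> sent=<n> recv=<n>'."""
    return "{} {}:{} sent={} recv={}".format(
        role_name(row["role"]), row["remote_addr"], row["remote_port"],
        row["bytes_sent"], row["bytes_recv"])


for row in sample_rows:
    print(describe(row))
```

Note that `role` describes the process that owns the connection, so in a row with role 1 the remote address and port belong to the server side of the pair.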
If your output table is empty, try widening the time window by changing the `start_time` value on line 5 (for example, `'-5m'` for the last five minutes). If you modify the `start_time`, you'll need to save the script, exit the Live CLI using `ctrl+c`, and re-run the `px live -f my_first_script.pxl` command.
(Optional) Running px/schemas
You can find the conn_stats column descriptions as well as descriptions for all of the data tables provided by Pixie in the data table reference docs or by running the pre-built px/schemas script:
Exit the Live CLI using ctrl+c
Run the px/schemas script:
px live px/schemas
Use the keyboard arrows to scroll down through the output table until you reach conn_stats in the table_name column. You should see all of the columns available in the conn_stats table listed with their descriptions.
conn_stats table schema from the px/schemas script.
(Optional) More fun with DataFrames
DataFrame initialization supports end_time for queries requiring more precise time periods. If an end_time isn't provided, the DataFrame will return all events up to the current time.
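The time-window behavior described above can be sketched in plain Python (not PxL). This is only an illustration of the semantics, not Pixie's implementation: the helper names `parse_relative` and `resolve_window` are hypothetical, and the sketch assumes relative time strings of the form `-30s`, `-5m`, or `-1h`.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes for this sketch only: seconds, minutes, hours.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_relative(spec):
    """Convert a string such as '-30s' into a negative timedelta."""
    if len(spec) < 3 or spec[0] != "-" or spec[-1] not in _UNITS:
        raise ValueError("unsupported relative time: {!r}".format(spec))
    amount = int(spec[1:-1])
    return -timedelta(**{_UNITS[spec[-1]]: amount})


def resolve_window(start_time, end_time=None, now=None):
    """Return (start, end); a missing end_time means 'up to the current time'."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now + parse_relative(start_time)
    end = now if end_time is None else now + parse_relative(end_time)
    return start, end


# Fixed reference time so the example is reproducible.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(resolve_window("-5m", "-1m", now=reference))  # bounded on both sides
print(resolve_window("-30s", now=reference))        # open-ended: end is `reference`
```

With both bounds given, the window covers only the slice between them; with `end_time` omitted, the window runs up to the reference time, which matches the default described above.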
# Limit the number of rows in the DataFrame to 100
df = df.head(100)

px.display(df)
Conclusion
Congratulations, you built your first script!
In Tutorial #2, we will expand this PxL script to produce a table that summarizes the total amount of traffic coming in and out of each of the pods in your cluster.
This video summarizes the content in part 1 and part 2 of this tutorial: