How To: Set Up Amazon SageMaker Lifecycle Configuration Scripts

Learn how to use SageMaker Lifecycle Configuration scripts to get a consistent developer experience

(Photo by Jana Ohajdova / Unsplash)

Introduction

Amazon SageMaker is AWS's platform for all things machine learning. It is a one-stop shop for every step in the ML lifecycle. You can wrangle with data, you can run model training jobs, and you can deploy your model out to the world. If you're doing ML on AWS, chances are you are using SageMaker.

Today I am focusing on a specific SageMaker service called SageMaker Studio. SageMaker Studio is a web-based IDE that allows users to edit and run their ML code using compute resources supplied by AWS. The platform is built on top of JupyterLab, which means it supports the wildly popular Jupyter Notebook development experience.

SageMaker uses Docker images behind the scenes to virtualize the compute environment for its users. These images are built on the Linux OS with essential software pre-installed, including Python, Jupyter, and mainstream machine learning libraries. For basic ML applications, users can "get on with it" without worrying about setting up the environment.

HOWEVER, 99 times out of 100, your ML application won't work without some tweaks. Maybe you need to install a few more Python libraries. Or, maybe you are missing some OS dependencies. Or, maybe you are in a private VPC with no internet access and need to install Python libraries through an internal package repository. To do these things, you run Linux commands.

The issue is this: it quickly gets old if you need to run the same commands every time you start a SageMaker Studio session. This is where SageMaker Lifecycle Configuration (LCC) scripts come to the rescue.

Lifecycle Configuration scripts (LCCs)

SageMaker Studio Lifecycle Configuration scripts (that is a mouthful, so I am calling them LCCs from now on) are bash scripts that run when your compute environment starts up. Think of them as EC2 User Data, but for SageMaker. You can install software, set up configurations and tweak settings for your ideal ML coding environment.

There are different types of LCC scripts, depending on which SageMaker service you decide to use:

  • JupyterLab: For the JupyterLab service within SageMaker Studio
  • KernelGateway: For notebooks that run within JupyterLab (which runs within SageMaker Studio)
  • CodeEditor: For the Code Editor service within SageMaker Studio
  • JupyterServer: For legacy SageMaker Studio Classic and Notebook instances

Each service differs slightly in its Linux distro and pre-installed software. Therefore, the contents of the LCC scripts vary between types as well.

LCC script types for each SageMaker app

When you create a SageMaker environment in your AWS account, AWS groups the underlying resources and configurations into a single entity called a SageMaker domain. However, LCC scripts are one of the resources that are not bundled in a SageMaker domain. Instead, LCC scripts are stored in your AWS account, and you associate them with your SageMaker domain.
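
Because the scripts live in the account rather than in the domain, it can be handy to see which ones already exist before creating another. Here is a minimal sketch, assuming you pass in a boto3 SageMaker client and that `list_studio_lifecycle_configs` pages its results through `NextToken`. The helper name `list_lcc_scripts` is made up for illustration.

```python
def list_lcc_scripts(sm_client, app_type):
    """Return (name, arn) pairs for every LCC script of one app type.

    sm_client is assumed to be boto3.client("sagemaker").
    app_type is one of "JupyterLab", "KernelGateway", "CodeEditor", "JupyterServer".
    """
    scripts = []
    kwargs = {"AppTypeEquals": app_type}
    while True:
        page = sm_client.list_studio_lifecycle_configs(**kwargs)
        for cfg in page.get("StudioLifecycleConfigs", []):
            scripts.append(
                (cfg["StudioLifecycleConfigName"], cfg["StudioLifecycleConfigArn"])
            )
        # Keep asking for pages until the service stops returning a token
        token = page.get("NextToken")
        if not token:
            return scripts
        kwargs["NextToken"] = token
```

You would call it as `list_lcc_scripts(boto3.client("sagemaker"), "JupyterLab")`.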

What can these scripts do?

LCC scripts are bash scripts that run in your AWS account. There is a lot of freedom here: you can do anything within the boundaries of your SageMaker IAM role. So the better question is, "What would you typically want to do when setting up your ML environment?"

For example, you might want to install some extra OS dependencies and ML libraries:

#!/bin/bash

# Exit on error and print trace for each command
# https://explainshell.com/explain?cmd=set+-eux
set -eux

# Update and install some OS dependencies in apt
sudo apt update -y
sudo apt install -y antiword fonts-liberation

# Install some python ML libraries
pip3 install --upgrade pyarrow pycaret

Installing apt and pip packages

Or, you may need to set up access to a private package repository. In this example, we are first retrieving repo credentials from AWS Secrets Manager and then using that in our pip config:

#!/bin/bash

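# Exit on error and on unset variables. Command tracing (-x) is left off
# here so the credentials are not printed to the logs.
set -eu

# Use Secrets Manager to retrieve the repository credentials.
# --query/--output return only the secret value instead of the full JSON response.
# This assumes the user name and the token are stored as two plain-text secrets.
export REPO_USER=$(aws secretsmanager get-secret-value --secret-id <USER_SECRET_ARN> --region "$AWS_REGION" --query SecretString --output text)
export REPO_TOKEN=$(aws secretsmanager get-secret-value --secret-id <TOKEN_SECRET_ARN> --region "$AWS_REGION" --query SecretString --output text)

# Write to the pip configuration file (create the directory first if it is missing)
mkdir -p /home/sagemaker-user/.pip
cat > /home/sagemaker-user/.pip/pip.conf << EOF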
[global]
index-url = https://$REPO_USER:$REPO_TOKEN@<REPOSITORY_URL>/pypi/simple
EOF

# Make sure pip config file is used everywhere
sudo tee /etc/profile.d/pipconf.sh > /dev/null <<'EOF'
#!/bin/bash
export PIP_CONFIG_FILE=/home/sagemaker-user/.pip/pip.conf
EOF

Setting up pip to use your private repo
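
If your credentials contain characters such as "@" or "/", the index URL needs them percent-encoded. Here is a minimal sketch of building the same pip.conf content in Python. It assumes the secret is a single JSON object with `username` and `token` keys (hypothetical names, adjust to your secret's layout), and `render_pip_conf` is a made-up helper.

```python
import json
from urllib.parse import quote


def render_pip_conf(secret_string, repo_host):
    """Build pip.conf text from a Secrets Manager SecretString.

    Assumes the secret is a JSON object with "username" and "token" keys
    (hypothetical names) and that repo_host is a bare host name.
    """
    secret = json.loads(secret_string)
    # Percent-encode the credentials so characters like "@" or "/" keep the URL valid
    user = quote(secret["username"], safe="")
    token = quote(secret["token"], safe="")
    return f"[global]\nindex-url = https://{user}:{token}@{repo_host}/pypi/simple\n"
```

The returned string can be written to /home/sagemaker-user/.pip/pip.conf in place of the heredoc.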

You can also set up your compute to automatically shut down if it is idle for a period of time. AWS provides an example script that you can run in LCC. You can see this script (and many more AWS examples) in the following repos:

  • aws-samples/sagemaker-studio-apps-lifecycle-config-examples (GitHub)
  • aws-samples/sagemaker-studio-lifecycle-config-examples (GitHub)

Saving space

An LCC script has a limit of 16384 characters, which isn't much for a bash script. To work around this, you can store your "real" bash scripts in S3 and have a small LCC script download and run them:

#!/bin/bash
# This script downloads multiple lifecycle configuration scripts from S3 and runs them one after another.

# Exit on error (including a failed download inside a pipe) and print a trace for each command
set -euxo pipefail

echo "Run the pip setup script"
aws s3 cp <S3 URI> - | bash
echo "pip setup script Done!"

echo "Install pip packages"
aws s3 cp <S3 URI> - | bash
echo "Pip package install done!"

echo "Setup JupyterLab automatic shutdown script"
aws s3 cp <S3 URI> - | bash -s <Idle time threshold>
echo "JupyterLab automatic shutdown setup done!"
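
To catch an oversized script before you deploy it, a quick local check helps. This is a minimal sketch under one loud assumption: it measures the base64-encoded form of the script, because that is what gets sent to the API. If the limit turns out to apply to the raw text, this check is simply stricter than needed. The helper names are made up.

```python
import base64

MAX_LCC_CHARS = 16384  # the limit quoted in this post


def encoded_lcc_length(script_text):
    """Length of the script once it is base64-encoded for the API."""
    return len(base64.b64encode(script_text.encode("utf-8")))


def fits_lcc_limit(script_text, limit=MAX_LCC_CHARS):
    """True if the encoded script stays within the limit (conservative check)."""
    return encoded_lcc_length(script_text) <= limit
```

Running `fits_lcc_limit(open("my_lcc.sh").read())` in your deployment pipeline lets you fail early instead of at upload time.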

How to deploy your LCC scripts

Through the UI

If it's a one-off thing, you can use the AWS Console to edit and upload your LCC script. It's a manual and time-consuming process, but good enough if you're just experimenting with the feature.

The LCC edit UI on Amazon SageMaker

Once you create your script, make sure to associate it with your SageMaker domain:

Associating your script with your domain

Using APIs

Once you get the hang of LCCs, I recommend using the AWS API to deploy your scripts. You can manage your scripts programmatically and tie them into your DevOps processes to manage code versions.

For example, if you're using Python's boto3 library, the process would look something like this:

import base64

import boto3

# Define which SageMaker app you are using the script in:
# "JupyterLab", "JupyterServer", "CodeEditor" or "KernelGateway"
app_type = "JupyterLab"

# Create boto3 clients
s3_client = boto3.client("s3")
sm_client = boto3.client("sagemaker")

# Download your script content from S3
lcc_content = s3_client.get_object(
    Bucket="<BUCKET_NAME>",
    Key="<OBJECT_KEY>",
)["Body"].read()

# Register the script in your AWS account (the content must be base64-encoded)
lcc_arn = sm_client.create_studio_lifecycle_config(
    StudioLifecycleConfigName="<LCC_NAME>",
    StudioLifecycleConfigContent=base64.b64encode(lcc_content).decode("utf-8"),
    StudioLifecycleConfigAppType=app_type,
)["StudioLifecycleConfigArn"]

# Associate the script with the SageMaker domain
# Firstly, get the current default user settings
response = sm_client.describe_domain(DomainId="<SAGEMAKER_DOMAIN_ID>")
default_user_settings = response["DefaultUserSettings"]

# The app settings (and the keys inside them) may not exist yet, so create them if needed
app_settings = default_user_settings.setdefault(f"{app_type}AppSettings", {})

# Next, set the script to run by default
app_settings.setdefault("DefaultResourceSpec", {})["LifecycleConfigArn"] = lcc_arn

# Also, add the script to the list of scripts that can be selected
app_settings.setdefault("LifecycleConfigArns", []).append(lcc_arn)

# Lastly, update the SageMaker domain with your new settings
sm_client.update_domain(
    DomainId="<SAGEMAKER_DOMAIN_ID>",
    DefaultUserSettings=default_user_settings,
)

For more information, check out the boto3 or AWS CLI documentation.
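
If you run the association step from a pipeline, you probably want it to be safe to repeat. Here is a minimal sketch of that idea as a pure function: it takes the `DefaultUserSettings` dictionary returned by `describe_domain` and returns a copy with the script attached, without adding the same ARN twice. The function name `with_lcc_attached` and its `make_default` flag are made up for illustration.

```python
import copy


def with_lcc_attached(default_user_settings, app_type, lcc_arn, make_default=True):
    """Return a copy of the settings with lcc_arn attached for app_type.

    Attaching the same ARN twice leaves the list unchanged.
    """
    settings = copy.deepcopy(default_user_settings)
    app_settings = settings.setdefault(f"{app_type}AppSettings", {})

    # Add the script to the selectable list only if it is not there yet
    arns = app_settings.setdefault("LifecycleConfigArns", [])
    if lcc_arn not in arns:
        arns.append(lcc_arn)

    # Optionally make it the script that runs by default
    if make_default:
        app_settings.setdefault("DefaultResourceSpec", {})["LifecycleConfigArn"] = lcc_arn
    return settings
```

The result can be passed as `DefaultUserSettings` to `update_domain`. Because the input is copied rather than modified, you can also diff the old and new settings before applying them.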

Conclusion

That's it! If you have any cool use cases for SageMaker LCC scripts, share them in the comments! I hope this helps. Happy hacking!
