2023-06-28
Script to Python Package Using Poetry (And PyCharm)
- The task
- Steps for Package Creation
- Create Project Directory
- Open the Project in PyCharm
- Configure Poetry Virtual Environment
- Install Dependencies
- Configure PyCharm Interpreter
- Initialize Git Repository
- Create Package Structure
- Move Script and Files
- Create __init__.py
- Update pyproject.toml
- Add README.md file
- Test the Script
- Package the Project
- Publish the Package
- Versioning and Updates
The task
Let's assume that you have a simple script that counts the tokens in a provided text file. Below is the script; it accepts one positional argument, the file name, and can be run from the command-line interface (CLI). See also the note on How to count tokens?
#!/usr/bin/env python3
import argparse

import tiktoken


def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens


num_tokens_from_string(
    "tiktoken is great!",
)


def count_tokens(file_path):
    with open(file_path, "r") as file:
        text = file.read()
    return num_tokens_from_string(text)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Count the number of tokens in a text file."
    )
    parser.add_argument("file", help="Path to the input text file")
    args = parser.parse_args()
    file_path = args.file
    num_tokens = count_tokens(file_path)
    print(f"Number of tokens: {num_tokens}")
In this script, the `argparse` module is used to handle command-line arguments. The script defines a single positional argument, `file`, which is the path to the input text file.
When the script is executed from the command line, it parses the command-line arguments and retrieves the file path provided by the user. The `count_tokens` function is then called with that path, and the number of tokens is printed.
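To see how the positional argument is handled without going through the shell, `parse_args` can be given an explicit list of strings. The sketch below rebuilds the same parser as the script; `sample.txt` is only an illustrative file name and no file is opened.

```python
import argparse

# Same parser definition as in the script above.
parser = argparse.ArgumentParser(
    description="Count the number of tokens in a text file."
)
parser.add_argument("file", help="Path to the input text file")

# The list stands in for sys.argv[1:]; "sample.txt" is a made-up name.
args = parser.parse_args(["sample.txt"])
print(args.file)
```

Calling `parse_args()` with no arguments, as the script does, reads the real command line instead.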
To run the script from the CLI, use the following command:
python script_name.py file_path
Replace `script_name.py` with the actual name of your script file, and `file_path` with the path to the input text file you want to analyze. The script will then tokenize the text file and print the number of tokens.
NOTE: you need the `tiktoken` package installed before running the script. You can install it using pip:
pip install tiktoken
Steps for Package Creation
To create and publish a Python package based on the provided script, you can follow the steps below:
Create Project Directory
Start by creating a new directory for your project. You can choose an appropriate name for the directory.
Initialize the project with Poetry: open your command-line interface and navigate to the project directory you created. Run the following command to initialize the project using Poetry:
poetry init
This command will prompt you to fill in information about your package, such as the package name, version, description, author details, and more. Fill in the required information as prompted.
Open the Project in PyCharm
Open PyCharm and select "Open" from the welcome screen or go to "File" > "Open" and choose the project directory you created.
Configure Poetry Virtual Environment
When opening the project in PyCharm for the first time, it will detect the presence of Poetry. You will be prompted to either allow PyCharm to create a Poetry virtual environment or create it manually. Select the option to create the virtual environment.
If you already have a Poetry virtual environment set up manually, you can skip this step.
Install Dependencies
In your command-line interface, navigate to the project directory if you're not already there. Run the following command to install the necessary dependencies using Poetry:
poetry install
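This command will create a virtual environment (if one does not exist yet) and install the packages specified in your project's `pyproject.toml` file. If `tiktoken` is not listed there yet, add it first with `poetry add tiktoken`.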
Configure PyCharm Interpreter
In PyCharm, go to "File" > "Settings" > "Project:
Select "Poetry Environment" and choose the existing local Poetry interpreter associated with your project's virtual environment. Click "OK" to apply the changes.
Initialize Git Repository
In your command-line interface, navigate to the project directory if you're not already there. Run the following command to initialize a Git repository:
git init
This will set up a new Git repository for version control.
At this point, you have set up the project structure, initialized Poetry, configured the virtual environment in PyCharm, installed dependencies, and initialized a Git repository. Now, you can proceed with packaging and publishing your Python script.
NOTE: you might want to add a `.gitignore` file at this stage. A minimal `.gitignore` can be:
# Compiled Python files
__pycache__/
*.py[cod]
# Distribution / packaging
dist/
build/
*.egg-info/
*.egg
# Virtual environments
venv/
env/
# IDEs and editors
.idea/
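As a rough check of what the `*.py[cod]` pattern above covers, Python's `fnmatch` module uses similar shell-style wildcards. Git's own matching rules are richer (directory patterns with a trailing slash, for instance), so this is only an illustration of the character class; the file names are invented.

```python
from fnmatch import fnmatch

pattern = "*.py[cod]"  # one trailing character out of c, o, d
candidates = ["module.pyc", "module.pyo", "module.pyd", "module.py"]  # hypothetical names

# Keep only the names the pattern matches.
ignored = [name for name in candidates if fnmatch(name, pattern)]
print(ignored)
```

The plain `module.py` source file is not matched, so it stays under version control.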
Create Package Structure
Inside your project directory, create a package structure that follows Python's best practices. For example, you can create a directory named `my_package` that will contain your script and other necessary files.
Move Script and Files
Move your script file and any other relevant files into the package directory (`my_package` in this example).
Create __init__.py
Inside the package directory (`my_package`), create an empty file named `__init__.py`. This file marks the directory as a regular Python package.
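The resulting layout can be tried out in a throwaway directory. The sketch below is only an illustration: it creates `my_package/__init__.py` and a one-line `my_script.py` (the `GREETING` constant is invented for the demo), then imports the module through the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    # Throwaway layout: <project_dir>/my_package/{__init__.py, my_script.py}
    package_dir = Path(project_dir) / "my_package"
    package_dir.mkdir()
    (package_dir / "__init__.py").touch()  # empty file, as described above
    (package_dir / "my_script.py").write_text("GREETING = 'hello from my_script'\n")

    # Make the project root importable, then import the module via the package.
    sys.path.insert(0, project_dir)
    try:
        module = importlib.import_module("my_package.my_script")
        greeting = module.GREETING
        package_name = module.__package__
    finally:
        sys.path.remove(project_dir)

print(package_name, "->", greeting)
```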
Update pyproject.toml
Open your project's `pyproject.toml` file. Add a `[tool.poetry.scripts]` section that maps a command name to the function that should run when the command is invoked. For example:
[tool.poetry]
...
[tool.poetry.scripts]
my_script = 'my_package.my_script:main'
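Replace `my_script` with the desired command name for your script, and `my_package.my_script:main` with the correct import path to your script and its main function. The target must be a callable: the script shown earlier has no `main` function, so move the body of its `if __name__ == "__main__":` block into a function named `main` and call that function from the guard.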
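A minimal sketch of such an entry point is shown below. To keep it runnable without `tiktoken`, `count_tokens` here is a stand-in that counts whitespace-separated words; in the real module the tiktoken-based function from the script stays in place. The optional `argv` parameter is an assumption added for easier testing, not something Poetry requires.

```python
import argparse


def count_tokens(file_path: str) -> int:
    # Stand-in so the sketch runs without tiktoken: counts whitespace-separated
    # words. In the real module, keep the tiktoken-based count_tokens instead.
    with open(file_path, "r", encoding="utf-8") as file:
        return len(file.read().split())


def main(argv=None) -> int:
    """Entry point referenced as my_package.my_script:main."""
    parser = argparse.ArgumentParser(
        description="Count the number of tokens in a text file."
    )
    parser.add_argument("file", help="Path to the input text file")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    print(f"Number of tokens: {count_tokens(args.file)}")
    return 0  # conventional "success" value for an exit status


# The existing guard in the script then shrinks to:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

With this in place, `python my_script.py file_path` and the installed `my_script file_path` command run the same code.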
Add README.md file
In the root of the project directory, create `README.md` and fill it with useful information. See also: writing_good_readme
NOTE: You can add some badges related to your PyPI package, e.g.:
![img](https://img.shields.io/pypi/v/package_name.svg)
![](https://img.shields.io/pypi/pyversions/package_name.svg)
![](https://img.shields.io/pypi/dm/package_name.svg)
Add LICENSE file
You can create a LICENSE file manually. Here's how you can do it:
- Create a new file in your project root directory named `LICENSE`.
- Go to the MIT License template and copy the text.
- Paste the copied text into your `LICENSE` file.
- Replace `[year]` with the current year and `[fullname]` with your name or your organization's name.
- Save the file.
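The placeholder replacement in the steps above can also be scripted. The sketch below works on a single stand-in line rather than the full license text, and `Jane Doe` is a placeholder name.

```python
from datetime import date

# Shortened stand-in for the copyright line of the template;
# the real LICENSE file contains the full license text.
template_line = "Copyright (c) [year] [fullname]"


def fill_placeholders(text: str, fullname: str, year: int) -> str:
    """Replace the [year] and [fullname] placeholders."""
    return text.replace("[year]", str(year)).replace("[fullname]", fullname)


# "Jane Doe" is a placeholder; use your own or your organization's name.
print(fill_placeholders(template_line, "Jane Doe", date.today().year))
```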
Test the Script
Before publishing your package, it's essential to test your script to ensure it works as expected. You can execute the script locally to verify its functionality.
If you want to use pytest for testing, add it as a development dependency; this also installs it:
poetry add --group dev pytest
Package the Project
In your command-line interface, navigate to the project directory. Run the following command to create a distributable package:
poetry build
This command will generate the distributable archives (a source distribution, `.tar.gz`, and a wheel, `.whl`) in the `dist` directory within your project.
Publish the Package
To publish your package, you can use a package index such as PyPI (Python Package Index). First, you need to create an account on PyPI if you haven't already. Once you have an account, run the following command to publish your package:
poetry publish
This command uploads the archives from `dist` to PyPI. If no credentials are configured, Poetry prompts for them; an API token created in your PyPI account settings can be stored beforehand with `poetry config pypi-token.pypi <your-token>`.
Note: Make sure your package has a unique name to avoid conflicts with existing packages on PyPI.
Versioning and Updates
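When you make updates to your package, increment the `version` field in the `[tool.poetry]` section of `pyproject.toml` before building and publishing again. PyPI does not accept a second upload of a version that already exists, and distinct version numbers help to track and manage different releases of your package.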
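Assuming a plain `MAJOR.MINOR.PATCH` scheme, the increment can be sketched as follows. `bump_version` is a hypothetical helper written for illustration; Poetry offers the `poetry version patch`, `poetry version minor`, and `poetry version major` commands, which update `pyproject.toml` for you.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` ("MAJOR.MINOR.PATCH") with one part incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
```

Incrementing a part resets the less significant parts to zero, which is why a minor bump of `0.1.5` would give `0.2.0`.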
That's it! You have now packaged and published your Python script using Poetry. Users can install your package using pip and use your script as a command-line tool.
Please note that publishing a package is a significant step, and it's essential to review and test your code thoroughly before sharing it with others.
Correcting metadata
authors = ["Krystian Safjan <ksafjan@gmail.com>"]
keywords = ["keyword1", "keyword2"]
homepage = "https://github.com/user/repo"
repository = "https://github.com/user/repo"
documentation = "https://github.com/user/repo"