pypi-simple — PyPI Simple Repository API client library¶
High-Level API¶
class pypi_simple.PyPISimple(endpoint: str = 'https://pypi.org/simple/', auth: Optional[Any] = None, session: Optional[requests.sessions.Session] = None)
A client for fetching package information from a Python simple package repository.
If necessary, login/authentication details for the repository can be specified at initialization by setting the auth parameter to either a (username, password) pair or another authentication object accepted by requests.
If more complicated session configuration is desired (e.g., setting up caching), the user must create & configure a requests.Session object appropriately and pass it to the constructor as the session parameter.
A PyPISimple instance can be used as a context manager that will automatically close its session on exit, regardless of where the session object came from.
Changed in version 0.8.0: Now usable as a context manager
Changed in version 0.5.0: session argument added
Changed in version 0.4.0: auth argument added
- Parameters
endpoint (str) – The base URL of the simple API instance to query; defaults to the base URL for PyPI’s simple API
auth – Optional login/authentication details for the repository; either a (username, password) pair or another authentication object accepted by requests
session – Optional requests.Session object to use instead of creating a fresh one
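As a rough illustration of the session-based configuration described above, the following sketch builds a requests.Session by hand and passes it to the constructor. The helper name build_client and the User-Agent value are assumptions made for this example; only the endpoint and session parameters come from the signature documented here.

```python
from typing import Any


def build_client(endpoint: str, username: str, password: str) -> Any:
    """Return a PyPISimple client backed by a hand-configured session.

    Hypothetical helper: the imports are done lazily so that the function
    can be defined even where requests/pypi_simple are not installed.
    """
    import requests
    from pypi_simple import PyPISimple

    session = requests.Session()
    # Credentials and headers set on the session apply to every request
    # that the client makes through it.
    session.auth = (username, password)
    session.headers["User-Agent"] = "example-tool/0.1"  # assumed value
    return PyPISimple(endpoint=endpoint, session=session)
```

Since a PyPISimple instance closes its session on exit when used as a context manager, wrapping the returned client in a with statement should be enough to release the connection pool.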
get_index_page(timeout: Optional[Union[float, Tuple[float, float]]] = None) → pypi_simple.classes.IndexPage
New in version 0.7.0.
Fetches the index/root page from the simple repository and returns an IndexPage instance.
Warning
PyPI’s project index file is very large and takes several seconds to parse. Use this method sparingly.
- Parameters
timeout (Union[float, Tuple[float,float], None]) – optional timeout to pass to the requests call
- Return type
IndexPage
- Raises
requests.HTTPError – if the repository responds with an HTTP error code
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
stream_project_names(chunk_size: int = 65535, timeout: Optional[Union[float, Tuple[float, float]]] = None) → Iterator[str]
New in version 0.7.0.
Returns a generator of names of projects available in the repository. The names are not normalized.
Unlike get_index_page() and get_projects(), this function makes a streaming request to the server and parses the document in chunks. It is intended to be faster than the other methods, especially when the complete document is very large.
Warning
This function is rather experimental. It does not have full support for web encodings, encoding detection, or handling invalid HTML.
- Parameters
chunk_size (int) – how many bytes to read from the response at a time
timeout (Union[float, Tuple[float,float], None]) – optional timeout to pass to the requests call
- Return type
Iterator[str]
- Raises
requests.HTTPError – if the repository responds with an HTTP error code
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
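Because stream_project_names() returns a generator, a caller can stop consuming it early. The sketch below shows one way to do that; sample_project_names is a hypothetical helper, and client is assumed to be a PyPISimple instance or any object with a compatible stream_project_names() method.

```python
from itertools import islice
from typing import Any, List


def sample_project_names(client: Any, prefix: str, limit: int = 5) -> List[str]:
    """Return up to `limit` project names that start with `prefix`."""
    # The names are not normalized, so compare case-insensitively.
    wanted = prefix.lower()
    matches = (
        name
        for name in client.stream_project_names()
        if name.lower().startswith(wanted)
    )
    # islice stops pulling from the generator once enough names are found.
    return list(islice(matches, limit))
```

With a real client this would be called inside a with PyPISimple() as client: block so that the session is closed afterwards.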
get_project_page(project: str, timeout: Optional[Union[float, Tuple[float, float]]] = None) → Optional[pypi_simple.classes.ProjectPage]
New in version 0.7.0.
Fetches the page for the given project from the simple repository and returns a ProjectPage instance. Returns None if the repository responds with a 404. All other HTTP errors cause a requests.HTTPError to be raised.
- Parameters
project (str) – The name of the project to fetch information on. The name does not need to be normalized.
timeout (Union[float, Tuple[float,float], None]) – optional timeout to pass to the requests call
- Return type
Optional[ProjectPage]
- Raises
requests.HTTPError – if the repository responds with an HTTP error code other than 404
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
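Callers of get_project_page() need to handle the None result that stands for a 404. The following is a minimal sketch of that pattern; wheel_filenames is a hypothetical helper, and client is assumed to be a PyPISimple instance (any object with a compatible get_project_page() method works).

```python
from typing import Any, List


def wheel_filenames(client: Any, project: str, timeout: float = 10.0) -> List[str]:
    """Return the filenames of all wheels listed for `project`.

    An unknown project (a 404 from the repository) yields an empty list.
    """
    page = client.get_project_page(project, timeout=timeout)
    if page is None:
        # The repository answered 404: there is no such project.
        return []
    return [pkg.filename for pkg in page.packages if pkg.package_type == "wheel"]
```

Other HTTP errors are deliberately not caught here, so a requests.HTTPError would still propagate to the caller.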
get_project_url(project: str) → str
Returns the URL for the given project’s page in the repository.
get_projects() → Iterator[str]
Returns a generator of names of projects available in the repository. The names are not normalized.
Warning
PyPI’s project index file is very large and takes several seconds to parse. Use this method sparingly.
Deprecated since version 0.7.0: Use get_index_page() or stream_project_names() instead
- Return type
Iterator[str]
- Raises
requests.HTTPError – if the repository responds with an HTTP error code
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
get_project_files(project: str) → List[pypi_simple.classes.DistributionPackage]
Returns a list of DistributionPackage objects representing all of the package files available in the repository for the given project.
When fetching the project’s information from the repository, a 404 response is treated the same as an empty page, resulting in an empty list. All other HTTP errors cause a requests.HTTPError to be raised.
Deprecated since version 0.7.0: Use get_project_page() instead
- Parameters
project (str) – The name of the project to fetch information on. The name does not need to be normalized.
- Return type
List[DistributionPackage]
- Raises
requests.HTTPError – if the repository responds with an HTTP error code other than 404
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
class pypi_simple.IndexPage(projects: List[str], repository_version: Optional[str], last_serial: Optional[str])
New in version 0.7.0.
A parsed index/root page from a simple repository
property projects
The project names listed in the index. The names are not normalized.
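Since the names in projects are not normalized, looking up a project by name usually means normalizing both sides first. The sketch below applies the normalization rule from PEP 503; the helper names normalize_project_name and find_project are hypothetical and not part of pypi_simple.

```python
import re
from typing import Iterable, Optional


def normalize_project_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    # Runs of "-", "_" and "." collapse to a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_project(projects: Iterable[str], wanted: str) -> Optional[str]:
    """Return the name from `projects` that matches `wanted`, as spelled
    in the index, or None if there is no match."""
    target = normalize_project_name(wanted)
    for name in projects:
        if normalize_project_name(name) == target:
            return name
    return None
```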
-
property
-
class
pypi_simple.
ProjectPage
(project: str, packages: List[pypi_simple.classes.DistributionPackage], repository_version: Optional[str], last_serial: Optional[str])[source]¶ New in version 0.7.0.
A parsed project page from a simple repository
-
property
project
¶ The name of the project the page is for
-
property
packages
¶ A list of packages (as
DistributionPackage
objects) listed on the project page
-
property
-
class
pypi_simple.
DistributionPackage
(filename: str, url: str, project: Optional[str], version: Optional[str], package_type: Optional[str], requires_python: Optional[str], has_sig: Optional[bool], yanked: Optional[str], metadata_digests: Optional[Dict[str, str]])[source]¶ Information about a versioned archive file from which a Python project release can be installed
Changed in version 0.5.0:
yanked
attribute addedChanged in version 0.9.0:
has_metadata
,metadata_url
, andmetadata_digests
attributes added-
property
filename
¶ The basename of the package file
property url
The URL from which the package file can be downloaded
property project
The name of the project (as extracted from the filename), or None if the filename cannot be parsed
property version
The project version (as extracted from the filename), or None if the filename cannot be parsed
property package_type
The type of the package, or None if the filename cannot be parsed. The recognized package types are:
'dumb'
'egg'
'msi'
'rpm'
'sdist'
'wheel'
'wininst'
property requires_python
An optional version specifier string declaring the Python version(s) in which the package can be installed
property has_sig
Whether the package file is accompanied by a PGP signature file. This is None if the package repository does not report such information.
property yanked
If the package file has been “yanked” from the package repository (meaning that it should only be installed when that specific version is requested), this attribute will be a string giving the reason why it was yanked; otherwise, it is None.
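A consumer that wants to skip yanked files can filter on this attribute. The following is a minimal sketch; not_yanked is a hypothetical helper, and packages is assumed to be a list of DistributionPackage objects such as ProjectPage.packages.

```python
from typing import Any, Iterable, List


def not_yanked(packages: Iterable[Any]) -> List[Any]:
    """Return only the packages that have not been yanked."""
    # Compare against None explicitly rather than testing truthiness:
    # this sketch assumes a file yanked without a stated reason could
    # carry an empty string, which would still mean "yanked".
    return [pkg for pkg in packages if pkg.yanked is None]
```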
property metadata_digests
If the package repository provides a Core Metadata file for the package, this is a (possibly empty) dict of digests of the file, given as a mapping from hash algorithm names to hex-encoded digest strings; otherwise, it is None
property has_metadata
Whether the package file is accompanied by a Core Metadata file
property metadata_url
If the package repository provides a Core Metadata file for the package, this is the URL for that file; otherwise, it is None.
get_digests() → Dict[str, str]
Extracts the hash digests from the package file’s URL and returns a dict mapping hash algorithm names to hex-encoded digest strings
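To make the idea behind get_digests() concrete, the sketch below pulls algorithm=hexdigest pairs out of a URL fragment with the standard library and checks downloaded bytes against them. It is not the library's implementation; digests_from_url, verify_download, and the CHECKED_ALGORITHMS tuple are hypothetical names chosen for this example.

```python
import hashlib
from typing import Dict
from urllib.parse import parse_qsl, urlparse

# Assumed subset of algorithms that this sketch is willing to check.
CHECKED_ALGORITHMS = ("sha256", "sha384", "sha512")


def digests_from_url(url: str) -> Dict[str, str]:
    """Map hash algorithm names to hex digests found in the URL fragment."""
    # A fragment such as "sha256=abc123" parses like a query string.
    return dict(parse_qsl(urlparse(url).fragment))


def verify_download(data: bytes, digests: Dict[str, str]) -> bool:
    """Return True if `data` matches every checkable digest and at least
    one digest was actually checked."""
    checked = False
    for algorithm, expected in digests.items():
        if algorithm not in CHECKED_ALGORITHMS:
            continue  # skip algorithms this sketch does not handle
        if hashlib.new(algorithm, data).hexdigest() != expected.lower():
            return False
        checked = True
    return checked
```

Returning False when nothing could be checked is a deliberate choice here: a file with no usable digest is treated as unverified rather than as valid.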
classmethod from_link(link: pypi_simple.classes.Link, project_hint: Optional[str] = None) → pypi_simple.classes.DistributionPackage
New in version 0.7.0.
Construct a DistributionPackage from a Link on a project page.
- Parameters
- Return type
DistributionPackage
-
property
pypi_simple.PYPI_SIMPLE_ENDPOINT: str = 'https://pypi.org/simple/'
The base URL for PyPI’s simple API
pypi_simple.SUPPORTED_REPOSITORY_VERSION: str = '1.0'
The maximum supported simple repository version (See PEP 629)
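The "greater major component" rule that triggers UnsupportedRepoVersionError can be sketched in a few lines. This is an illustration of the comparison only, not the library's code; is_supported and major_component are hypothetical names, and treating a missing version as supported follows PEP 629's instruction to assume version 1.0 when none is declared.

```python
from typing import Optional

SUPPORTED_REPOSITORY_VERSION = "1.0"  # mirrors the constant documented above


def major_component(version: str) -> int:
    """Return the major part of a "major.minor" version string."""
    return int(version.split(".", 1)[0])


def is_supported(
    declared: Optional[str],
    supported: str = SUPPORTED_REPOSITORY_VERSION,
) -> bool:
    """True unless the declared major version exceeds the supported one."""
    if declared is None:
        # No version declared by the page: PEP 629 says to assume 1.0.
        return True
    return major_component(declared) <= major_component(supported)
```

A newer minor version (for example 1.5) passes the check, while a newer major version (2.0) does not.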
Low-Level Utilities¶
Parsing Simple Repository Pages¶
pypi_simple.parse_repo_index_page(html: Union[str, bytes], from_encoding: Optional[str] = None) → pypi_simple.classes.IndexPage
New in version 0.7.0.
Parse an index/root page from a simple repository into an IndexPage. Note that the last_serial attribute will be None.
- Parameters
- Return type
IndexPage
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
pypi_simple.parse_repo_index_response(r: requests.models.Response) → pypi_simple.classes.IndexPage
New in version 0.7.0.
Parse an index page from a requests.Response returned from a (non-streaming) request to a simple repository, and return an IndexPage.
- Parameters
r (requests.Response) – the response object to parse
- Return type
IndexPage
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
pypi_simple.parse_repo_project_page(project: str, html: Union[str, bytes], base_url: Optional[str] = None, from_encoding: Optional[str] = None) → pypi_simple.classes.ProjectPage
New in version 0.7.0.
Parse a project page from a simple repository into a ProjectPage. Note that the last_serial attribute will be None.
- Parameters
project (str) – The name of the project whose page is being parsed
base_url (Optional[str]) – an optional URL to join to the front of the packages’ URLs (usually the URL of the page being parsed)
from_encoding (Optional[str]) – an optional hint to Beautiful Soup as to the encoding of html when it is bytes (usually the charset parameter of the response’s Content-Type header)
- Return type
ProjectPage
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
pypi_simple.parse_repo_project_response(project: str, r: requests.models.Response) → pypi_simple.classes.ProjectPage
New in version 0.7.0.
Parse a project page from a requests.Response returned from a (non-streaming) request to a simple repository, and return a ProjectPage.
- Parameters
project (str) – The name of the project whose page is being parsed
r (requests.Response) – the response object to parse
- Return type
ProjectPage
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
pypi_simple.parse_repo_links(html: Union[str, bytes], base_url: Optional[str] = None, from_encoding: Optional[str] = None) → Tuple[Dict[str, str], List[pypi_simple.classes.Link]]
New in version 0.7.0.
Parse an HTML page from a simple repository and return a (metadata, links) pair.
The metadata element is a Dict[str, str]. Currently, the only key that may appear in it is "repository_version", which maps to the repository version reported by the HTML page in accordance with PEP 629. If the HTML page does not contain a repository version, this key is absent from the dict.
The links element is a list of Link objects giving the hyperlinks found in the HTML page.
- Parameters
base_url (Optional[str]) – an optional URL to join to the front of the links’ URLs (usually the URL of the page being parsed)
from_encoding (Optional[str]) – an optional hint to Beautiful Soup as to the encoding of html when it is bytes (usually the charset parameter of the response’s Content-Type header)
- Return type
Tuple[Dict[str, str], List[Link]]
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
class pypi_simple.Link(text: str, url: str, attrs: Dict[str, Union[str, List[str]]])
New in version 0.7.0.
A hyperlink extracted from an HTML page
property text
The text inside the link tag, with leading & trailing whitespace removed and with any tags nested inside the link tags ignored
property url
The URL that the link points to, resolved relative to the URL of the source HTML page and relative to the page’s <base> href value, if any
-
property
Streaming Parsers¶
pypi_simple.parse_links_stream(htmlseq: Iterable, base_url: Optional[str] = None, http_charset: Optional[str] = None) → Iterator[pypi_simple.classes.Link]
New in version 0.7.0.
Parse an HTML page given as an iterable of bytes or str and yield each hyperlink encountered in the document as a Link object.
This function consumes the elements of htmlseq one at a time and yields the links found in each segment before moving on to the next one. It is intended to be faster than both parse_links() and parse_repo_links(), especially when the complete document is very large.
Warning
This function is rather experimental. It does not have full support for web encodings, encoding detection, or handling invalid HTML. It also leaves CDATA list attributes on links as strings instead of converting them to lists.
- Parameters
htmlseq (Iterable[AnyStr]) – an iterable of either bytes or str that, when joined together, form an HTML document to parse
base_url (Optional[str]) – an optional URL to join to the front of the links’ URLs (usually the URL of the page being parsed)
http_charset (Optional[str]) – the document’s encoding as declared by the transport layer, if any; e.g., as declared in the charset parameter of the Content-Type header of the HTTP response that returned the document
- Return type
Iterator[Link]
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
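The chunk-at-a-time approach described for parse_links_stream() can be illustrated with the standard library's incremental HTML parser. The sketch below is not the library's implementation: iter_links and _AnchorCollector are hypothetical names, it accepts str chunks only, it yields (text, url) tuples rather than Link objects, and it ignores <base> tags and repository-version metadata.

```python
from html.parser import HTMLParser
from typing import Iterable, Iterator, List, Optional, Tuple
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collect (text, url) pairs for <a> elements as they are completed."""

    def __init__(self, base_url: Optional[str] = None) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.finished: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self._in_anchor = False
            if self._href is not None:
                url = urljoin(self.base_url, self._href) if self.base_url else self._href
                self.finished.append(("".join(self._text).strip(), url))


def iter_links(chunks: Iterable[str], base_url: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Yield links as soon as the chunk that completes them has been fed."""
    parser = _AnchorCollector(base_url)
    for chunk in chunks:
        # HTMLParser buffers tags that are split across chunk boundaries.
        parser.feed(chunk)
        ready, parser.finished = parser.finished, []
        yield from ready
    parser.close()
    yield from parser.finished
```

Links are handed to the caller after each chunk, so memory use stays roughly proportional to the chunk size rather than to the size of the whole document.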
pypi_simple.parse_links_stream_response(r: requests.models.Response, chunk_size: int = 65535) → Iterator[pypi_simple.classes.Link]
New in version 0.7.0.
Parse an HTML page from a streaming requests.Response object and yield each hyperlink encountered in the document as a Link object.
See parse_links_stream() for more information.
- Parameters
r (requests.Response) – the streaming response object to parse
chunk_size (int) – how many bytes to read from the response at a time
- Return type
Iterator[Link]
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
Deprecated Functions¶
pypi_simple.parse_simple_index(html: Union[str, bytes], base_url: Optional[str] = None, from_encoding: Optional[str] = None) → Iterator[Tuple[str, str]]
Parse a simple repository’s index page and return a generator of (project name, project URL) pairs
Deprecated since version 0.7.0: Use parse_repo_index_page() or parse_links_stream() instead
- Parameters
base_url (Optional[str]) – an optional URL to join to the front of the URLs returned (usually the URL of the page being parsed)
from_encoding (Optional[str]) – an optional hint to Beautiful Soup as to the encoding of html when it is bytes (usually the charset parameter of the response’s Content-Type header)
- Return type
Iterator[Tuple[str, str]]
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
pypi_simple.parse_project_page(html: Union[str, bytes], base_url: Optional[str] = None, from_encoding: Optional[str] = None, project_hint: Optional[str] = None) → List[pypi_simple.classes.DistributionPackage]
Parse a project page from a simple repository and return a list of DistributionPackage objects
Deprecated since version 0.7.0: Use parse_repo_project_page() instead
- Parameters
base_url (Optional[str]) – an optional URL to join to the front of the packages’ URLs (usually the URL of the page being parsed)
from_encoding (Optional[str]) – an optional hint to Beautiful Soup as to the encoding of html when it is bytes (usually the charset parameter of the response’s Content-Type header)
project_hint (Optional[str]) – The name of the project whose page is being parsed; used to disambiguate the parsing of certain filenames
- Return type
List[DistributionPackage]
- Raises
UnsupportedRepoVersionError – if the repository version has a greater major component than the supported repository version
pypi_simple.parse_links(html: Union[str, bytes], base_url: Optional[str] = None, from_encoding: Optional[str] = None) → Iterator[Tuple[str, str, Dict[str, Union[str, List[str]]]]]
Parse an HTML page and return a generator of links, where each link is represented as a triple of link text, link URL, and a dict of link tag attributes (including the unmodified href attribute).
Link text has all leading & trailing whitespace removed.
Keys in the attributes dict are converted to lowercase.
Deprecated since version 0.7.0: Use parse_repo_links() instead
- Parameters
base_url (Optional[str]) – an optional URL to join to the front of the URLs returned (usually the URL of the page being parsed)
from_encoding (Optional[str]) – an optional hint to Beautiful Soup as to the encoding of html when it is bytes (usually the charset parameter of the response’s Content-Type header)
- Return type
Iterator[Tuple[str, str, Dict[str, Union[str, List[str]]]]]
Parsing Filenames¶
pypi_simple.parse_filename(filename: str, project_hint: Optional[str] = None) → Union[Tuple[str, str, str], Tuple[None, None, None]]
Given the filename of a distribution package, returns a triple of the project name, project version, and package type. The name and version are spelled the same as they appear in the filename; no normalization is performed.
The package type may be any of the following strings:
'dumb'
'egg'
'msi'
'rpm'
'sdist'
'wheel'
'wininst'
If the filename cannot be parsed, (None, None, None) is returned.
Note that some filenames (e.g., 1-2-3.tar.gz) may be ambiguous as to which part is the project name and which is the version. In order to resolve the ambiguity, the expected value for the project name (modulo normalization) can be supplied as the project_hint argument to the function. If the filename can be parsed with the given string in the role of the project name, the results of that parse will be returned; otherwise, the function will fall back to breaking the project & version apart at an unspecified point.
- Parameters
- Return type
Union[Tuple[str, str, str], Tuple[None, None, None]]
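The role of the project hint can be illustrated with a small stand-alone sketch. It is not how parse_filename() is implemented: split_name_version is a hypothetical helper that works on a filename stem with the extension already removed, and its fallback of splitting at the last hyphen is an assumption made for the example (the library leaves the fallback point unspecified).

```python
import re
from typing import Optional, Tuple


def _normalize(name: str) -> str:
    # PEP 503 normalization, used so the hint matches "modulo normalization".
    return re.sub(r"[-_.]+", "-", name).lower()


def split_name_version(stem: str, project_hint: Optional[str] = None) -> Tuple[str, str]:
    """Split a stem such as "1-2-3" into (project name, version)."""
    if "-" not in stem:
        raise ValueError(f"no hyphen to split at: {stem!r}")
    if project_hint is not None:
        wanted = _normalize(project_hint)
        # Try every hyphen as the boundary and keep the first split whose
        # left-hand side matches the hinted project name.
        for pos, char in enumerate(stem):
            if char == "-" and _normalize(stem[:pos]) == wanted:
                return stem[:pos], stem[pos + 1:]
    # Assumed fallback for this sketch: split at the last hyphen.
    name, _, version = stem.rpartition("-")
    return name, version
```

With the stem 1-2-3, a hint of "1" gives the name 1 and version 2-3, while a hint of "1-2" gives the name 1-2 and version 3.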
Changelog¶
v0.9.0 (2021-08-26)¶
Support PEP 658 by adding has_metadata, metadata_url, and metadata_digests attributes to DistributionPackage
v0.8.0 (2020-12-13)¶
Support Python 3.9
PyPISimple is now usable as a context manager that will close the session on exit
v0.7.0 (2020-10-15)¶
Drop support for Python 2.7, Python 3.4, and Python 3.5
DistributionPackage.has_sig is now None if the package repository does not report this information
Added type annotations
Moved documentation from README file to a Read the Docs site
Added new methods to PyPISimple:
    get_index_page() – Returns an IndexPage instance with a projects: List[str] attribute plus other attributes for repository metadata
    get_project_page() – Returns a ProjectPage instance with a packages: List[DistributionPackage] attribute plus other attributes for repository metadata
    stream_project_names() – Retrieves project names from a repository using a streaming request
New utility functions:
    parse_repo_links() – Parses an HTML page and returns a pair of repository metadata and a list of Link objects
    parse_repo_project_page() – Parses a project page and returns a ProjectPage instance
    parse_repo_project_response() – Parses a requests.Response object containing a project page and returns a ProjectPage instance
    parse_links_stream() – Parses an HTML page as a stream of bytes or str and returns a generator of Link objects
    parse_links_stream_response() – Parses a streaming requests.Response object containing an HTML page and returns a generator of Link objects
    parse_repo_index_page() – Parses a simple repository index/root page and returns an IndexPage instance
    parse_repo_index_response() – Parses a requests.Response object containing an index page and returns an IndexPage instance
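The following functions & methods are now deprecated and will be removed in a future version:
    PyPISimple.get_projects()
    PyPISimple.get_project_files()
    parse_simple_index()
    parse_project_page()
    parse_links()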
Support Warehouse’s X-PyPI-Last-Serial header by attaching the value to the objects returned by get_index_page() and get_project_page()
Support PEP 629 by attaching the repository version to the objects returned by get_index_page() and get_project_page() and by raising an UnsupportedRepoVersionError when a repository with an unsupported version is encountered
v0.6.0 (2020-03-01)¶
Support Python 3.8
DistributionPackage.sig_url is now always non-None, as Warehouse does not report proper values for has_sig
v0.5.0 (2019-05-12)¶
The PyPISimple constructor now takes an optional session argument which can be used to specify a requests.Session object with more complicated configuration than just authentication
Support for PEP 592; DistributionPackage now has a yanked attribute
v0.4.0 (2018-09-06)¶
Publicly (i.e., in the README) document the utility functions
Gave PyPISimple an auth parameter for specifying login/authentication details
v0.3.0 (2018-09-03)¶
When fetching the list of files for a project, the project name is now used to resolve ambiguous filenames.
The filename parser now requires all filenames to be all-ASCII (except for wheels).
v0.2.0 (2018-09-01)¶
The filename parser now rejects invalid project names, blatantly invalid versions, and non-ASCII digits.
RPM packages are now recognized.
v0.1.0 (2018-08-31)¶
Initial release
pypi-simple is a client library for the Python Simple Repository API as specified in PEP 503 and updated by PEP 592, PEP 629, and PEP 658. With it, you can query the Python Package Index (PyPI) and other pip-compatible repositories for a list of their available projects and lists of each project’s available package files. The library also allows you to query package files for their project version, package type, file digests, requires_python string, PGP signature URL, and metadata URL.
Installation¶
pypi-simple requires Python 3.6 or higher. Just use pip for Python 3 (You have pip, right?) to install pypi-simple and its dependencies:
python3 -m pip install pypi-simple
Example¶
>>> from pypi_simple import PyPISimple
>>> with PyPISimple() as client:
...     requests_page = client.get_project_page('requests')
>>> pkg = requests_page.packages[0]
>>> pkg
DistributionPackage(filename='requests-0.2.0.tar.gz', url='https://files.pythonhosted.org/packages/ba/bb/dfa0141a32d773c47e4dede1a617c59a23b74dd302e449cf85413fc96bc4/requests-0.2.0.tar.gz#sha256=813202ace4d9301a3c00740c700e012fb9f3f8c73ddcfe02ab558a8df6f175fd', project='requests', version='0.2.0', package_type='sdist', requires_python=None, has_sig=None, yanked=None, metadata_digests=None)
>>> pkg.filename
'requests-0.2.0.tar.gz'
>>> pkg.url
'https://files.pythonhosted.org/packages/ba/bb/dfa0141a32d773c47e4dede1a617c59a23b74dd302e449cf85413fc96bc4/requests-0.2.0.tar.gz#sha256=813202ace4d9301a3c00740c700e012fb9f3f8c73ddcfe02ab558a8df6f175fd'
>>> pkg.project
'requests'
>>> pkg.version
'0.2.0'
>>> pkg.package_type
'sdist'
>>> pkg.get_digests()
{'sha256': '813202ace4d9301a3c00740c700e012fb9f3f8c73ddcfe02ab558a8df6f175fd'}