Tuesday, February 07, 2017

Gradual Packaging, with python - Part two

Gradual Packaging - allow packaging simple python code with very little effort. Later on (if needed) you can create a more polished package. Follow conventions, support the simple cases. If you need to go out of the simple cases, allow a path towards better packaging.

Evolution of your python code.

  1. in your ipython session messing around.
  2. paste it into badnamechoice.py -> (everything starts with a bad name in a single .py file)
  3. test_badnamechoice.py -> (then you add a test. Or did you do this first?)
  4. renamed.py -> (now you rename the file to something better)
  5. afolderoffiles/ -> (now there is a folder of files)
  6. add docs/ and .travisci CI files
  7. configure coverage testing
  8. configure an annoying mypy static type checking option
  9. add flakes testing config tweak
  10. support some weird debian derived distro
  11. appveyor CI added next to travisci
  12. pytest config (change because... ARG reasons).
  13. add a requirements.txt, and modify your setup file to have them too.
  14. remove testing stuff out of requirements.txt into requirements.dev.txt
  15. ... (config tweaks happen over the weeks and months)...
  16. ...
  17. Giant ball of 30 config file soup. -> (We have arrived at the modern repo!)

"Cool hackathon! What did you do? - I packaged my python code, added test support, set up a nice git repo, a pypi page, a readthedocs page, configured travisci, and searched the internet for hours on how to upload a python package. Nice one - look at my finished app."
Get something out the door really quickly, and gradually improve the packaging. Or maybe your code isn't all that special and it's ok using the default packaging choices.

So, how do we support this workflow?

I started making a tool for doing the zero config releases. It's not done yet, but it is almost able to release itself. This follows on from a series of blog posts and experiments on:
My mission is to improve python packaging for newbies, those in the digital arts community. People using pygame. Also for 90% of python programmers who never release a python package, and yet work with python every day. This is based on feedback from different people in the python community who teach python programming, my own experience teaching it and mentoring teams. The scientific python community is another group which finds it hard to create python packages.

The humble experiment so far.

https://github.com/illume/pyrelease
 (still not sure if it's a good idea... but we will see)

Usage:
pyrelease

File layouts (based on simple and common layouts often used):
-----
singlefile.py
-----
mygame/game.py
data/image.png
-----
singlefile.py
test_singlefile.py
-----
singlefile.py
tests/test_singlefile.py
-----
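
As a rough sketch of how a tool could recognise these layouts, here is a small function that looks at a folder and reports what it finds. The name detect_layout and the returned keys are made up for this example and are not taken from pyrelease.

```python
from pathlib import Path


def detect_layout(project_dir):
    """Classify a folder using the simple layouts listed above.

    Hypothetical helper for illustration; not part of pyrelease.
    """
    root = Path(project_dir)
    # Top level .py files that are not tests or a setup script.
    modules = sorted(
        p.name for p in root.glob("*.py")
        if not p.name.startswith("test_") and p.name != "setup.py"
    )
    # Folders that contain .py files (no __init__.py required here).
    packages = sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name != "tests" and any(p.glob("*.py"))
    )
    # Tests either next to the code or in a tests/ folder.
    tests = sorted(
        p.relative_to(root).as_posix()
        for p in list(root.glob("test_*.py")) + list(root.glob("tests/test_*.py"))
    )
    return {
        "modules": modules,
        "packages": packages,
        "has_data": (root / "data").is_dir(),
        "tests": tests,
    }
```

Calling detect_layout(".") in a folder with singlefile.py and tests/test_singlefile.py would report one module and one test file. Anything that doesn't match these conventions would need real configuration.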

The basic steps at runtime are these.

  1. Gather facts, like author name and email. (much like ansible if you know it)
  2. Create setup.py, setup.cfg, MANIFEST.in files in a temp folder
  3. build sdist in that temp folder
  4. Upload files to pypi (eventually other release targets, including pyweek)
  5. tag a new version in git (if they are using git)
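
Steps 2 and 3 above can be sketched roughly like this, assuming a single file module and a dict of already gathered facts. The template, the function names and the facts keys (version, author, author_email) are assumptions for this example, not the actual pyrelease code. It leaves out setup.cfg, the upload and the git tag.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical minimal template; a real tool would fill in more fields.
SETUP_TEMPLATE = '''from setuptools import setup

setup(
    name={name!r},
    version={version!r},
    author={author!r},
    author_email={author_email!r},
    py_modules=[{name!r}],
)
'''


def write_release_files(module_path, facts):
    """Copy a single file module into a temp folder and generate packaging files.

    `facts` needs the keys version, author and author_email (and no "name",
    which is taken from the file name).
    """
    module_path = Path(module_path)
    build_dir = Path(tempfile.mkdtemp(prefix="release-"))
    shutil.copy(module_path, build_dir / module_path.name)
    setup_text = SETUP_TEMPLATE.format(name=module_path.stem, **facts)
    (build_dir / "setup.py").write_text(setup_text)
    (build_dir / "MANIFEST.in").write_text("include *.py\n")
    return build_dir


def build_sdist(build_dir):
    """Build the sdist inside the temp folder (needs setuptools installed)."""
    subprocess.run([sys.executable, "setup.py", "sdist"], cwd=build_dir, check=True)
    return sorted((Path(build_dir) / "dist").iterdir())
```

Because everything is generated in a temp folder, the user's own folder stays free of config files until they decide they want them.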

Tell the user what is happening.

The tool should also teach people about python packaging. It should allow you to see the generated setup.py file, setup.cfg, and MANIFEST.in. It should point to the Python Packaging User Guide. It should open the browser at the page for creating a pypi account, and tell people what config files they need to fill in. It shouldn't require git, or a github account, but support that workflow if they do.

Tell them why adding each bit of config is useful.

A single .py file is the smallest package of python code.

The simplest package of python code is a .py file. It can be copied into a project and it can work. Or it can be copied into the site-packages folder, and the module will work. People upload python code to share all the time - on websites all over the internet.

Pip supports installing from git repos, from hg repos, from alternative package indexes, from requirements.txt files, and is adding support for .toml config files. It supports installing from folders on the file system. It supports a .egg file format. It supports a .whl file.

But it doesn't support the simplest, most elegant, oldest of python packages - the single .py file.

Where does the packaging metadata live?

Technically python doesn't need any metadata to install a simple .py file. I mentioned in the first two parts of this series various places where data can be gathered: ~/.pypirc files, .gitconfig files, .hgrc files. The tool can find versions as git tags. It can find __author__, __license__, __version__ variables inside files. The description and long_description are found at the top of the .py file in the docstring.
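
Here is a minimal sketch of pulling those facts out of a .py file without executing it, using the standard library ast module. The function name read_metadata and the choice of keys are assumptions for this example. Anything that isn't a plain literal is skipped.

```python
import ast

# Module level names this sketch looks for.
WANTED = {"__author__", "__license__", "__version__"}


def read_metadata(source):
    """Collect the docstring and simple __dunder__ values without running the code."""
    tree = ast.parse(source)
    facts = {}
    docstring = ast.get_docstring(tree)
    if docstring:
        # First docstring line as the short description, all of it as the long one.
        facts["description"] = docstring.splitlines()[0]
        facts["long_description"] = docstring
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id in WANTED:
                try:
                    facts[target.id.strip("_")] = ast.literal_eval(node.value)
                except (ValueError, TypeError):
                    # Computed at runtime (a function call, etc.), so skip it.
                    pass
    return facts
```

If someone does something tricky, like computing __version__ with a function call, the value just isn't found and the tool can ask for it another way.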


It can use whatever packaging data has been added already. If you add a requirements.txt, then we should use that. If you add a README.rst, then use that.

Gradually add packaging information as you go, as you need it. Not when your packaging tool thinks you need it.


Why not template generators like sampleproject and cookiecutter?

These tools have their place. They are good if you want to tweak all the config in there, and you know how all the tools work. However, based on feedback from people, it's all too complex still. They want a couple of .py files and that's it. Renaming code easily is important for some types of python code - especially when you don't even know where your experiment is going. Naming things is hard!
They came to python for the elegance, not the packaging config files.
But they still want to share code easily.

Where next with the pyrelease experiment?

First I want to get the tool into shape so it can at least release itself. I'm going to Gradually Package the humble pyrelease - by keeping it to one file from the beginning :)

It should support single file modules, packages, and also /data/ folders. The simple layouts that people use. It also makes a script automatically if it finds a main(), and it finds dependencies by parsing the python code (I'll add requirements.txt later). So if you import pygame, click, flask etc... it adds them to install_requires in the setup.py.
  • I want to add logging of facts. eg. 'Found author: "Rene" in ~/.gitconfig'. 'Could not find author in ~/.hgrc'.
  • Suggest additions for missing things. eg. How to create a pypi account, how to make ~/.pypirc, git.
  • Have to make uploading to pypi easier, especially when things go wrong.
  • thinking of removing the setuptools dependency... only thing I use it for is find_packages so far. I've been writing another more modern implementation of that anyway. Remove distutils dep too? [0]
  • Pynsist support (or py2exe, pyinstaller, whatever)
  • tests, and tests against test.pypi.python.org
  • "add setup files into this repo" for when the tool fails to be good enough.
  • notice telling people to go to packaging.python.org if they want more
  • decide on convention for screenshots (probably screenshots/ folder)
  • bitbucket, and better hg support.
  • pyweek upload support.
  • try releasing a few more games with it.
  • watch some other people try and use it.
  • keep blogging, and getting feedback.
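
The import parsing and main() detection mentioned above could look something like this sketch. The function names are made up for this example. It assumes the import name is also the name on pypi, which isn't always true (import yaml comes from PyYAML), and the default standard library filter relies on sys.stdlib_module_names, which only exists on Python 3.10 and later.

```python
import ast
import sys


def find_requirements(source, stdlib=None):
    """Return sorted top level import names that are not standard library."""
    if stdlib is None:
        # Only available on Python 3.10+; older versions filter nothing.
        stdlib = getattr(sys, "stdlib_module_names", frozenset())
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports like "from . import helper".
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return sorted(name for name in found if name not in stdlib)


def has_main(source):
    """True if the module defines a top level main() function."""
    return any(
        isinstance(node, ast.FunctionDef) and node.name == "main"
        for node in ast.parse(source).body
    )
```

The list from find_requirements would go into install_requires, and has_main decides whether to generate a script entry.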

A nice thing so far is that I was easily able to rename the project three times, just by renaming files and folders. It was first called 'blabla.py' then 'package.py' then 'release.py', and finally 'pyrelease.py'. Normally you'd need to modify 10 config files when that happens.

Anyway... I'll try some more days on it and see how it turns out.


[0] Thomas Kluyver - an explanation of some philosophy on 'flit' the tool for minimal config distribution that doesn't use distutils or setuptools. https://github.com/takluyver/flit/pull/97#issuecomment-270984130

6 comments:

Scott Doucet said...

Hey, I really liked this post. I honestly couldn't agree more about the state of simple package management, and I think your tool would certainly be helpful for many people (me included).

I recently made a tool (trabBuild) to help me with simple packaging. In fact I used it to package and upload itself to PyPi right after I read this post!

My next step with the script was to implement something similar to what you already have here. I checked out your code and I think we might be able to merge some of this together and save a fair amount of work.

Links to code:
PyPi: https://pypi.python.org/pypi/trabBuild
GitHub: https://github.com/Duroktar/trabBuild

Thanks again!

Paul Moore said...

Nice article. I see where you're coming from with the idea that we should be able to publish single file modules. However, there are a few things that bother me still. First of all, while having version numbers, package descriptions and the like in the code is nice in the short term, IMO it's not sustainable in the longer term. First of all, you can't introspect the data from the module without executing the module, and that's a potential problem (less so for a developer tool than for pip, though, so probably OK for pyrelease). Secondly, the needs of your docstring and your long_description diverge fairly quickly - you may want ReST markup in the long description, but (mostly) plain text in your docstring.

Honestly, I'd rather the minimum be the code, plus a single config file containing the metadata. There might be a (tiny) bit of duplication, but the benefit of a clear separation between information about the module and the module itself, would be useful IMO.

I wonder - from what I've seen, flit is a nice lightweight packaging tool that is based on this "code plus a single metadata file" idea. So maybe having pyrelease just generate a flit.ini file for your project is what's needed here.

One thing I am a bit bothered about, though, is the "it's easy to rename" idea. I know where you're coming from (most of my modules start out named test.py!) but naming something for release is a big deal. You're reserving a name on PyPI forever. That really shouldn't be something you do without thinking.

I'd also be interested in where you'll go with regard to testing (which tool? use tox?) and CI (travis and appveyor *need* their own config files, you can't avoid that). And maybe even documentation (you really can't last long with "read the source file"). This sort of question is what prompted me to start the PyPA sampleproject - and it got dreadfully bogged down because there genuinely is no consensus on any of these things :-(

But regardless, thanks for working on this. "How do I set up my environment to build my shiny new idea in Python?" is a very common question, with no (or rather too many) good answer, and we really need something better.

Rene Dudfield said...

@Scott! Happy to collaborate on things. (your comment got flagged as spam for some reason)

Rene Dudfield said...

@Paul, thank you very much for the discussion.

A few random thoughts below. The most important and easy win I think is convincing a couple of tools to support using setup.cfg sections for configuration.

---

Many tools now use setup.cfg sections. The holdouts are mypy, pylint, and tox. But 11 or so other tools do apparently (flake, coverage, pytest, etc). I'd like to advocate for at least the python tools supporting setup.cfg.

It's quite easy to parse python files without executing them. As long as people don't do tricky things that is. If they do tricky things, then the tool should be able to fail with an error, and prompt people to use. __version__ __author__ and __license__ are quite common already.

I'm not entirely convinced myself that pyrelease should be a thing. Or if the 'gather meta data' idea is a good thing. However, I'd like to try and continue looking at how far the idea can go. Perhaps it would work nicely combined with ideas from flit, pypackage, sphinx-quickstart and others.

For version, they live in pypi already. At packaging time, the simplest way would be to just increment what you get from pypi. Of course __version__, and git tags could be supported too.

.py files don't really have a tool for managing them at the moment. It would seem a package management tool could support them. They technically require no meta data.

pylint and other tools use per file config. Might be nice to standardise this somehow. Not python constructs but comments.

Yeah, flit with one config file could be easily used by many.

Rene Dudfield said...

Yeah, releasing to pypi is something to consider. Pypi can remove packages right? I've removed some anyway... Cleaning up pypi of unused old packages might be useful. However, releasing code to internal package indexes(devpi etc) or just publishing to git/folder/web and referencing that in the requirements is popular already (for internal company use). I've seen teams which just copy their python files into place (ansible, fabric devops people), or just 'release' to their git repos and use links to them in the requirements.txt of other packages.

Better support for namespace packages will allow less pollution of the global pypi namespace. That's probably another topic to work on.

I like simple_setup (see below) which is a setup.py file which does the gather metadata trick. The benefit of this is that you can point pip at it to install. pbr uses a similar trick to make the setup.py minimal and keep all the logic within itself. I currently think this is a good idea.

Testing is a good topic, that deserves a lot more words. I recently tried to run all the tests from the 400 most downloaded packages from pypi. It's pretty much impossible. Even with tox.ini ones, there were plenty of failures - that didn't fail on their travisci. However, with a bit of work I managed to automatically find how to run tests with many packages. Even though there is no standard way to test packages (many don't work with setup.py test). For a pyrelease tool, I would probably pick a convention that is supported, and perhaps use some introspection again. If there is a test_bla.py in the folder, I'd try run it with one of the test runners (py.test), or if there's a test folder, do the same. Picking one tool most likely. If people need more than that, then they'll have to configure the test framework.

Running tests for the downstream packages that depend on your package is very useful for catching bugs. It amplifies the size of your test cases by a lot. I know some programs and packages do this already. eg. twisted some years ago asked python to run its test suite before release to lessen the unintentional regressions.

---

If all the python tools use setup.cfg sections, that will already clean up python repos by 3-10 files. There are already pull requests and patches for tox and mypy, but there is some resistance. They already use setup.cfg to configure some of the tools themselves. (see https://github.com/python/mypy/pull/2761 )



Here's a list of tools all aiming to simplify packaging and package management:

pbr http://docs.openstack.org/developer/pbr/
flit https://github.com/takluyver/flit/

(a setup.py which gathers info from the environment) https://github.com/braingram/simple_setup

pypackage https://github.com/ccpgames/pypackage

pipenv https://github.com/kennethreitz/pipenv/

fades https://github.com/PyAr/fades


(It's interesting to me that some of them come from a user group, a games company, and a microservices group with hundreds of python packages, and the 'for humans' guy).
