User:Novem Linguae/Essays/Toolforge bot tutorial

From Wikipedia, the free encyclopedia

This is my Toolforge bot tutorial. I had some difficulty using MediaWiki and Wikitech tutorials to set up a bot on Toolforge. In my opinion, there is a big learning curve. These are my streamlined notes that will hopefully help the next person.

This tutorial is optimized for the operating system Windows and the programming language PHP. If you are using a different OS or language, you will need to change some of the steps.

Anywhere it says novem-bot, you should replace that with your Toolforge tool name. Anywhere it says novemlinguae, you should replace that with your wikitech username.

Do you really want to write a bot? Maybe a user script would be better?

User scripts

  • User scripts are usually easier to make than bots.
  • Use a user script when:
    • You want it to be triggered by a user rather than run every X minutes/hours/days.
    • You want to get up and running quickly.
    • You know JavaScript.
    • You want the edit to be associated with the editor triggering it, not a bot.
    • You can get the data you need with a couple of MediaWiki API queries and don't need a complex SQL database query.
    • You only need to make edits and/or display an interface to the user via a wiki.
    • You only need to make a couple of edits at a time.
    • You don't need to use a library/dependency.

Bots

  • However, there are cases when a bot is the better tool.
  • Use a bot when:
    • You want to do a chore every X hours/days (cron), repetitively, forever, rather than triggered by a user.
    • You don't mind taking a while to get everything set up in Toolforge.
    • You know a back end language such as PHP or Python.
    • You want the edit to be associated with a bot, not with the editor triggering it.
    • You need to run complex SQL database queries and it would be impossible/inefficient to just use MediaWiki API queries.
    • You want to make your own custom website that isn't nested inside a wiki. For example, XTools or some other web tool.
    • You plan on making dozens or more edits at a time.
    • You need to use a library/dependency.

Apply for a Toolforge account

Toolforge is Wikimedia's web hosting for technical contributors. Unfortunately, it is quite different from cPanel web hosts and has a bit of a learning curve.

  1. Create a Wikimedia developer account. This is different from your normal Wikipedia SUL account.
  2. Create an SSH key and add it to your Wikitech account.
  3. Submit a Toolforge project membership request and wait for its approval.
    • Your request will be reviewed, and you will receive confirmation within a week. You will be notified through your Wikitech user account.
  4. Once you are added as a Toolforge member, you must log out and then log in again at http://toolsadmin.wikimedia.org/
  5. Create a new tool

Generate an SSH key

  • SSH keys are a more secure way to log in than a password alone. You generate a key pair: a private key file that stays on your computer (optionally protected by a passphrase) and a public key that you add to your Wikitech account. When you connect, your SSH client proves that it holds the private key; the private key and its passphrase are never sent to the server.
  • Your SSH key will be needed for SFTP and for shell/PuTTY.
  • I store my private key file at F:\Dropbox\Code\NovemBot\Gerrit SSH key\wikitech.ppk

FTP client

WinSCP

  • FTP is file transfer protocol. This is one way to get files to and from the server. You usually want to install a program that has a drag-n-drop interface, and you drag files from one side (your machine) to the other side (Toolforge), and vice versa.
  • You must use the FTP program WinSCP, and you must configure it a certain way. Other FTP programs will not work, as they do not have the ability to be configured the quirky way Toolforge wants.
  • protocol: SFTP
  • host: login.toolforge.org
  • user: novemlinguae
  • advanced ->
    • environment -> directories ->
      • remote directory: /data/project/novem-bot
      • local directory: F:\Dropbox\Code\NovemBot\
    • environment -> SFTP -> SFTP server ->
      • This step is very important: it makes you the right user (novem-bot).
      • sudo -u tools.novem-bot /usr/lib/sftp-server
    • ssh -> authentication -> private key file ->
      • F:\Dropbox\Code\NovemBot\Gerrit SSH key\wikitech.ppk
  • Name the site novem-bot, not novemlinguae. In case you get access to more repos later, you can make a different site for each one.
  • Connect. Enter password when prompted.

Shell

PuTTY

  • You need a way to send commands to the server. This is called shell, bash, console, SSH, or command line.
  • You can't just use a local shell window. Since you are talking to a remote server, you need a special program.
  • Download and install PuTTY.
  • host name: novemlinguae@login.toolforge.org
  • port: 22
  • connection: ssh
  • Saved Sessions -> novem-bot.toolforge.org
  • Connections -> SSH -> Auth -> Credentials -> Private key file for authentication -> F:\Dropbox\Code\NovemBot\Gerrit SSH key\wikitech.ppk
  • Session -> Save
  • Click "Open"
  • Enter password when prompted.
  • Once you're connected, you must type become novem-bot to change from your regular account to your project account. If you don't do this, you will have issues with your files belonging to the wrong owner, which will cause issues later.

Basic Linux shell commands

  • In Linux, every file has permissions that control who else on the server can read, write, or execute it. Permissions also affect which scripts can be executed from where.
  • Here are some useful shell commands.
    • become novem-bot - Change from your username to your project name. Important for making sure files and folders you touch have the right owners.
    • cd .. - Navigate up one level.
    • cd folderName - Navigate down one level.
    • cd /data/project/novem-bot - Navigate to this location (an absolute path). Linux paths use forward slashes and have no drive letters.
    • chmod 644 fileName - Change file permissions
    • ls -l or dir - Display directory contents.
    • mkdir folderName - create a directory
    • pwd - Print working directory. Shows you where you are.
    • rm -rf folderName - Delete file/folder and all its contents.
    • take fileName - take a file that is assigned to a different owner, and make you the owner, if able
  • Here's some stuff that is installed and can be accessed in shell

Write and test your bot


Favor localhost

  • Do as much of your development and testing in a localhost development environment as you can.
    • That way you don't have to re-upload your changed files via FTP as you tweak and test code, saving time.
    • For web development, you can use a program like XAMPP to execute PHP files locally, e.g. https://localhost/
    • The Wikipedia API doesn't care what calls it or from where, making localhost development easy.
    • The SQL replica database does care. That needs to be called from Toolforge servers only. Localhost doesn't work.

Protect your passwords

  • The idea behind allowing all Toolforge files to be viewable by other Toolforge accounts by default is to make it easier to fork bots/tools when someone goes inactive.
  • But this means your passwords/secrets files are not protected by default. Be sure to chmod 0600 these.
  • Related: phab:T337140, phab:T334578
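
A minimal sketch of the chmod step, run in a scratch directory so it is safe to try anywhere. The file name config.secrets.php is hypothetical; on Toolforge you would point the chmod line at your real secrets file under /data/project/novem-bot.

```shell
#!/bin/sh
# Scratch directory so the demo never touches real files.
demo_dir=$(mktemp -d)
secrets_file="$demo_dir/config.secrets.php"   # hypothetical file name

# Create a stand-in secrets file that starts out world-readable (644).
printf '%s\n' '<?php $botPassword = "passwordGoesHere";' > "$secrets_file"
chmod 0644 "$secrets_file"

# Lock it down: owner may read and write, group and others get nothing.
chmod 0600 "$secrets_file"

# The permission column should now begin with -rw-------
ls -l "$secrets_file"
```

The same chmod 0600 applies to any other file that holds a password.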

Special file permissions (744) for scripts that write/modify other files

  • Let's say you want a specific file (your bot file, for example), when executed, to write a .txt file on the server with some data.
  • You must change the permissions of the file that is making the changes from 644 to 744. This adds the "execute" permission for the file's owner, so the file can be run directly.
  • This idea may be important later for getting your bot to run. If you decide to use an .sh file to execute your main bot file, then the .sh file will need 744 permission.

Use a bot framework

  • Whatever language you're writing your bot in, you'll probably want to pick a framework (external library) specifically designed for logging into Wikipedia and using its API.
  • For PHP, I use the ancient framework botclasses.php. It's not modern, but it fits nicely in one file.
  • One file means I can simply copy/paste the code into my repo, then do <?php include('botclasses.php');, and now I can log in to Wikipedia and execute API commands with a lot less code. Example botclasses.php code:
<?php

include('botclasses.php');

// Log in
$wp = new wikipedia();
$wp->http->useragent = '[[en:User:NovemBot]] task A, owner [[en:User:Novem Linguae]], framework [[en:User:RMCD_bot/botclasses.php]]';
$wp->login('usernameGoesHere', 'passwordGoesHere');

// Get page wikicode
$pageTitleIncludingNamespace = 'User:NovemBot/userlist.js';
$oldWikicode = $wp->getpage($pageTitleIncludingNamespace);

// Edit page wikicode
$pageTitleIncludingNamespace = 'User:NovemBot/userlist.js';
$newWikicode = '//Test!';
$editSummary = 'Update list of users who have permissions (NovemBot Task A)';
$wp->edit(
	$pageTitleIncludingNamespace,
	$newWikicode,
	$editSummary
);

Bot passwords

  • Best practice is not to use your bot's actual username and password when logging in.
  • Instead, create a bot password at Special:BotPasswords just for that bot or that task, and use that.
    • My bot's username is NovemBot
    • Using Special:BotPasswords while logged in as NovemBot will generate a username for me depending on what I name that bot password. So for example, if I type Task1, it will generate the username NovemBot@Task1 for me. It will also generate a password for me.
  • Benefits
    • NovemBot@Task1 only works when using the API. It will not work to log in through the normal website login form. So if your credentials leak, the person will need to be technical enough to use the API in order to take advantage of it.
    • If your credentials leak (e.g. you commit them to git accidentally), you can just turn NovemBot@Task1 off, instead of changing your main bot password. If you have 10 different bot tasks with 10 different bot usernames, being able to just turn off one will save you from having to update a ton of secrets/config files.
    • You can give each bot username limited permissions. So for example, if your bot is a template editor, you can have one bot username that is able to edit template protected pages, and another bot username that can't. This helps limit damage in the case of password compromise.
  • Examples
    • On my main account (an interface administrator), I have bot usernames for gadget deploy script #1, gadget deploy script #2, autowikibrowser, huggle, and my publish.php script that concatenates and publishes my user script files.
    • On my bot account (a template editor), I have a bot username for the template editor task, and a bot username for the non-template editor tasks.

Web service

  • By default, only the command line will work.
  • If you want web access (or "web service" in Toolforge parlance), you will need to specifically turn it on.
  • Uses of web access:
    • In addition to the automatic cron jobs, I like to manually summon my bots by visiting something like https://novem-bot.toolforge.org/task-a/index.php?password=, which I have stored as a bookmark
    • One of my bot tasks outputs detailed HTML to help me with debugging. HTML is of course best viewed in the browser
    • If you want to provide any kind of public website to users (as a tool, or as a tool for summoning your bot)
  • Make a folder in your project called public_html. Anything inside this folder will be available from the web.
  • In bash:
    • webservice start
    • As of March 2024, this creates a container with PHP 7.4 and lighttpd
  • Domain: https://novem-bot.toolforge.org/
  • If needed, don't forget to turn it off. Although I just leave mine running.
    • webservice stop
  • Other webservice commands
    • webservice status - tells you whether it's running or not, and what image it is using
    • webservice restart
    • webservice shell - switches to a shell inside the webservice
      • exit - switch back to a Toolforge shell
    • webservice logs
    • webservice $yourImage start - to specify an image other than php7.4
  • If you have anything sensitive on the server that you don't want random people to be able to run, make sure to password protect it or similar.
    • <?php if ( ( $_GET['password'] ?? '' ) !== 'myPassword' ) die(); /* rest of code goes here */
  • Webservice images (such as php7.4) do not update automatically. To move to a newer image (such as php8.2), go into bash, stop the webservice, then start it again with the newer image.
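
The image switch described above can be sketched as a short shell sequence. It only uses the webservice commands already listed in this section. The guard around them is an addition so the snippet does nothing when run off Toolforge, and php8.2 stands in for whichever newer image you want.

```shell
#!/bin/sh
new_image="php8.2"   # example image name taken from the list in this tutorial

if command -v webservice >/dev/null 2>&1; then
    webservice stop                   # shut down the container using the old image
    webservice "$new_image" start     # start a fresh container on the newer image
    webservice status                 # confirm which image is now in use
else
    echo "webservice command not found; run this from your tool account on Toolforge"
fi
```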

Webservice images

  • $yourImages that can be plugged in above
  • Get the latest list by typing toolforge webservice --help
  • Another list is located at wikitech:Help:Toolforge/Web#Latest supported pre-built images
  • The list as of March 2024
    • bookworm (Debian 12 base image)
    • jdk17 (Java)
    • node16
    • node18
    • perl5.32
    • perl5.36
    • php7.4 (default)
    • php8.2
    • python3.9
    • python3.11
    • ruby2.1
    • ruby2.7
    • ruby3.1
    • tcl8.6 (Tcl)

Scheduling cron jobs

  • A cron job (or "job" in Toolforge parlance) is a task that the server executes at a regular interval.
  • This is exactly what we need for most kinds of bots. Most bots will run a job, finish the job, exit, then need something to start them up again at the appropriate time.
  • There are 2 ways to do cron jobs on Toolforge:
    • Kubernetes - verbose, involves creating .yaml config files, and the status reports show a bunch of pods for each job. Avoid it (if you must, here are my old notes on it).
    • Jobs framework - use this method. It lets you schedule a cron job with a couple of shell commands and cleanly view status reports.

Jobs framework

  • Create a task-a.sh file (.sh files just run shell commands; .sh stands for shell) with contents:
    • php /data/project/novem-bot/public_html/novembot-task-a.php 'CLI arguments (such as password) go here, if needed by your program'
  • Upload the file to your tool's home directory (/data/project/novem-bot) using FTP.
  • Set the file's permissions to 0700. The 7 gives the owner read, write, and execute, which the Jobs framework needs to run the file. The 00 keeps the file private from everyone else.
  • Run this to add your job, replacing the variables:
    • toolforge jobs run $yourJobName --command ./$yourFileName.sh --image $yourImage --schedule "$yourInterval" --emails $yourEmailPreference
  • A more fleshed out example:
    • toolforge jobs run task-a --command ./task-a.sh --image php8.2 --schedule "@daily" --emails onfailure
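
As a rough sketch, the wrapper file from the steps above could be generated like this. The shebang line and the scratch output directory are assumptions added for the demo; the php path is the one used in this tutorial.

```shell
#!/bin/sh
# Write the wrapper into a scratch directory; upload it to /data/project/novem-bot afterwards.
out_dir=$(mktemp -d)
wrapper="$out_dir/task-a.sh"

# Quoted 'EOF' keeps the shell from expanding anything inside the file body.
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Run the bot using the php binary provided by the job's image.
php /data/project/novem-bot/public_html/novembot-task-a.php
EOF

# 7 = owner can read, write, and execute; 00 = no access for anyone else.
chmod 0700 "$wrapper"

# Show the generated file so it can be checked before uploading.
cat "$wrapper"
```

Once uploaded, the file is scheduled with the toolforge jobs run command shown above.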

Jobs framework cron job intervals

  • If you don't mind some inconsistency (the time of script run changing every time), use one of these macros. This will run your script when server load is lowest:
    • @hourly
    • @daily
    • @weekly
    • @monthly
    • @yearly
  • If you want to specify precisely when your job will run, you can specify it in cron syntax.
    • Calculator tool: https://crontab.guru/
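
For illustration, here is one way to read a cron expression before handing it to --schedule. The schedule value is an arbitrary example, and the variable names are made up for this sketch.

```shell
#!/bin/sh
# Five fields: minute, hour, day of month, month, day of week.
schedule="17 3 * * *"   # arbitrary example: 03:17 every day

# read splits on whitespace without expanding the * characters.
read -r minute hour day_of_month month day_of_week <<EOF
$schedule
EOF

echo "minute=$minute hour=$hour day_of_month=$day_of_month month=$month day_of_week=$day_of_week"

# Print (not run) the matching Jobs framework command from this tutorial.
echo "toolforge jobs run task-a --command ./task-a.sh --image php8.2 --schedule \"$schedule\" --emails onfailure"
```

Unlike the @daily macro, an explicit expression like this pins the job to the same minute every day.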

Jobs framework email preferences

  • none, don't get any email notification. The default behavior.
  • onfailure, receive email notifications in case of a failure event.
  • onfinish, receive email notifications in case of the job finishing (both successfully and on failure).
  • all, receive all possible notifications.

Jobs framework images

  • $yourImages that can be plugged in above
  • Confusingly, the Jobs framework uses different images than the webservice
  • Do toolforge jobs images to see the latest images
  • The list as of March 2024
    • bookworm (Debian 12)
    • bullseye (Debian 11)
    • jdk17 (Java)
    • mariadb (SQL)
    • mono68 (.NET Framework)
    • node16
    • node18
    • perl5.32
    • perl5.36
    • php7.4
    • php8.2
    • python3.9
    • python3.11
    • ruby2.1
    • ruby2.7
    • ruby3.1
    • tcl8.6 (Tcl)
  • You can build custom images via the wikitech:Help:Toolforge/Build Service

More Jobs framework commands

  • Add a cron job
    • toolforge jobs run $yourJobName --command ./$yourFileName.sh --image $yourImage --schedule "$yourInterval" --emails $yourEmailPreference
    • toolforge jobs run task-a --command ./task-a.sh --image php8.2 --schedule "@daily" --emails onfailure
  • Restart a job (remove and re-add it without having to remember the details, may trigger an instant re-run)
    • toolforge jobs restart $yourJobName
  • Delete a cron job
    • toolforge jobs delete $yourJobName
  • List all cron jobs
    • toolforge jobs list
  • Get details about a specific job
    • toolforge jobs show $yourJobName
  • See your account's quotas
    • toolforge jobs quota

Forking someone else's bot/tool

  • If someone goes inactive and their Toolforge bot goes down, you can look into forking their bot.
  • Figure out what their Toolforge bot name is by checking the ToolsAdmin list.
  • Often the ToolsAdmin list will contain a link to the source code. If you have their source code, you can just set up a new bot on a new tool account and ask for the old one to be deactivated (How??).
  • If their source code is out of date or inaccessible, you will need to try and access the code stored on the Toolforge machines.
  • Two ways to do this:
    • Easier #1: Log into WinSCP FTP client, navigate to /data/project/their-bot-name, see if most of their files and folders are publicly readable (they are by default). Then copy these into your own bot/repo.
      • If you click on a folder and it says "Server returned empty listing for directory", that means access was denied.
    • Easier #2: Log into PuTTY, become novem-bot, cd /data/project/their-bot-name, ls -ltR
      • If the message "ls: cannot access X: Permission denied" appears, that means access was denied.
    • Harder: Maybe their files are not readable with the above technique. Then you'll want to look into wikitech:Help:Toolforge/Abandoned tool policy.
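
The shell check from "Easier #2" could be wrapped in a small helper, sketched below. The function name can_browse_dir is hypothetical, and their-bot-name is the same placeholder used above.

```shell
#!/bin/sh
# Succeeds only if the path is a directory that the current user may list and enter.
can_browse_dir() {
    [ -d "$1" ] && [ -r "$1" ] && [ -x "$1" ]
}

tool_dir="/data/project/their-bot-name"   # placeholder; replace with the real tool name

if can_browse_dir "$tool_dir"; then
    # Recursive listing, newest first; trimmed so a large tool does not flood the terminal.
    ls -ltR "$tool_dir" | head -n 40
else
    echo "Cannot read $tool_dir (missing, or access denied)"
fi
```

A successful check on the top-level directory does not guarantee that every subfolder is readable, so "Permission denied" lines can still show up in the listing.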

Debugging tips

  • check your error logs
    • taskname.err
    • taskname.out
    • error.log
  • add ini_set( "display_errors", 1 ); and error_reporting( E_ALL ); to line 1 of your code
  • run the code locally and make sure it's not throwing PHP errors
  • toolforge jobs restart $jobName
  • toolforge jobs delete $jobName; toolforge jobs run $jobName
  • webservice stop; webservice start
  • to debug "exceeded max_user_connections": sql enwiki; show processlist;
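
A small sketch for the first tip, checking the error logs. It assumes the job is named task-a and that the log files sit in the current directory; the function name show_recent_logs is made up for this example.

```shell
#!/bin/sh
# Print the last 20 lines of each log file that exists; silently skip the rest.
show_recent_logs() {
    for log_file in "$@"; do
        if [ -f "$log_file" ]; then
            echo "== $log_file =="
            tail -n 20 "$log_file"
        fi
    done
}

# Log names follow the pattern listed above, with task-a as the job name.
show_recent_logs task-a.err task-a.out error.log
```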

Getting help

  • WP:DISCORD's #technical channel is a great resource.
  • Folks that speak Linux and Toolforge and have helped me out in the past include AntiCompositeNumber, Taavi, Chlod, and SD0001.
  • WP:IRC #wikimedia-cloud is the official support channel for Toolforge.
  • WP:VPT can probably help if you prefer onwiki help.