This chapter describes how the information accessible on www.gap-system.org is stored and collected, and how it is transformed into web pages.
The GAP website (in the following just called "website") has a tree structure for easier navigation and overview. Each node and each leaf of the tree is a web page. Every single page resides somewhere in this tree. This position is shown in the navigation bar on the left hand side, and the user can navigate through the tree using this navigation bar. However, pages can still link to other pages that reside in some other branch of the tree.
With very few exceptions, all pages are static HTML pages conforming to the XHTML 1.0 standard (see Section 9.4). However, the maintainer does not edit these pages directly. They are produced by a tool called "the Mixer" (see Section 9.3), which takes so-called ".mixer-files" as source and produces the final HTML files. During this process, the navigation bar and some other parts of the page are created automatically, so that the maintainer does not have to worry about technicalities. A .mixer file essentially contains the content of the page in the form of a well-formed XML document (see again Section 9.4 for an explanation), and the Mixer handles the technical details.
All the sources for the web pages are kept in the git repository https://github.com/gap-system/GapWWW. You can clone this repository using
git clone https://github.com/gap-system/GapWWW
The web server in St Andrews also uses its own clone. It updates the clone to the latest revision of the master branch, runs the Mixer and then serves the pages. A second branch, called testing, is served on the password-protected version of the GAP website at http://devel.gap-system.org/testsite, where work in progress may be published to be reviewed internally.
The GAP website has some pages that are treated specially such as the GAP manuals, the pages for the GAP packages, the pages providing search facilities, the pages for the GAP bibliography, the sitemap, and the (old) GAP forum archive. The setup for these special pages is described in Sections 9.8 to 9.13 in this chapter.
In the following sections we first cover the Mixer, the web standard XHTML 1.0, the usage of git for the web pages, and the installation of the web site on the web server.
There are several possible workflows, depending on how much effort you would like to commit to the website maintenance.
A minimalistic scenario for small improvements (e.g. correcting details and fixing typos) only requires installing git and then:
Clone the website repository: git clone https://github.com/gap-system/GapWWW
Make changes in the master branch.
Commit and push the changes. This notifies the website administrator(s), who then check and approve the update.
A more robust scenario, especially for changes that are more likely to break the Mixer syntax, is to also clone the Mixer repository with
git clone https://github.com/gap-system/Mixer
and build the Mixer as described in the mixer.README file (see Section 9.3 for further details). For this step, you will need a C compiler (for compiling parts of the Mixer) and a Python interpreter (for running the Mixer).
With the Mixer installed, you can run the mixer.py script inside the GapWWW working directory, probably with the -f option to rebuild everything regardless of the timestamps. This lets you check in your browser how the generated HTML pages look before you commit and push the changes.
Finally, changes in the master branch trigger a notification to the website administrator(s), who check and approve them. Changes that you want to be reviewed internally before publication can instead go to the testing branch, which is served on the password-protected version of the GAP website at http://devel.gap-system.org/testsite. Changes in the testing branch appear on the testing site immediately after you push them to the main GapWWW repository. This workflow is useful if you want to show your suggestions to a wider group of people who may not be able to install the Mixer and build a local version of the GAP website to review your changes.
If you are one of the website administrators, you will also need ssh access to the web server in St Andrews to run certain update scripts and copy necessary data.
The Mixer is a Python script that uses a C library to parse XML documents (see Section 9.4). Therefore this library (which comes with the Mixer) has to be compiled first.
The Mixer is kept in the git repository https://github.com/gap-system/Mixer. To clone this repository, use
git clone https://github.com/gap-system/Mixer
The above command creates a clone of this repository in the subdirectory Mixer of the current directory. In that directory you can create the manual of the Mixer by calling make mixer.pdf, provided you have a LaTeX installation on your machine. That manual describes the Mixer and its installation in detail.
Alternatively you can download a copy of the Mixer and its documentation from this page.
A small comment on the rationale behind the Mixer might be in order. The Mixer requires its input, that is the .mixer-files, to be well-formed XML documents (see Section 9.4). At first sight this might seem inconvenient and a bit awkward. However, it greatly improves the chances that the resulting HTML files conform to the XHTML 1.0 standard. At the same time, it allows the Mixer to give concise and useful error messages during parsing whenever something is not well-formed. This, together with the automatically generated navigation bar, makes the Mixer a valuable tool for the creation of web pages.
Note in particular that the tree structure of the whole web site is controlled by the tree files in each subdirectory, exactly as described in the manual of the Mixer.
The HTML language has undergone a series of revisions and standardizations. One major step was the revision "XHTML 1.0" of the HTML standard, which made HTML conform to the XML standard. This step was important because the XML framework makes it much easier to parse such documents automatically and to check them for "well-formedness". Here, the term "well-formed" means that the document fulfils a set of syntactic rules. That is, a document might be well-formed and at the same time not make any sense. See this page for details. A short introduction to the XML standard can be found in Section GAPDoc: XML in the GAPDoc manual.
The GAP web pages should conform to the XHTML 1.0 standard. To cut a long story short, this imposes a few restrictions on the markup. Here we quickly cover the most important ones, which should enable anybody who has ever seen an HTML document of any version to get started.
Element and attribute names must be written in lower case letters.
All non-empty elements must have a start-tag and an end-tag. In particular, enclose paragraphs in <p> and </p> and list entries in <li> and </li>.
Elements must be properly nested like brackets, that is, things like <a><b></a></b> are not allowed.
Attributes must always have an assigned value, and the value must be enclosed in either double or single quotes; for example <a href="https://www.gap-system.org">GAP site</a>.
Write empty elements like <br />. The space before the / is not required by the specification, but it helps some old browsers to interpret the element correctly.
Do not put information on colors or fonts in the XHTML file. Instead, use the .css style sheet file. (For complicated cases, use the class attribute to mark elements for which you want to give special formatting rules in the style sheet.)
The XML markup characters "<", "&", and ">" must be entered as "&lt;", "&amp;", and "&gt;" respectively. There are quite a few such "entities" which are defined to enter special characters. See this page for details.
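A plain XML parser can check several of the rules above. As a rough pre-check before running the Mixer, the following Python sketch tests fragments for well-formedness. It uses the standard library's xml.etree.ElementTree, which is our own choice and not part of the Mixer (the Mixer uses its own C library for parsing). The sketch does not check the lower-case rule, because XML itself allows upper-case names. A bare XML parser also knows only the five predefined entities such as &lt; and &amp;, so this sketch reports HTML entities like &nbsp; as errors.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if `fragment` parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Each example illustrates one of the rules listed above.
    examples = [
        '<p>See the <a href="https://www.gap-system.org">GAP site</a>.</p>',
        "<p><a><b></a></b></p>",                        # crossed tags
        "<a href=https://www.gap-system.org>GAP</a>",   # unquoted attribute value
        "<p>first line<br />second line</p>",           # empty element
        "<p>Tom & Jerry</p>",                           # bare ampersand
        "<p>Tom &amp; Jerry</p>",                       # escaped ampersand
    ]
    for text in examples:
        print(is_well_formed(text), text)
```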
We found it not too difficult to produce sets of valid web pages by combining three things: the W3C specification HTML 4.01 (which includes a nice overview of the elements), the above rules, and the general rule to avoid complicated-looking constructs when possible.
We assume here that you are familiar with the standard git commands git clone, git pull, git push, git add, git commit, etc.
The source files for the web site are kept in the git repository https://github.com/gap-system/GapWWW. You can clone it with
git clone https://github.com/gap-system/GapWWW
This command creates a directory GapWWW in your current directory, containing the complete source tree of the web site.
Source files are treated like any other source files in the git repository, that is, you can update, modify, commit, add, and remove them as usual.
The only git-specific point to understand is how the branch on which a change is made affects the way it gets published:
Changes in the master branch are not published automatically on the web server. A website administrator reviews them. To make them available online, the administrator then has to run the update script on the server in St Andrews, as described in Section 9.6.
Changes in the testing branch immediately appear on the password-protected version of the GAP website at http://devel.gap-system.org/testsite, where they may be reviewed by others.
Changes in feature branches (which you may create to keep some work in progress) will not be visible anywhere.
A little comment on the rationale behind this setup might be in order. It allows several people to work independently on the website and to exchange versions via git without publishing them immediately. The actual guidelines on who does what in this process should be agreed on separately.
Currently, the published version of the web site is contained in the directory /gap/GapWWW on the following machine in St Andrews:
yin.mcs.st-andrews.ac.uk
This machine is not itself a web server. The actual web server mounts the directory /gap/GapWWW from yin via NFS and serves it.
The Mixer is checked out (still the old CVS version, since it has remained unchanged for several years) and installed in the directory /gap/Mixer. It can be called with the command
/gap/Mixer/mixer.py
The files are checked out with ownership gapchron, which is a user on yin with the same numerical user ID as the gap user. In other words, one has to be the user gap to manipulate the data. Note that the home directory of the user gap is in fact /gap.
The easiest and most secure way to get access to this data is probably to create an RSA key pair. Append the public key to /gap/.ssh/authorized_keys and keep the private key in the .ssh subdirectory of your own home directory.
A website administrator updates the website by running a single shell script, bin/updateGapWWW.sh. It basically pulls the latest version of the master branch and runs the Mixer. Once you have ssh access to yin, you can trigger the update manually with
ssh gap@yin.mcs.st-andrews.ac.uk bin/updateGapWWW.sh
Before performing an update on yin, it is wise to first check that the Mixer runs without an error message in your own working copy of the website.
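The two steps above (check locally, then update remotely) can be chained in a small helper. The following Python sketch is our own and not part of GapWWW. It assumes that GapWWW and Mixer are cloned side by side, as in the installation instructions. It also assumes that mixer.py exits with a non-zero status on errors; this is not documented here, so if it does not, inspect its output by eye. Since the remote script pulls the master branch, your changes must already be pushed. By default the sketch only prints the commands; pass --run to execute them.

```python
import subprocess
import sys

# Assumed layout: GapWWW and Mixer cloned side by side; adjust as needed.
GAPWWW_DIR = "GapWWW"
MIXER = "../Mixer/mixer.py"  # resolved relative to GAPWWW_DIR (the cwd below)
SERVER = "gap@yin.mcs.st-andrews.ac.uk"
REMOTE_SCRIPT = "bin/updateGapWWW.sh"


def build_commands(force=True):
    """Return the local Mixer check and the remote update command."""
    mixer_cmd = [MIXER, "-f"] if force else [MIXER]
    ssh_cmd = ["ssh", SERVER, REMOTE_SCRIPT]
    return [mixer_cmd, ssh_cmd]


def run_update(force=True):
    mixer_cmd, ssh_cmd = build_commands(force)
    # check=True raises CalledProcessError on a non-zero exit status, so the
    # server is only contacted after the local Mixer run has succeeded.
    subprocess.run(mixer_cmd, cwd=GAPWWW_DIR, check=True)
    subprocess.run(ssh_cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only print the commands. Pass --run to execute them.
    if "--run" in sys.argv[1:]:
        run_update()
    else:
        for cmd in build_commands():
            print(" ".join(cmd))
```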
This section describes the procedure to install the GAP web site on a machine from scratch. It is usually not needed, because all of this has already been done on the machine yin.mcs.st-andrews.ac.uk. However, it is needed if one wants an exact copy of the web site or has to install it somewhere anew. This section was derived a long time ago from the ASCII document GapWWW/INSTALL, when the sources were still under CVS control (so GapWWW/INSTALL is likely heavily outdated).
standard tools: git, tar, gzip, make, sh
a C compiler, preferably gcc
Python version 2.2 or later (for running the Mixer)
yEd graph editor if you want to edit the sitemap
a web server if pages shall be published
a copy of the full doc directory from a GAP installation for references into the manual (this can reside on some web site)
facility to run CGI scripts for feedback pages (TODO: check if we still use them)
setup for automatic creation of the pages for packages
the swish utility for the creation of the search indices (TODO: may be better to switch to Google search)
Clone the git repository GapWWW:
git clone https://github.com/gap-system/GapWWW
This creates a subdirectory GapWWW in the current directory.
Clone the git repository Mixer:
git clone https://github.com/gap-system/Mixer
This creates a subdirectory Mixer in the current directory.
Unpack some (frozen) subtrees, which are in archives:
cd GapWWW
gzip -dc ForumArchive.tar.gz | tar xvf -
cd Gap3
gzip -dc Manual3.tar.gz | tar xvf -
cd ..
Edit GapWWW/lib/config (see that file for instructions):
vi lib/config
In this file a few variables have to be defined to adapt the web pages to the local conditions.
Copy a whole doc directory of a GAP distribution to the place given in GapWWW/lib/config (see step 4) by the variable GAPManualLink (this is GapWWW/Manuals in the current setup).
The files for the GAP bibliography have been included into this directory tree in the repository.
Create the HTML and PDF versions with:
cd Doc/Bib
gap4 convbib.g
cd ../..
Some more information about this is in GapWWW/Doc/Bib/INFO, which has been unchanged since 2010 and may be somewhat outdated.
Install search facility:
Things are in GapWWW/Search. You need the swish utility installed to create the index files for searching. Create a link in the Search directory to the swish executable. Then create the index files with:
cd Search
ln -s PATHTOSWISH swish
make
cd ..
(PATHTOSWISH has to be replaced by the path to the swish executable.)
The CGI script GapWWW/Search/search.cgi will take care of the rest.
Install package manuals:
Copy the result of Frank's scripts to the place given in GapWWW/lib/config by the variable pkgmixerpath (currently this is GapWWW/Manuals; copy the whole pkg directory).
To update the package pages, copy all .mixer files and pkgconf.py to GapWWW/Packages and rerun the Mixer.
Make sure that the file GapWWW/lib/AllLinksOfAllHelpSections.data is always up to date (it has to be regenerated whenever the released manuals change).
In the development version of GAP there is a file dev/LinksOfAllHelpSections.g. Read this file into a current GAP version with all currently released packages installed, and call WriteAllLinksOfAllHelpSections(). This writes the file AllLinksOfAllHelpSections.data, which then has to be committed at its place in the GapWWW tree. Do not forget to publish the latest revision.
Run the mixer:
../Mixer/mixer.py -f
(the -f option forces creation regardless of timestamps)
If things are changed in the repository, all that has to be done to update the pages locally is to run
git pull
in the GapWWW directory, followed by
../Mixer/mixer.py
The Mixer has an option -f to force recreation of all pages. This is necessary if general files such as the address database lib/addresses or the templates change.
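Why -f matters can be seen from a make-like timestamp rule. The Mixer's actual rule is not described here, so the following Python sketch is a hypothetical model, not the Mixer's code. It rebuilds a page if forced (like mixer.py -f), if its HTML output is missing, or if the page's .mixer source or any file listed in `shared` is newer than the output. If the Mixer compares each page only with its own source (our assumption), a change to a shared file such as lib/addresses goes unnoticed. The -f option compensates for exactly that.

```python
import os


def needs_rebuild(source, output, shared=(), force=False):
    """Hypothetical make-like rule for regenerating one page.

    Rebuild when forced, when the output file is missing, or when the
    source or any shared dependency is newer than the output.
    """
    if force or not os.path.exists(output):
        return True
    built = os.path.getmtime(output)
    return any(os.path.getmtime(path) > built for path in (source, *shared))
```

With `shared` left empty, only the page's own source is compared, which models a per-page timestamp check.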
To change the sitemap, use the yEd graph editor to modify the file sitemap.graphml. Then use the yEd export menu to create the file sitemap.html with the associated .png image.
All GAP manuals are available in HTML format via the web pages. This works by simply copying the doc directory of a complete GAP installation to the place specified by the variable GAPManualLink in GapWWW/lib/config (which is GapWWW/Manuals in the current setup). Note that these files are not under version control there. They are only copied into checked-out working copies, for example on the web server in St Andrews.
The single remaining point to explain is how one can specify links to manual sections on the web pages. This is done with a special Mixer tag like the following:
<mixer manual="Reference: Lists">Chapter about lists</mixer>
This element creates a link to the manual section which would appear in the GAP help system when called with "?Reference: Lists", which happens to be the chapter about lists in the reference manual. The text of the link would be "Chapter about lists".
This works because the Mixer has access to a file containing the links to all manual sections. This file resides in GapWWW/lib/AllLinksOfAllHelpSections.data and is created using dev/LinksOfAllHelpSections.g in the development version of GAP, as described in Section 9.7.
The value of the attribute "manual" in the "mixer" tag must be the complete text of the section heading the link should point to.
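The manual attribute must be the complete heading text, so it can help to list every heading a page links to and compare the list with the GAP help system. The following Python sketch does this with the standard library's ElementTree, which is our own choice. It assumes that the .mixer source parses as standalone XML. The `<page>` root element is hypothetical and serves only as an illustration. The sketch does not read AllLinksOfAllHelpSections.data, because that file's format is not described here.

```python
import xml.etree.ElementTree as ET


def manual_links(mixer_source):
    """Collect (section heading, link text) for each <mixer manual="..."> element."""
    root = ET.fromstring(mixer_source)
    links = []
    for elem in root.iter("mixer"):
        heading = elem.get("manual")
        if heading is not None:
            links.append((heading, "".join(elem.itertext())))
    return links


if __name__ == "__main__":
    # Hypothetical page body; real .mixer files may use another root element.
    page = ('<page><p>See the <mixer manual="Reference: Lists">'
            'Chapter about lists</mixer>.</p></page>')
    for heading, text in manual_links(page):
        print(f"{heading!r} -> {text!r}")
```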
The archives and web pages for the GAP packages are generated by yet another set of tools, described in Chapter 8. These tools generate a .mixer-file for every package and a file pkgconf.py for all packages together. All these files have to be put under version control in the directory GapWWW/Packages. These pages then only have to be added to the tree by listing them in the tree file there.
The search engine on the web pages internally uses the swish tool. It is used to create an index of all pages, which allows very fast searches when a user submits a query. All files for this setup are in the directory GapWWW/Search.
The indices are regenerated by running
touch everything.conf
make
in that directory. This is done automatically every night, so usually nothing has to be done after installation.
To make this work, one needs a swish executable and has to create a link GapWWW/Search/swish to that executable.
The GAP bibliography resides in the directory GapWWW/Doc/Bib.
The source files are:
GapCite.MR
This file contains just the MR numbers of papers that cite or refer to (one of the versions of) GAP. Here and below, "MR" stands for "Mathematical Reviews". The format alternates between one line of the form 1stAuthorSurname Paper (not starting with a blank) and one line MR-Number (starting with a blank). The MR numbers are used to get full bibliographic information from MathSciNet. The textual description only helps when adding papers to the file (in particular, to keep entries sorted by first author).
GapCite.notyet
BibTeX entries for papers that are not yet in MR but likely will be there in a few months
GapNonMR.bib
BibTeX entries for papers that will not be in MR (e.g. theses)
NonVerif.MR
Things not yet verified, same format as GapCite.MR
NonVerif.NonMR
Things not yet verified, same format as GapCite.notyet
GapIgnore.MR
This file contains a list of GAP strings corresponding to MR numbers of papers that MathSciNet may falsely report as citing GAP. For example, this happens if a paper refers to the History of Mathematics Archive website with its address wrongly stated in the GAP domain, as may be returned by some search systems. If necessary, add new items there in the obvious way.
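The alternating line format of GapCite.MR (also used by NonVerif.MR and by the tobeadded.txt file produced by updatebib.g) is simple enough to check mechanically before running the bibliography scripts. The following Python sketch is our own and not one of the bibliography tools. Skipping blank lines and comparing surnames case-insensitively are assumptions, and the sample entries in the usage note below are placeholders, not real MR data.

```python
def parse_gapcite_mr(text):
    """Parse GapCite.MR-style text into (description, mr_number) pairs.

    Description lines do not start with a blank; each must be followed
    by exactly one line holding the MR number, which starts with a blank.
    """
    entries = []
    description = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        if line[0].isspace():
            if description is None:
                raise ValueError(f"line {lineno}: MR number without description")
            entries.append((description, line.strip()))
            description = None
        else:
            if description is not None:
                raise ValueError(f"line {lineno}: description without MR number")
            description = line.rstrip()
    if description is not None:
        raise ValueError("file ends with a description but no MR number")
    return entries


def is_sorted_by_first_author(entries):
    """Check the sort order by the first word (surname), ignoring case."""
    surnames = [desc.split()[0].lower() for desc, _ in entries]
    return surnames == sorted(surnames)
```

For example, parse_gapcite_mr("Adams Paper\n 1111111\nBrown Paper\n 2222222\n") returns two (description, MR number) pairs, in file order.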
It is possible to check MathSciNet for new references to GAP by reading the file updatebib.g into GAP. It will produce two files:
tobeadded.txt
This file has the same format as GapCite.MR. It lists publications citing GAP, which should be examined and then added either to GapCite.MR or to GapIgnore.MR.
suggested.txt
This file contains suggestions to "move" certain entries from GapCite.notyet and GapNonMR.bib to GapCite.MR. All suggestions, including those which do not match the publication listed in the GAP bibliography, should be carefully examined before any changes are made.
Note that updatebib.g is not a complete solution for updating the GAP bibliography. It searches for occurrences of the substring www.gap in citations, which covers both old and current addresses of the GAP website. However, it does not find publications that cite GAP without its website or mention it only in the text. Moreover, it covers only MathSciNet and does not look into other bibliographic databases. Therefore, manual searches are still needed to discover more GAP citations. The function SearchMathSciNetForUpdates from updatebib.g may be helpful here, since it performs a broader search in MathSciNet by dropping some of the stricter restrictions.
After the source files of the GAP bibliography are updated, the script newmakegapbib uses GapCite.MR, GapCite.notyet and GapNonMR.bib (and also HEADER and MRBIB) to produce gap-published.bib. This requires a subscription to MathSciNet, which St Andrews has. The advantage of this approach is that MathSciNet gives us good BibTeX entries and their updates, so there is no need to look up journal names or diacritic characters. It also gives us MR numbers we can link to. Finally, it makes adding entries easier, as only the MR number is needed.
At the end of its work, newmakegapbib also displays error messages reporting MR numbers whose BibTeX records it failed to fetch from MathSciNet. These should be investigated, since they may point to inconsistencies in our data.
(There is also a script GETMR that returns MR numbers for papers. It is convenient for looking up a large number of papers found in the citation index.)
Finally, GAP itself, called with convbib.g, produces the web page and a nice PDF bibliography from gap-published.bib (using the further helper files gapbib.tex and gap-head.bib). The resulting files are gap-published.html and gap-published.pdf, which are linked from the main web page bib.html. NOTE that gap-published.html and gap-published.pdf are not under version control, because convbib.g can generate them automatically and rather quickly. In addition, convbib.g creates statistics.generated and statistics.mscreport, two pages with tables which are used in statistics.mixer to create statistics.html.
The output of convbib.g should also be checked for errors and warnings, for example reports of repeated entries or incomplete BibTeX records (the latter can mostly be ignored).
NOTE: The current setup does not run GAP on convbib.g every night. This means that everybody who changes the GAP bibliography has to do this manually on yin after every change.
The sitemap picture is generated and edited in the following way. The original source is the file sitemap.graphml, which is created and edited with the yEd program. yEd can export the sitemap as a clickable HTML image map, producing two files, sitemap.html and sitemap1_1.png. Because the sitemap usually does not change very much, these two files are also kept under version control.
Until December 2003 the GAP forum archive was handled by a tool written especially for this task. At that point it was switched to mailman, a generic tool for mailing lists, which also does the archiving. Therefore the old forum archives are frozen in the form of a huge number of HTML pages. These are not kept under version control as single files but as one big binary archive, GapWWW/ForumArchive.tar.gz.
To install those pages in a checked-out working copy, one just has to extract this archive by running
gzip -dc ForumArchive.tar.gz | tar xf -
in the GapWWW directory, as explained in Section 9.7.
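On a system without gzip and tar, Python's standard tarfile module can perform the same extraction as the pipeline above. The following sketch is our own and not part of GapWWW. The paths follow the installation steps: ForumArchive.tar.gz in GapWWW, and Manual3.tar.gz in GapWWW/Gap3. Where the Python version supports extraction filters, the sketch uses the "data" filter, which refuses, for example, absolute paths inside the archive.

```python
import tarfile
from pathlib import Path


def unpack(archive, dest):
    """Extract a gzipped tar archive into `dest`, like `gzip -dc ARCHIVE | tar xf -`.

    Returns the member names, similar to the listing of `tar xvf -`.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: the "data" filter rejects unsafe members.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


if __name__ == "__main__":
    root = Path("GapWWW")
    # The two frozen subtrees mentioned in the installation steps.
    for archive, dest in [
        (root / "ForumArchive.tar.gz", root),
        (root / "Gap3" / "Manual3.tar.gz", root / "Gap3"),
    ]:
        if archive.exists():
            print(f"{archive}: {len(unpack(archive, dest))} entries")
        else:
            print(f"{archive}: not found, skipped")
```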