This is a guide to extending R, describing the process of creating R add-on packages, writing R documentation, R's system and foreign language interfaces, and the R API.
The current version of this document is 2.4.1 (2006-12-18).
ISBN 3-900051-11-9
The contributions of Saikat DebRoy (who wrote the first draft of a guide
to using .Call and .External) and of Adrian Trapletti (who
provided information on the C++ interface) are gratefully acknowledged.
Packages provide a mechanism for loading optional code and attached documentation as needed. The R distribution provides several packages.
In the following, we assume that you know the `library()' command,
including its `lib.loc' argument, and we also assume basic
knowledge of the INSTALL utility. Otherwise, please look at R's
help pages
?library
?INSTALL
before reading on.
A computing environment including a number of tools is assumed; the “R Installation and Administration” manual describes what is needed. Under a Unix-alike most of the tools are likely to be present by default, but Microsoft Windows and MacOS X will require careful setup.
Once a source package is created, it must be installed by
the command R CMD INSTALL.
See Add-on-packages, for further details.
Other types of extensions are supported: see Package types.
A package consists of a subdirectory containing a file DESCRIPTION and the subdirectories R, data, demo, exec, inst, man, po, src, and tests (some of which can be missing). The package subdirectory may also contain files INDEX, NAMESPACE, configure, cleanup, and COPYING. Other files such as README, NEWS or ChangeLog will be ignored by R, but may be useful to end-users.
The DESCRIPTION and INDEX files are described in the sections below. The NAMESPACE file is described in Package name spaces.
The optional files configure and cleanup are (Bourne shell) script files which are executed before and (provided that option --clean was given) after installation on Unix-alikes, see Configure and cleanup.
The optional file COPYING contains a copy of the license for the package, e.g. a copy of the GNU General Public License. Although you should feel free to include a license file in your source distribution, please do not arrange to install yet another copy of the GNU COPYING or COPYING.LIB files; instead, refer in your own COPYING file to the copies in the R distribution (e.g., in directory share/licenses).
The package subdirectory should be given the same name as the package. Because some file systems (e.g., those on Windows) are not case-sensitive, to maintain portability it is strongly recommended that case distinctions not be used to distinguish different packages. For example, if you have a package named foo, do not also create a package named Foo.
To ensure that file names are valid across file systems and supported
operating system platforms, the ASCII control characters as
well as the characters `"', `*', `:', `/', `<',
`>', `?', `\', and `|' are not allowed in file
names. In addition, files with names `con', `prn',
`aux', `clock$', `nul', `com1' to `com4', and
`lpt1' to `lpt3' after conversion to lower case and stripping
possible “extensions”, are disallowed. Also, file names in the same
directory must not differ only by case (see the previous paragraph).
In addition, the names of `.Rd' files will be used in URLs and so
must be ASCII and not contain %.
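The file name rules above can be checked mechanically. The following is a minimal sketch in POSIX shell, not a tool shipped with R: the function name valid_file_name is hypothetical, it looks at one file name at a time, and it does not check the rule about names in the same directory differing only by case.

```shell
#!/bin/sh
# valid_file_name NAME: succeed if NAME (a single path component) obeys the
# portability rules above. The function name is hypothetical, not part of R.
valid_file_name() {
    name=$1
    # Delete control characters and the nine forbidden characters; if
    # anything was deleted, the name is not portable.
    stripped=$(printf '%s' "$name" | LC_ALL=C tr -d '[:cntrl:]"*:/<>?\\|')
    [ "$stripped" = "$name" ] || return 1
    # Convert to lower case and strip everything from the first dot on.
    base=$(printf '%s' "$name" | LC_ALL=C tr '[:upper:]' '[:lower:]')
    base=${base%%.*}
    case $base in
        con|prn|aux|'clock$'|nul|com[1-4]|lpt[1-3]) return 1 ;;
    esac
    return 0
}

for f in foo.R Con.txt 'a:b.R' com5.R; do
    if valid_file_name "$f"; then echo "ok:  $f"; else echo "bad: $f"; fi
done
```

The loop accepts foo.R and com5.R and rejects Con.txt (a reserved name once lower-cased and stripped of its extension) and a:b.R (it contains a forbidden character).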
The R function package.skeleton can help to create the
structure for a new package: see its help page for details.
The DESCRIPTION file contains basic information about the package in the following format:
Package: pkgname
Version: 0.5-1
Date: 2004-01-01
Title: My First Collection of Functions
Author: Joe Developer <Joe.Developer@some.domain.net>, with
  contributions from A. User <A.User@whereever.net>.
Maintainer: Joe Developer <Joe.Developer@some.domain.net>
Depends: R (>= 1.8.0), nlme
Suggests: MASS
Description: A short (one paragraph) description of what
  the package does and why it may be useful.
License: GPL version 2 or newer
URL: http://www.r-project.org, http://www.another.url
Continuation lines (for example, for descriptions longer than one line) start with a space or tab. The `Package', `Version', `License', `Description', `Title', `Author', and `Maintainer' fields are mandatory; the remaining fields (`Date', `Depends', `URL', ...) are optional.
The DESCRIPTION file should be written entirely in ASCII for maximal portability.
The `Package' and `Version' fields give the name and the version of the package, respectively. The name should consist of letters, numbers, and the dot character and start with a letter. The version is a sequence of at least two (and usually three) non-negative integers separated by single `.' or `-' characters. The canonical form is as shown in the example, and a version such as `0.01' or `0.01.0' will be handled as if it were `0.1-0'. (Translation packages are allowed names of the form `Translation-ll'.)
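The naming and versioning rules just described translate directly into regular expressions. The sketch below is only an illustration under that reading: the helper names are hypothetical, R applies its own checks at install time, and the special `Translation-ll' names are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helpers; R performs its own checks when installing.
valid_pkg_name() {
    # Letters, digits and dots only, starting with a letter.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z][A-Za-z0-9.]*$'
}

valid_pkg_version() {
    # Two or more non-negative integers joined by single '.' or '-'.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]+([.-][0-9]+)+$'
}

for v in 0.5-1 7.2-12 1 1..2; do
    if valid_pkg_version "$v"; then
        echo "valid version:   $v"
    else
        echo "invalid version: $v"
    fi
done
```

Here `1' fails because a version needs at least two integers, and `1..2' fails because the separators must be single characters.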
The `License' field should contain an explicit statement or a well-known abbreviation (such as `GPL', `LGPL', `BSD', or `Artistic'), perhaps followed by a reference to the actual license file. It is very important that you include this information! Otherwise, it may not even be legally correct for others to distribute copies of the package.
The `Description' field should give a comprehensive description of what the package does. One can use several (complete) sentences, but only one paragraph.
The `Title' field should give a short description of the package. Some package listings may truncate the title to 65 characters in order to keep the overall size of the listing limited. It should be capitalized, not use any markup, not have any continuation lines, and not end in a period. Older versions of R used a separate file TITLE for giving this information; this is now defunct, and the `Title' field in DESCRIPTION is required.
The `Author' field describes who wrote the package. It is a plain text field intended for human readers, but not for automatic processing (such as extracting the email addresses of all listed contributors).
The `Maintainer' field should give a single name with a valid email address in angle brackets (for sending bug reports etc.). It should not end in a period or comma.
The optional `Date' field gives the release date of the current version of the package. It is strongly recommended to use the yyyy-mm-dd format conforming to the ISO 8601 standard.
The optional `Depends' field gives a comma-separated list of
package names which this package depends on. The package name may be
optionally followed by a comparison operator (currently only `>='
and `<=' are supported), whitespace and a valid version number in
parentheses. (List package names even if they are part of a bundle.)
You can also use the special package name `R' if your package
depends on a certain version of R. E.g., if the package works only with
R version 2.4.0 or newer, include `R (>= 2.4.0)' in the
`Depends' field. Both library and the R package checking
facilities use this field, hence it is an error to use improper syntax
or misuse the `Depends' field for comments on other software that
might be needed. Other dependencies (external to the R system)
should be listed in the `SystemRequirements' field or a separate
README file. The R INSTALL facilities check if the
version of R used is recent enough for the package being installed,
and the list of packages which is specified will be attached (after
checking version dependencies) before the current package, both when
library is called and when saving an image of the package's code
or preparing for lazy-loading.
The optional `Imports' field lists packages whose name spaces are imported from but which do not need to be attached. Name spaces accessed by the `::' and `:::' operators must be listed here, or in `Suggests' or `Enhances' (see below). Ideally this field will include all the standard packages, and it is important to include S4-using packages (as their class definitions can change and the DESCRIPTION file is used to decide which packages to re-install when this happens).
The optional `Suggests' field uses the same syntax as `Depends' and lists packages that are not necessarily needed. This includes packages used only in examples or vignettes (see Writing package vignettes), and packages loaded in the body of functions. E.g., suppose an example from package foo uses a dataset from package bar. Then it is not necessary to have bar for routine use of foo, unless one wants to execute the examples: it is nice to have bar, but not necessary.
Finally, the optional `Enhances' field lists packages “enhanced” by the package at hand, e.g., by providing methods for classes from these packages.
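The general rules are:
Packages whose name space only is needed to load the package using library(pkgname) must be listed in the `Imports' field.
Packages that need to be attached to successfully load the package using library(pkgname) must be listed in the `Depends' field.
All packages that are needed to successfully run R CMD check on the package must be listed in one of `Depends' or `Suggests' or `Imports'.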
In particular, large packages providing “only” data for examples or vignettes should be listed in `Suggests' rather than `Depends' in order to make lean installations possible.
The optional `URL' field may give a list of URLs separated by commas or whitespace, for example the homepage of the author or a page where additional material describing the software can be found. These URLs are converted to active hyperlinks on CRAN.
Base and recommended packages (i.e., packages contained in the R source distribution or available from CRAN and recommended to be included in every binary distribution of R) have a `Priority' field with value `base' or `recommended', respectively. These priorities must not be used by “other” packages.
An optional `Collate' field (or OS-specific variants `Collate.OStype', such as `Collate.windows') can be used for controlling the collation order for the R code files in a package when these are concatenated into a single file upon installation from source. The default is to try collating according to the `C' locale. If present, the collate specification must list all R code files in the package (taking possible OS-specific subdirectories into account, see Package subdirectories) as a whitespace separated list of file paths relative to the R subdirectory. Paths containing white space or quotes need to be quoted. Applicable OS-specific collate specifications take precedence.
The optional `LazyLoad' and `LazyData' fields control whether the R objects and the datasets (respectively) use lazy-loading: set the field's value to `yes' or `true' for lazy-loading and `no' or `false' for no lazy-loading. (Capitalized values are also accepted.)
The optional `SaveImage' field controls whether the R objects are stored in a saved image. (It takes the same values as the field `LazyLoad'.)
If the package you are writing uses the methods package, specify (preferably) `LazyLoad: yes' or `SaveImage: yes'.
The optional `ZipData' field controls whether the automatic Windows build will zip up the data directory or not: set this to `no' if your package will not work with a zipped data directory.
If the DESCRIPTION file is not entirely in ASCII it
should contain an `Encoding' field specifying an encoding. This is
currently used as the encoding of the DESCRIPTION file itself,
and may in the future be taken as the encoding for other documentation
in the package. Only encoding names latin1, latin2 and
UTF-8 are known to be portable.
The optional `Type' field specifies the type of the package: see Package types.
Note: There should be no `Built' or `Packaged' fields, as these are added by the package management tools.
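As a rough pre-flight check before building, the field requirements above can be tested with grep. This is a sketch only: check_description is a hypothetical helper, it looks merely for the presence of field names at the start of a line, and it assumes an ordinary package rather than a bundle (whose DESCRIPTION has a `Bundle' field in place of `Package').

```shell
#!/bin/sh
# check_description FILE: report missing mandatory fields and stray
# 'Built'/'Packaged' fields. Returns nonzero if anything is wrong.
# The function is a hypothetical helper, not part of R.
check_description() {
    file=$1
    status=0
    for field in Package Version License Description Title Author Maintainer; do
        if ! grep -q "^${field}:" "$file"; then
            echo "missing mandatory field: ${field}"
            status=1
        fi
    done
    for field in Built Packaged; do
        if grep -q "^${field}:" "$file"; then
            echo "field should not be present: ${field}"
            status=1
        fi
    done
    return $status
}

# Check the DESCRIPTION file in the current directory, if there is one.
if [ -f DESCRIPTION ]; then
    check_description DESCRIPTION
fi
```

R CMD check remains the authoritative test; a script like this only catches the simplest omissions early.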
The optional file INDEX contains a line for each sufficiently
interesting object in the package, giving its name and a description
(functions such as print methods not usually called explicitly might not
be included). Normally this file is missing, and the corresponding
information is automatically generated from the documentation sources
(using Rdindex() from package tools) when installing from
source and when using the package builder (see Checking and building packages).
Rather than editing this file, it is preferable to put customized information about the package into an overview man page (see Documenting packages) and/or a vignette (see Writing package vignettes).
The R subdirectory contains R code files, only. The code
files to be installed must start with a (lower or upper case) letter or
digit and have one of the extensions .R, .S, .q,
.r, or .s. We recommend using .R, as this
extension seems not to be used by any other software. It should be
possible to read in the files using source(), so R objects
must be created by assignments. Note that there need be no connection
between the name of the file and the R objects created by it. The R
code files should only create R objects and not call functions with
side effects such as require and options.
Two exceptions are allowed: if the R subdirectory contains a file
sysdata.rda (a saved image of R objects) this will be
lazy-loaded into the name space/package environment – this is intended
for system datasets that are not intended to be user-accessible via
data. Also, files ending in `.in' will be allowed in the
R directory to allow a configure script to generate
suitable files.
Only ASCII characters (and the control characters tab,
formfeed, LF and CR) should be used in code files. Other characters are
accepted in comments, but then the comments may not be readable in
e.g. a UTF-8 locale. Non-ASCII characters in object names
will normally fail when the package is installed. Any byte will be
allowed in a quoted character string (but \uxxxx
escapes should not be used), but non-ASCII character strings
may not be usable in some locales and may display incorrectly in others.
Various R functions in a package can be used to initialize and clean
up. For packages without a name space, these are .First.lib and
.Last.lib. (See Load hooks, for packages with a name space.)
It is conventional to define these functions in a file called
zzz.R. If .First.lib is defined in a package, it is
called with arguments libname and pkgname after the
package is loaded and attached. (If a package is installed with version
information, the package name includes the version information, e.g.
`ash_1.0.9'.) A common use is to call library.dynam
inside .First.lib to load compiled code: another use is to call
those functions with side effects. If .Last.lib exists in a
package it is called (with argument the full path to the installed
package) just before the package is detached. It is uncommon to detach
packages and rare to have a .Last.lib function: one use is to
call library.dynam.unload to unload compiled code.
The man subdirectory should contain (only) documentation files for the objects in the package in R documentation (Rd) format. The documentation files to be installed must start with a (lower or upper case ASCII) letter or digit and have the extension .Rd (the default) or .rd. Further, the names must be valid in `file://' URLs, which means they must be entirely ASCII and not contain `%'. See Writing R documentation files, for more information. Note that all user-level objects in a package should be documented; if a package pkg contains user-level objects which are for “internal” use only, it should provide a file pkg-internal.Rd which documents all such objects, and clearly states that these are not meant to be called by the user. See e.g. the sources for package grid in the R distribution for an example. Note that packages which use internal objects extensively should hide those objects in a name space, in which case they do not need to be documented (see Package name spaces).
The R and man subdirectories may contain OS-specific subdirectories named unix or windows.
The C, C++, or FORTRAN source files for the
compiled code are in src, plus optionally file Makevars or
Makefile. When a package is installed using R CMD
INSTALL, Make is used to control compilation and linking into a shared
object for loading into R. There are default variables and rules for
this (determined when R is configured and recorded in
R_HOME/etc/Makeconf). These rules can be tweaked by
setting macros in a file src/Makevars (see Using Makevars).
Note that this mechanism should be general enough to eliminate the need
for a package-specific Makefile. If such a file is to be
distributed, considerable care is needed to make it general enough to
work on all R platforms. In addition, it should have a target
`clean' which removes all files generated by Make. If necessary,
platform-specific files can be used, for example Makevars.win or
Makefile.win on Windows take precedence over Makevars or
Makefile.
The data subdirectory is for additional data files the package
makes available for loading using data(). Currently, data files
can have one of three types as indicated by their extension: plain R
code (.R or .r), tables (.tab, .txt, or
.csv), or save() images (.RData or .rda).
(All ports of R use the same binary (XDR) format and can read
compressed images. Use images saved with save(, compress =
TRUE), the default, to save space.) Note that R code should be
“self-sufficient” and not make use of extra functionality provided by
the package, so that the data file can also be used without having to
load the package. It is no longer necessary to provide a 00Index
file in the data directory—the corresponding information is
generated automatically from the documentation sources when installing
from source, or when using the package builder (see Checking and building packages). If your data files are enormous you can speed up
installation by providing a file datalist in the data
subdirectory. This should have one line per topic that data()
will find, in the format `foo' if data(foo) provides
`foo', or `foo: bar bah' if data(foo) provides
`bar' and `bah'.
The demo subdirectory is for R scripts (for running via
demo()) that demonstrate some of the functionality of the
package. Demos may be interactive and are not checked automatically, so
if testing is desired use code in the tests directory. The
script files must start with a (lower or upper case) letter and have one
of the extensions .R or .r. If present, the demo
subdirectory should also have a 00Index file with one line for
each demo, giving its name and a description separated by white
space. (Note that it is not possible to generate this index file
automatically.)
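Since the demo index has to be maintained by hand, it is easy for it to fall out of step with the scripts. The sketch below shows one way to check that every demo script has an entry; check_demo_index is a hypothetical helper, and it assumes the demo name is the first whitespace-separated word on its 00Index line.

```shell
#!/bin/sh
# check_demo_index DIR: report demo scripts in DIR that have no line in
# DIR/00Index. Hypothetical helper, not part of R.
check_demo_index() {
    demo_dir=$1
    status=0
    for script in "$demo_dir"/*.R "$demo_dir"/*.r; do
        [ -f "$script" ] || continue      # skip an unmatched glob
        name=$(basename "$script")
        name=${name%.*}                   # drop the .R or .r extension
        # A demo is indexed if some line has its name as the first word.
        if ! awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' \
                "$demo_dir/00Index" 2>/dev/null; then
            echo "no 00Index entry for demo: $name"
            status=1
        fi
    done
    return $status
}

# Check the demo directory of the package in the current directory.
if [ -d demo ]; then
    check_demo_index demo
fi
```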
The contents of the inst subdirectory will be copied recursively
to the installation directory. Subdirectories of inst should not
interfere with those used by R (currently, R, data,
demo, exec, libs, man, help,
html, latex, R-ex, chtml, and Meta).
The copying of inst happens after src is built so its
Makefile can create files to be installed. Note that with the
exceptions of INDEX and COPYING, information files at the
top level of the package will not be installed and so not be
known to users of Windows and MacOS X compiled packages (and not seen by
those who use R CMD INSTALL or install.packages on
the tarball). So any information files you wish an end user to see
should be included in inst. One thing you might like to add to
inst is a CITATION file for use by the citation
function.
Subdirectory tests is for additional package-specific test code,
similar to the specific tests that come with the R distribution.
Test code can either be provided directly in a .R file, or via a
.Rin file containing code which in turn creates the corresponding
.R file (e.g., by collecting all function objects in the package
and then calling them with the strangest arguments). The results of
running a .R file are written to a .Rout file. If there
is a corresponding .Rout.save file, these two are compared, with
differences being reported but not causing an error. The whole
tests directory is copied to the check area, and the tests are run with the
copy as the working directory and with R_LIBS set to ensure that
the copy of the package installed during testing will be found by
library(pkg_name).
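The "report but do not fail" behaviour for saved output can be pictured with a few lines of shell. This is only an illustration of the idea, not the code R CMD check runs: compare_output is a hypothetical helper, and a plain diff stands in for R's own comparison.

```shell
#!/bin/sh
# compare_output FILE.Rout: if FILE.Rout.save exists, show any differences
# but always succeed. Hypothetical helper illustrating the behaviour only.
compare_output() {
    out=$1
    if [ -f "$out.save" ]; then
        if ! diff "$out" "$out.save"; then
            echo "Note: $out differs from $out.save"
        fi
    fi
    return 0    # differences are reported, never fatal
}
```

For example, `compare_output mytest.Rout' (a hypothetical file name) prints the diff when the reference output has drifted, and still returns success.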
Subdirectory exec could contain additional executables the package needs, typically scripts for interpreters such as the shell, Perl, or Tcl. This mechanism is currently used by only a very few packages, and is still experimental.
Subdirectory po is used for files related to localization: see Localization.
Sometimes it is convenient to distribute several packages as a bundle. (An example is VR which contains four packages.) The installation procedures on both Unix-alikes and Windows can handle package bundles.
The DESCRIPTION file of a bundle has a `Bundle' field and no `Package' field, as in
Bundle: VR
Priority: recommended
Contains: MASS class nnet spatial
Version: 7.2-12
Date: 2005-01-31
Depends: R (>= 2.0.0), graphics, stats
Suggests: lattice, nlme, survival
Author: S original by Venables & Ripley.
  R port by Brian Ripley <ripley@stats.ox.ac.uk>, following earlier
  work by Kurt Hornik and Albrecht Gebhardt.
Maintainer: Brian Ripley <ripley@stats.ox.ac.uk>
BundleDescription: Functions and datasets to support Venables and Ripley,
  `Modern Applied Statistics with S' (4th edition).
License: GPL (version 2 or later)  See file LICENCE.
URL: http://www.stats.ox.ac.uk/pub/MASS4/
The `Contains' field lists the packages (space separated), which should be contained in separate subdirectories with the names given. During building and installation, packages will be installed in the order specified. Be sure to order this list so that dependencies are met appropriately.
The packages contained in a bundle are standard packages in all respects except that the DESCRIPTION file is replaced by a DESCRIPTION.in file which just contains fields additional to the DESCRIPTION file of the bundle, for example
Package: spatial
Description: Functions for kriging and point pattern analysis.
Title: Functions for Kriging and Point Pattern Analysis
Any files in the package bundle except the DESCRIPTION file and the named packages will be ignored.
The `Depends' field in the bundle's DESCRIPTION file should list the dependencies of all the constituent packages (and similarly for `Imports' and `Suggests'), and then DESCRIPTION.in files should not contain these fields.
Note that most of this section is Unix-specific: see the comments later on about the Windows port of R.
If your package needs some system-dependent configuration before
installation you can include a (Bourne shell) script configure in
your package which (if present) is executed by R CMD INSTALL
before any other action is performed. This can be a script created by
the Autoconf mechanism, but may also be a script written by yourself.
Use this to detect if any nonstandard libraries are present such that
corresponding code in the package can be disabled at install time rather
than giving error messages when the package is compiled or used. To
summarize, the full power of Autoconf is available for your extension
package (including variable substitution, searching for libraries,
etc.).
The (Bourne shell) script cleanup is executed as the last thing by
R CMD INSTALL if present and option --clean was given,
and by R CMD build when preparing the package for building from
its source. It can be used to clean up the package source tree. In
particular, it should remove all files created by configure.
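A cleanup script is usually little more than a list of rm commands. The sketch below assumes a configure script generated by Autoconf (hence config.log, config.status and autom4te.cache); the remaining file names are purely illustrative and should be replaced by whatever your own configure script generates.

```shell
#!/bin/sh
# Sketch of a cleanup script. All file names are illustrative: list
# whatever your own configure script generates.
remove_generated_files() {
    # Files left behind by an Autoconf-generated configure.
    rm -f config.log config.status config.cache
    rm -rf autom4te.cache
    # Outputs substituted from the corresponding .in templates.
    rm -f src/Makevars src/config.h
    rm -f R/foo.R
}

# cleanup is run from the top-level package directory.
remove_generated_files
```

Note that the templates themselves (for example src/Makevars.in) are part of the source package and must not be removed.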
As an example, suppose we want to use functionality provided by a (C or
FORTRAN) library foo. Using Autoconf, we can create a configure
script which checks for the library, sets variable HAVE_FOO to
TRUE if it was found and to FALSE otherwise, and then
substitutes this value into output files (by replacing instances of
`@HAVE_FOO@' in input files with the value of HAVE_FOO).
For example, if a function named bar is to be made available by
linking against library foo (i.e., using -lfoo), one
could use
AC_CHECK_LIB(foo, bar, [HAVE_FOO=TRUE], [HAVE_FOO=FALSE])
AC_SUBST(HAVE_FOO)
......
AC_CONFIG_FILES([foo.R])
AC_OUTPUT
in configure.ac (assuming Autoconf 2.50 or better).
The definition of the respective R function in foo.R.in could be
foo <- function(x) {
    if(!@HAVE_FOO@)
        stop("Sorry, library 'foo' is not available")
    ...
From this file configure creates the actual R source file foo.R looking like
foo <- function(x) {
    if(!FALSE)
        stop("Sorry, library 'foo' is not available")
    ...
if library foo was not found (with the desired functionality).
In this case, the above R code effectively disables the function.
One could also use different file fragments for available and missing functionality, respectively.
You will very likely need to ensure that the same C compiler and compiler flags are used in the configure tests as when compiling R or your package. Under Unix, you can achieve this by including the following fragment early in configure.ac
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
echo "could not determine R_HOME"
exit 1
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
(using `${R_HOME}/bin/R' rather than just `R' is necessary
in order to use the `right' version of R when running the script as
part of R CMD INSTALL.)
Note that earlier versions of this document recommended obtaining the
configure information by direct extraction (using grep and sed) from
R_HOME/etc/Makeconf, which only works for variables
recorded there as literals. You can use R CMD config for getting
the value of the basic configuration variables, or the header and
library flags necessary for linking against R, see R CMD config
--help for details.
To check for an external BLAS library using the ACX_BLAS macro
from the official Autoconf Macro Archive, one can simply do
F77=`"${R_HOME}/bin/R" CMD config F77`
AC_PROG_F77
FLIBS=`"${R_HOME}/bin/R" CMD config FLIBS`
ACX_BLAS([], AC_MSG_ERROR([could not find your BLAS library], 1))
Note that FLIBS as determined by R must be used to ensure that
FORTRAN 77 code works on all R platforms. Calls to the Autoconf macro
AC_F77_LIBRARY_LDFLAGS, which would overwrite FLIBS, must
not be used (and hence e.g. removed from ACX_BLAS). (Recent
versions of Autoconf in fact allow an already set FLIBS to
override the test for the FORTRAN linker flags. Also, recent versions
of R can detect external BLAS and LAPACK libraries.)
You should bear in mind that the configure script may well not work on Windows systems (this seems normally to be the case for those generated by Autoconf, although simple shell scripts do work). If your package is to be made publicly available, please give enough information for a user on a non-Unix platform to configure it manually, or provide a configure.win script to be used on that platform.
In some rare circumstances, the configuration and cleanup scripts need to know the location into which the package is being installed. An example of this is a package that uses C code and creates two shared object/DLLs. Usually, the object that is dynamically loaded by R is linked against the second, dependent, object. On some systems, we can add the location of this dependent object to the object that is dynamically loaded by R. This means that each user does not have to set the value of the LD_LIBRARY_PATH (or equivalent) environment variable, but that the secondary object is automatically resolved. Another example is when a package installs support files that are required at run time, and their location is substituted into an R data structure at installation time. (This happens with the Java Archive files in the SJava package.) The names of the top-level library directory (i.e., specifiable via the `-l' argument) and the directory of the package itself are made available to the installation scripts via the two shell/environment variables R_LIBRARY_DIR and R_PACKAGE_DIR. Additionally, the name of the package (e.g., `survival' or `MASS') being installed is available from the shell variable R_PACKAGE_NAME.
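As a sketch of the second use mentioned above, a hand-written configure script could record the installed location of run-time support files in a generated R code file. Everything specific here is an assumption made for illustration: the function name, the object name .supportDir, and the jars subdirectory (which would be shipped as inst/jars). The fallback values exist only so that the fragment can be tried outside R CMD INSTALL, which normally sets all three variables.

```shell
#!/bin/sh
# Fragment of a hand-written configure script. R CMD INSTALL sets the three
# variables; the defaults below exist only so the sketch can run on its own.
: "${R_LIBRARY_DIR:=/tmp/Rlib}"
: "${R_PACKAGE_NAME:=mypkg}"
: "${R_PACKAGE_DIR:=${R_LIBRARY_DIR}/${R_PACKAGE_NAME}}"

# write_support_path FILE: record where the (hypothetical) 'jars'
# directory will live once the package is installed.
write_support_path() {
    cat > "$1" <<EOF
## Generated by configure: do not edit.
.supportDir <- "${R_PACKAGE_DIR}/jars"
EOF
}
```

A configure script would then call, say, `write_support_path R/paths.R' (a hypothetical file name), and the matching cleanup script would remove the generated file.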
Sometimes writing your own configure script can be avoided by supplying a file Makevars: also one of the commonest uses of a configure script is to make Makevars from Makevars.in.
The most common use of a Makevars file is to set additional
preprocessor flags (for example, include paths) via PKG_CPPFLAGS,
and additional compiler flags by setting PKG_CFLAGS,
PKG_CXXFLAGS and PKG_FFLAGS, for C, C++, or FORTRAN
respectively (see Creating shared objects).
Also, Makevars can be used to set flags for the linker, for example `-L' and `-l' options.
When writing a Makevars file for a package you intend to distribute, take care to ensure that it is not specific to your compiler: flags such as -O2 -Wall -pedantic are all specific to GCC.
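Creating Makevars from Makevars.in does not require Autoconf: a few lines of shell with sed will do for simple cases. The sketch below is one possible hand-written configure, under stated assumptions: the library name foo, the environment variable FOO_HOME and the placeholders @FOO_CPPFLAGS@ and @FOO_LIBS@ are all hypothetical, and paths containing `|' or `&' would need extra escaping.

```shell
#!/bin/sh
# make_makevars TEMPLATE OUTPUT: fill in @FOO_CPPFLAGS@ and @FOO_LIBS@.
# FOO_HOME, the library name 'foo' and both placeholders are hypothetical.
make_makevars() {
    template=$1
    output=$2
    if [ -n "${FOO_HOME:-}" ]; then
        foo_cppflags="-I${FOO_HOME}/include"
        foo_libs="-L${FOO_HOME}/lib -lfoo"
    else
        # No prefix given: rely on the default search paths.
        foo_cppflags=""
        foo_libs="-lfoo"
    fi
    # '|' is the sed delimiter so that '/' in paths needs no escaping.
    sed -e "s|@FOO_CPPFLAGS@|${foo_cppflags}|g" \
        -e "s|@FOO_LIBS@|${foo_libs}|g" "$template" > "$output"
}

# configure runs in the top-level package directory.
if [ -f src/Makevars.in ]; then
    make_makevars src/Makevars.in src/Makevars
fi
```

With a src/Makevars.in containing the lines `PKG_CPPFLAGS = @FOO_CPPFLAGS@' and `PKG_LIBS = @FOO_LIBS@', setting FOO_HOME=/opt/foo yields `-I/opt/foo/include' and `-L/opt/foo/lib -lfoo' respectively. Only -I, -L and -l flags are written, which keeps the result independent of the compiler in use.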
There are some macros which are built whilst configuring the building of R itself, are stored on Unix-alikes in R_HOME/etc/Makeconf and can be used in Makevars. These include
FLIBS
A macro containing the set of libraries needed to link FORTRAN code. This may need to be included in PKG_LIBS.
BLAS_LIBS
A macro containing the BLAS libraries used when building R. This may need to be included in PKG_LIBS. Beware that if it is empty then
the R executable will contain all the double-precision and
double-complex BLAS routines, but no single-precision or complex
routines. If BLAS_LIBS is included, then FLIBS also needs
to be, as most BLAS libraries are written in FORTRAN.
LAPACK_LIBS
A macro containing the LAPACK libraries used when building R. This may need to be included in PKG_LIBS. This may point to a dynamic library libRlapack
which contains all the double-precision LAPACK routines as well as those
double-complex LAPACK and BLAS routines needed to build R, or it
may point to an external LAPACK library, or may be empty if an external
BLAS library also contains LAPACK.
[There is no guarantee that the LAPACK library will provide more than all the double-precision and double-complex routines, and some do not provide all the auxiliary routines.]
The macros BLAS_LIBS and FLIBS should always be included
after LAPACK_LIBS.
SAFE_FFLAGS
A macro containing flags which are needed to circumvent over-optimization of FORTRAN code.
Note that Makevars should not normally contain targets, as it is included before the default Makefile and make is called without an explicit target. To circumvent that, use a suitable phony target before any actual targets: for example fastICA has
SLAMC_FFLAGS=$(R_XTRA_FFLAGS) $(FPICFLAGS) $(SHLIB_FFLAGS) $(SAFE_FFLAGS)
all: $(SHLIB)
slamc.o: slamc.f
	$(F77) $(SLAMC_FFLAGS) -c -o slamc.o slamc.f
to ensure that the LAPACK routines find some constants without infinite looping.
It may be helpful to give an extended example of using a configure script to create a src/Makevars file: this is based on that in the RODBC package.
The configure.ac file follows: configure is created from this by running autoconf in the top-level package directory (containing configure.ac).
AC_INIT([RODBC], 1.1.8) dnl package name, version
dnl A user-specifiable option
odbc_mgr=""
AC_ARG_WITH([odbc-manager],
AC_HELP_STRING([--with-odbc-manager=MGR],
[specify the ODBC manager, e.g. odbc or iodbc]),
[odbc_mgr=$withval])
if test "$odbc_mgr" = "odbc" ; then
AC_PATH_PROGS(ODBC_CONFIG, odbc_config)
fi
dnl Select an optional include path, from a configure option
dnl or from an environment variable.
AC_ARG_WITH([odbc-include],
AC_HELP_STRING([--with-odbc-include=INCLUDE_PATH],
[the location of ODBC header files]),
[odbc_include_path=$withval])
RODBC_CPPFLAGS="-I."
if test [ -n "$odbc_include_path" ] ; then
  RODBC_CPPFLAGS="-I. -I${odbc_include_path}"
else
  if test [ -n "${ODBC_INCLUDE}" ] ; then
    RODBC_CPPFLAGS="-I. -I${ODBC_INCLUDE}"
  fi
fi
dnl ditto for a library path
AC_ARG_WITH([odbc-lib],
AC_HELP_STRING([--with-odbc-lib=LIB_PATH],
[the location of ODBC libraries]),
[odbc_lib_path=$withval])
if test [ -n "$odbc_lib_path" ] ; then
  LIBS="-L$odbc_lib_path ${LIBS}"
else
  if test [ -n "${ODBC_LIBS}" ] ; then
    LIBS="-L${ODBC_LIBS} ${LIBS}"
  else
    if test -n "${ODBC_CONFIG}"; then
      odbc_lib_path=`odbc_config --libs | sed s/-lodbc//`
      LIBS="${odbc_lib_path} ${LIBS}"
    fi
  fi
fi
dnl Now find the compiler and compiler flags to use
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
echo "could not determine R_HOME"
exit 1
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
if test -n "${ODBC_CONFIG}"; then
RODBC_CPPFLAGS=`odbc_config --cflags`
fi
CPPFLAGS="${CPPFLAGS} ${RODBC_CPPFLAGS}"
dnl Check the headers can be found
AC_CHECK_HEADERS(sql.h sqlext.h)
if test "${ac_cv_header_sql_h}" = no ||
test "${ac_cv_header_sqlext_h}" = no; then
AC_MSG_ERROR("ODBC headers sql.h and sqlext.h not found")
fi
dnl search for a library containing an ODBC function
if test [ -n "${odbc_mgr}" ] ; then
AC_SEARCH_LIBS(SQLTables, ${odbc_mgr}, ,
AC_MSG_ERROR("ODBC driver manager ${odbc_mgr} not found"))
else
AC_SEARCH_LIBS(SQLTables, odbc odbc32 iodbc, ,
AC_MSG_ERROR("no ODBC driver manager found"))
fi
dnl for 64-bit ODBC need SQL[U]LEN, and it is unclear where they are defined.
AC_CHECK_TYPES([SQLLEN, SQLULEN], , , [# include <sql.h>])
dnl for unixODBC header
AC_CHECK_SIZEOF(long, 4)
dnl substitute RODBC_CPPFLAGS and LIBS
AC_SUBST(RODBC_CPPFLAGS)
AC_SUBST(LIBS)
AC_CONFIG_HEADERS([src/config.h])
dnl and do substitution in the src/Makevars.in and src/config.h
AC_CONFIG_FILES([src/Makevars])
AC_OUTPUT
where src/Makevars.in would be simply
PKG_CPPFLAGS = @RODBC_CPPFLAGS@
PKG_LIBS = @LIBS@
A user can then be advised to specify the location of the ODBC driver manager files by options like (lines broken for easier reading)
R CMD INSTALL
--configure-args='--with-odbc-include=/opt/local/include
--with-odbc-lib=/opt/local/lib --with-odbc-manager=iodbc'
RODBC
or by setting the environment variables ODBC_INCLUDE and
ODBC_LIBS.
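The substitution step at the end of the configure script can be pictured with a small shell sketch. This is only an illustration under stated assumptions: it imitates the effect of AC_SUBST and AC_CONFIG_FILES with sed rather than running autoconf, the helper name fill_makevars is hypothetical, and the flag values are merely plausible results of the --configure-args shown above.

```shell
#!/bin/sh
# Sketch only: imitate with sed what configure does when it turns
# src/Makevars.in into src/Makevars. fill_makevars is a hypothetical
# helper, not part of autoconf or R.

# fill_makevars CPPFLAGS LIBS
# Reads a Makevars.in template on stdin and replaces the @RODBC_CPPFLAGS@
# and @LIBS@ placeholders. Values containing '|' or '&' would need escaping.
fill_makevars() {
    sed -e "s|@RODBC_CPPFLAGS@|$1|" -e "s|@LIBS@|$2|"
}

# The two-line template from the text, filled in with assumed values.
printf '%s\n' 'PKG_CPPFLAGS = @RODBC_CPPFLAGS@' 'PKG_LIBS = @LIBS@' |
    fill_makevars '-I. -I/opt/local/include' '-liodbc -L/opt/local/lib'
```

With these assumed values the generated file would set PKG_CPPFLAGS to the include path and PKG_LIBS to the library flags, which is what R CMD INSTALL then picks up from src/Makevars.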
R currently does not distinguish between FORTRAN 77 and Fortran 90/95
code, and assumes all FORTRAN comes in source files with extension
.f. Commercial Unix systems typically use a F95 compiler, but
only since the release of gcc 4.0.0 in April 2005 have Linux and
other non-commercial OSes had much support for F95. The compiler used
for R on Windows is a F77 compiler.
This means that portable packages need to be written in correct FORTRAN 77, which will also be valid Fortran 95. See http://developer.r-project.org/Portability.html for reference resources. In particular, free source form F95 code is not portable.
On some systems an alternative F95 compiler is available: from the
gcc family this might be gfortran or g95.
Configuring R will try to find a compiler which (from its name)
appears to be a Fortran 90/95 compiler, and set it in macro `FC'.
Note that it does not check that such a compiler is fully (or even
partially) compliant with Fortran 90/95. Packages making use of
Fortran 90/95 features should use file extension .f90 or
.f95 for the source files: the variable PKG_FCFLAGS
specifies any special flags to be used. There is no guarantee that
compiled Fortran 90/95 code can be mixed with any other type of code,
nor that a build of R will have support for such packages.
There is a MinGW build of gfortran available from
http://gcc.gnu.org/wiki/GFortranBinaries and a MinGW
build of g95 from http://www.g95.org.
Set F95 in MkRules to point to the installed compiler.
Then R CMD SHLIB and R CMD INSTALL work for
packages containing Fortran 90/95 source code.
Before using these tools, please check that your package can be
installed and loaded. R CMD check will inter alia do
this, but you will get more informative error messages doing the checks
directly.
Using R CMD check, the R package checker, one can test whether
source R packages work correctly. It can be run on one or
more directories, or gzipped package tar
archives with extension
.tar.gz or .tgz. This runs a series of checks, including
library. Another check is that all packages mentioned in
library or requires or from which the NAMESPACE
file imports or are called via :: or ::: are listed
(in `Depends', `Imports', `Suggests' or `Contains'):
this is not an exhaustive check of the actual imports.
To allow a configure script to generate suitable files, files ending in `.in' will be allowed in the R directory.
library.dynam (with
no extension). In addition, it is checked whether methods have all
arguments of the corresponding generic, and whether the final argument
of replacement functions is called `value'. All foreign function
calls (.C, .Fortran, .Call and .External
calls) are tested to see if they have a PACKAGE argument, and if
not, whether the appropriate DLL might be deduced from the name space of
the package. Any other calls are reported. (The check is generous, and
users may want to supplement this by examining the output of
tools::checkFF("mypkg", verbose=TRUE), especially if the
intention were to always use a PACKAGE argument.)
\name, \alias, \title,
\description and \keyword) fields. The Rd name and title
are checked for being non-empty, and the keywords found are compared to
the standard ones. There is a check for missing cross-references
(links).
\usage
sections of Rd files are documented in the corresponding
\arguments section.
\examples to create executable example code.)
Of course, released packages should be able to run at least their own
examples. Each example is run in a `clean' environment (so earlier
examples cannot be assumed to have been run), and with the variables
T and F redefined to generate an error unless they are set
in the example: See Logical vectors.
Use R CMD check --help to obtain more information about the usage of the R package checker. A subset of the checking steps can be selected by adding flags.
Using R CMD build, the R package builder, one can build R
packages from their sources (for example, for subsequent release).
Prior to actually building the package in the common gzipped tar file format, a few diagnostic checks and cleanups are performed. In particular, it is tested whether object indices exist and can be assumed to be up-to-date, and C, C++ and FORTRAN source files are tested and converted to LF line-endings if necessary.
Run-time checks whether the package works correctly should be performed
using R CMD check prior to invoking the build procedure.
To exclude files from being put into the package, one can specify a list
of exclude patterns in file .Rbuildignore in the top-level source
directory. These patterns should be Perl regexps, one per line, to be
matched against the file names relative to the top-level source
directory. In addition, directories called CVS or .svn or
.arch-ids and files GNUMakefile or with base names
starting with `.#', or starting and ending with `#', or ending
in `~', `.bak' or `.swp', are excluded by default. In
addition, those files in the R, demo and man
directories which are flagged by R CMD check as having invalid
names will be excluded.
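The effect of user-supplied exclude patterns can be tried out with a short shell sketch. It is an approximation under stated assumptions: R CMD build matches Perl regexps, whereas the sketch uses grep -E, which behaves the same way only for simple patterns such as these; the patterns, the file names and the helper kept_files are all hypothetical.

```shell
#!/bin/sh
# Sketch only: approximate .Rbuildignore matching with grep -E.
# The patterns and paths below are invented for illustration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# One regexp per line, matched against paths relative to the top-level
# source directory.
cat > "$tmp/.Rbuildignore" <<'EOF'
^notes/
\.orig$
^TODO$
EOF

# kept_files PATTERNFILE
# Reads candidate paths on stdin and prints those that no pattern matches.
kept_files() {
    grep -v -E -f "$1"
}

printf '%s\n' R/foo.R notes/ideas.txt src/foo.c.orig TODO man/foo.Rd |
    kept_files "$tmp/.Rbuildignore"
```

Here only R/foo.R and man/foo.Rd are printed; the other three paths each match one of the patterns and would be left out of the tarball.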
Use R CMD build --help to obtain more information about the usage of the R package builder.
Unless R CMD build is invoked with the --no-vignettes option, it will attempt to rebuild the vignettes (see Writing package vignettes) in the package. To do so it installs the current package/bundle into a temporary library tree, but any dependent packages need to be installed in an available library tree (see the Note: below).
One of the checks that R CMD build runs is for empty source
directories. These are in most cases unintentional, in which case they
should be removed and the build re-run.
It can be useful to run R CMD check --check-subdirs=yes on the
built tarball as a final check on the contents.
R CMD build can also build pre-compiled versions of packages for
binary distributions, but R CMD INSTALL --build is preferred (and
is considerably more flexible). In particular, Windows users are
recommended to use R CMD INSTALL --build and install into the
main library tree (the default) so that HTML links are resolved.
Note: R CMD check and R CMD build run R with --vanilla, so none of the user's startup files are read. If you need R_LIBS set (to find packages in a non-standard library) you will need to set it in the environment.
Note to Windows users: R CMD check and R CMD build work well under Windows NT4/2000/XP/2003 but may not work correctly on Windows 95/98/ME because of problems with some versions of Perl on those limited OSes. Experiences vary. To use them you will need to have installed the files for building source packages (which is the default).
In addition to the available command line options, R CMD check
also allows customization by setting (Perl) configuration variables in a
configuration file, the location of which can be specified via the
--rcfile option and defaults to $HOME/.R/check.conf
provided that the environment variable HOME is set.
The following configuration variables are currently available.
$R_check_use_install_log
$R_check_all_non_ISO_C
$R_check_weave_vignettes
$R_check_subdirs_nocase
$R_check_subdirs_strict
$R_check_force_suggests
Values `1' or a string with lower-cased version `"yes"' or `"true"' can be used for setting the variables to true; similarly, `0' or strings with lower-cased version `"no"' or `"false"' give false.
For example, a configuration file containing
$R_check_use_install_log = "TRUE";
$R_check_weave_vignettes = 0;
results in using install logs and turning off weaving.
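The rule for true and false values can be restated as a small shell sketch. It is written independently of the Perl code that R CMD check actually uses; the helper config_flag is hypothetical, and reporting `unknown' for any other value is a choice made for the sketch, not documented behaviour.

```shell
#!/bin/sh
# Sketch only: classify a configuration value as the text describes.
# config_flag is a hypothetical helper, not part of R.
config_flag() {
    # Lower-case the value so that "TRUE", "True" and "true" compare alike.
    v=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $v in
        1|yes|true)  echo true ;;
        0|no|false)  echo false ;;
        *)           echo unknown ;;   # assumption: not specified in the text
    esac
}

config_flag "TRUE"   # value given to R_check_use_install_log in the example
config_flag 0        # value given to R_check_weave_vignettes in the example
```

The two calls print true and false respectively, matching the description of the example configuration file.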
Future versions of R will enhance this customization mechanism, and
provide a similar scheme for R CMD build.
There are other internal settings that can be changed via environment
variables _R_CHECK_*_: see the Perl source code. One that
may be interesting is _R_CHECK_USE_CODETOOLS_ to make use of the
codetools package available from
http://www.stat.uiowa.edu/~luke/R/codetools.
In addition to the help files in Rd format, R packages allow the inclusion of documents in arbitrary other formats. The standard location for these is subdirectory inst/doc of a source package; its contents are copied to subdirectory doc when the package is installed. Pointers from package help indices to the installed documents are automatically created. Documents in inst/doc can be in arbitrary format, but we strongly recommend providing them in PDF format so that users on all platforms can easily read them. To ensure that they can be accessed from a browser, the file names should start with an ASCII letter and be entirely in ASCII letters or digits or minus or underscore.
A special case are documents in Sweave format, which we call
package vignettes. Sweave allows the integration of LaTeX
documents and R code and is contained in package utils which is
part of the base R distribution, see the Sweave help page for
details on the document format. Package vignettes found in directory
inst/doc are tested by R CMD check by executing all R
code chunks they contain to ensure consistency between code and
documentation. Code chunks with option eval=FALSE are not
tested. The R working directory for all vignette tests in R CMD
check is the installed version of the doc
subdirectory. Make sure all files needed by the vignette (data sets,
...) are accessible by either placing them in the inst/doc
hierarchy of the source package, or using calls to system.file().
R CMD build will automatically create PDF versions of the
vignettes for distribution with the package sources. By including the
PDF version in the package sources it is not necessary that the
vignettes can be compiled at install time, i.e., the package author can
use private LaTeX extensions which are only available on his machine.
Only the R code inside the vignettes is part of the checking
procedure, typesetting manuals is not part of the package quality
control.
By default R CMD build will run Sweave on all files in
Sweave format. If no Makefile is found in directory
inst/doc, then texi2dvi --pdf is run on all vignettes.
Whenever a Makefile is found, then R CMD build will try to
run make after the Sweave step, such that PDF manuals
can be created from arbitrary source formats (plain LaTeX files,
...). The Makefile should take care of both creation of PDF
files and cleaning up afterwards, i.e., delete all files that shall not
appear in the final package archive. Note that the make step is
executed independently from the presence of any files in Sweave format.
It is no longer necessary to provide a 00Index.dcf file in the
inst/doc directory—the corresponding information is generated
automatically from the \VignetteIndexEntry statements in all
Sweave files when installing from source, or when using the package
builder (see Checking and building packages). The
\VignetteIndexEntry statement is best placed in a LaTeX comment,
so that no definition of the command is necessary.
At install time an HTML index for all vignettes is automatically
created from the \VignetteIndexEntry statements unless a file
index.html exists in directory inst/doc. This index is
linked into the HTML help system for each package.
CRAN is a network of WWW sites holding the R distributions and contributed code, especially R packages. Users of R are encouraged to join in the collaborative project and to submit their own packages to CRAN.
Before submitting a package mypkg, do run the following steps to test it is complete and will install properly. (Unix procedures only, run from the directory containing mypkg as a subdirectory.)
R CMD check to check that the package will install and will
run its examples, and that the documentation is complete and can be
processed. If the package contains code that needs to be compiled, try
to enable a reasonable amount of diagnostic messaging (“warnings”)
when compiling, such as e.g. -Wall -pedantic for tools from
GCC, the Gnu Compiler Collection. (If R was not configured
accordingly, one can achieve this e.g. via PKG_CFLAGS and
related variables.)
R CMD build to make the release .tar.gz file.
Please ensure that you can run through the complete procedure with only
warnings that you understand and have reasons not to eliminate. In
principle, packages must pass R CMD check without warnings to be
admitted to the main CRAN package area.
When all the testing is done, upload the .tar.gz file, using `anonymous' as log-in name and your e-mail address as password, to
ftp://cran.R-project.org/incoming/
(note: use ftp and not sftp to connect to this server) and
send a message to cran@r-project.org
about it. The CRAN maintainers will run these tests before
putting a submission in the main archive.
Note that the fully qualified name of the .tar.gz file must be of the form
package_version[_engine[_type]],
where the `[ ]' indicates that the enclosed component is optional, package and version are the corresponding entries in file DESCRIPTION, engine gives the S engine the package is targeted for and defaults to `R', and type indicates whether the file contains source or binaries for a certain platform, and defaults to `source'. I.e.,
OOP_0.1-3.tar.gz
OOP_0.1-3_R.tar.gz
OOP_0.1-3_R_source.tar.gz
are all equivalent and indicate an R source package, whereas
OOP_0.1-3_Splus6_sparc-sun-solaris.tar.gz
is a binary package for installation under Splus6 on the given platform.
This naming scheme has been adopted to ensure usability of code across S
engines. R code and utilities operating on package .tar.gz files
can only be assumed to work provided that this naming scheme is
respected. Of course, R CMD build automatically creates valid
file names.
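The naming scheme, including its defaults, can be made concrete with a shell sketch that splits a file name into its four components. The helper describe_tarball is hypothetical, and the sketch assumes well-formed input in which neither the package name nor the version contains an underscore.

```shell
#!/bin/sh
# Sketch only: split package_version[_engine[_type]].tar.gz into its parts,
# applying the defaults engine=R and type=source described in the text.
# describe_tarball is a hypothetical helper, not an R utility.
describe_tarball() {
    base=${1##*/}              # drop any leading directory
    base=${base%.tar.gz}       # drop the extension
    package=${base%%_*}
    rest=${base#*_}            # version[_engine[_type]]
    version=${rest%%_*}
    engine=R
    ptype=source
    case $rest in
        *_*)
            rest=${rest#*_}    # engine[_type]
            engine=${rest%%_*}
            case $rest in
                *_*) ptype=${rest#*_} ;;
            esac
            ;;
    esac
    printf '%s %s %s %s\n' "$package" "$version" "$engine" "$ptype"
}

describe_tarball OOP_0.1-3.tar.gz
describe_tarball OOP_0.1-3_R.tar.gz
describe_tarball OOP_0.1-3_R_source.tar.gz
describe_tarball OOP_0.1-3_Splus6_sparc-sun-solaris.tar.gz
```

The first three calls print the same line, OOP 0.1-3 R source, which is the sense in which those names are equivalent; the last reports engine Splus6 and type sparc-sun-solaris.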
R has a name space management system for packages. This system allows the package writer to specify which variables in the package should be exported to make them available to package users, and which variables should be imported from other packages.
The current mechanism for specifying a name space for a package is to
place a NAMESPACE file in the top level package directory. This
file contains name space directives describing the imports and
exports of the name space. Additional directives register any shared
objects to be loaded and any S3-style methods that are provided. Note
that although the file looks like R code (and often has R-style
comments) it is not processed as R code. Only very simple
conditional processing of if statements is implemented.
Like other packages, packages with name spaces are loaded and attached
to the search path by calling library. Only the exported
variables are placed in the attached frame. Loading a package that
imports variables from other packages will cause these other packages to
be loaded as well (unless they have already been loaded), but they will
not be placed on the search path by these implicit loads.
Name spaces are sealed once they are loaded. Sealing means that imports and exports cannot be changed and that internal variable bindings cannot be changed. Sealing allows a simpler implementation strategy for the name space mechanism. Sealing also allows code analysis and compilation tools to accurately identify the definition corresponding to a global variable reference in a function body.
Note that adding a name space to a package changes the search strategy. The package name space comes first in the search, then the imports, then the base name space and then the normal search path.
Exports are specified using the export directive in the
NAMESPACE file. A directive of the form
export(f, g)
specifies that the variables f and g are to be exported.
(Note that variable names may be quoted, and non-standard names such as
[<-.fractions must be.)
For packages with many variables to export it may be more convenient to
specify the names to export with a regular expression using
exportPattern. The directive
exportPattern("^[^\\.]")
exports all variables that do not start with a period.
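To see which names such a pattern selects, one can apply a close equivalent of the regular expression to a list of names with grep. This is a sketch under stated assumptions: the names are invented, the helper exported_names is hypothetical, and grep -E stands in for R's own pattern matching.

```shell
#!/bin/sh
# Sketch only: apply ^[^.] (a close equivalent of the regular expression
# written as "^[^\\.]" in the NAMESPACE file) to some hypothetical names.
exported_names() {
    grep -E '^[^.]'
}

printf '%s\n' f .internalHelper g print.foo | exported_names
```

Only .internalHelper is dropped; a name such as print.foo is kept because the pattern constrains the first character only.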
A package with a name space implicitly imports the base name space.
Variables exported from other packages with name spaces need to be
imported explicitly using the directives import and
importFrom. The import directive imports all exported
variables from the specified package(s). Thus the directive
import(foo, bar)
specifies that all exported variables in the packages foo and
bar are to be imported. If only some of the exported variables
from a package are needed, then they can be imported using
importFrom. The directive
importFrom(foo, f, g)
specifies that the exported variables f and g of the
package foo are to be imported.
If a package only needs one (exported) object from another package it
can use a fully qualified variable reference in the code instead of a
formal import. A fully qualified reference to the function f in
package foo is of the form foo::f. This is less efficient
than a formal import and also loses the advantage of recording all
dependencies in the NAMESPACE file, so this approach is usually
not recommended. Evaluating foo::f will cause package foo
to be loaded, but not attached, if it was not loaded already.
The standard method for S3-style UseMethod dispatching might fail
to locate methods defined in a package that is imported but not attached
to the search path. To ensure that these methods are available the
packages defining the methods should ensure that the generics are
imported and register the methods using S3method directives. If
a package defines a function print.foo intended to be used as a
print method for class foo, then the directive
S3method(print, foo)
ensures that the method is registered and available for UseMethod
dispatch. The function print.foo does not need to be exported.
Since the generic print is defined in base it does not need
to be imported explicitly. This mechanism is intended for use with
generics that are defined in a name space. Any methods for a generic
defined in a package that does not use a name space should be exported,
and the package defining and exporting the methods should be attached to
the search path if the methods are to be found.
There are a number of hooks that apply to packages with name spaces.
See help(".onLoad") for more details.
Packages with name spaces do not use the .First.lib function.
Since loading and attaching are distinct operations when a name space is
used, separate hooks are provided for each. These hook functions are
called .onLoad and .onAttach. They take the same
arguments as .First.lib; they should be defined in the name space
but not exported.
However, packages with name spaces do use the .Last.lib
function. There is also a hook .onUnload which is called when
the name space is unloaded (via a call to unloadNamespace) with
argument the full path to the directory in which the package was
installed. .onUnload should be defined in the name space and not
exported, but .Last.lib does need to be exported.
Packages are not likely to need .onAttach (except perhaps for a
start-up banner); code to set options and load shared objects should be
placed in a .onLoad function, or use made of the useDynLib
directive described next.
There can be one or more useDynLib directives which allow shared
objects that need to be loaded to be specified in the NAMESPACE
file. The directive
useDynLib(foo)
registers the shared object foo for loading with
library.dynam. Loading of registered object(s) occurs after the
package code has been loaded and before running the load hook function.
Packages that would only need a load hook function to load a shared
object can use the useDynLib directive instead.
User-level hooks are also available: see the help on function
setHook.
The useDynLib directive also accepts the names of the native
routines that are to be used in R via the .C, .Call,
.Fortran and .External interface functions. These are given as
additional arguments to the directive, for example,
useDynLib(foo, myRoutine, myOtherRoutine)
By specifying these names in the useDynLib directive, the
native symbols are resolved when the package is loaded and R variables
identifying these symbols are added to the package's name space with
these names. These can be used in the .C, .Call,
.Fortran and .External calls in place of the
name of the routine and the PACKAGE argument.
For instance, we can call the routine myRoutine from R
with the code
.Call(myRoutine, x, y)
rather than
.Call("myRoutine", x, y, PACKAGE = "foo")
There are at least two benefits to this approach. Firstly, the symbol lookup is done just once for each symbol rather than each time the routine is invoked. Secondly, this removes any ambiguity in resolving symbols that might be present in several compiled libraries. In particular, it allows for correctly resolving routines when different versions of the same package are loaded concurrently in the same R session.
In some circumstances, there will already be an R variable in the
package with the same name as a native symbol. For example, we may have
an R function in the package named myRoutine. In this case,
it is necessary to map the native symbol to a different R variable
name. This can be done in the useDynLib directive by using named
arguments. For instance, to map the native symbol name myRoutine
to the R variable myRoutine_sym, we would use
useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine)
We could then call that routine from R using the command
.Call(myRoutine_sym, x, y)
Symbols without explicit names are assigned to the R variable with that name.
In some cases, it may be preferable not to create R variables in the
package's name space that identify the native routines. It may be too
costly to compute these for many routines when the package is loaded
if many of these routines are not likely to be used. In this case,
one can still perform the symbol resolution correctly using the DLL,
but do this each time the routine is called. Given a reference to the
DLL as an R variable, say dll, we can call the routine
myRoutine using the expression
.Call(dll$myRoutine, x, y)
The $ operator resolves the routine with the given name in the
DLL using a call to getNativeSymbol. This is the same
computation as above where we resolve the symbol when the package is
loaded. The only difference is that this is done each time in the case
of dll$myRoutine.
In order to use this dynamic approach (e.g., dll$myRoutine), one
needs the reference to the DLL as an R variable in the package. The
DLL can be assigned to a variable by using the variable =
dllName format used above for mapping symbols to R variables. For
example, if we wanted to assign the DLL reference for the DLL
foo in the example above to the variable myDLL, we would
use the following directive in the NAMESPACE file:
myDLL = useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine)
Then, the R variable myDLL is in the package's name space and
available for calls such as myDLL$dynRoutine to access routines
that are not explicitly resolved at load time.
If the package has registration information (see Registering native routines), then we can use that directly rather than specifying the
list of symbols again in the useDynLib directive in the
NAMESPACE file. Each routine in the registration information is
specified by giving a name by which the routine is to be specified along
with the address of the routine and any information about the number and
type of the parameters. Using the .registration argument of
useDynLib, we can instruct the name space mechanism to create
R variables for these symbols. For example, suppose we have the
following registration information for a DLL named myDLL:
R_CMethodDef cMethods[] = {
{"foo", &foo, 4, {REALSXP, INTSXP, STRSXP, LGLSXP}},
{"bar_sym", &bar, 0},
{NULL, NULL, 0}
};
R_CallMethodDef callMethods[] = {
{"R_call_sym", &R_call, 4},
{"R_version_sym", &R_version, 0},
{NULL, NULL, 0}
};
Then, the directive in the NAMESPACE file
useDynLib(myDLL, .registration = TRUE)
causes the DLL to be loaded and also for the R variables foo,
bar_sym, R_call_sym and R_version_sym to be
defined in the package's name space.
Note that the names for the R variables are taken from the entry in
the registration information and do not need to be the same as the name
of the native routine. This allows the creator of the registration
information to map the native symbols to non-conflicting variable names
in R, e.g. R_version to R_version_sym for use in an
R function such as
R_version <- function()
{
.Call(R_version_sym)
}
Using argument .fixes allows an automatic prefix to be added to
the registered symbols, which can be useful when working with an
existing package. For example, package KernSmooth has
useDynLib(KernSmooth, .registration = TRUE, .fixes = "F_")
which makes the R variables corresponding to the Fortran symbols
F_bkde and so on, and so avoids clashes with R code in the name
space.
More information about this symbol lookup, along with some approaches for customizing it, is available from http://www.omegahat.org/examples/RDotCall.
As an example consider two packages named foo and bar. The R code for package foo in file foo.R is
x <- 1
f <- function(y) c(x,y)
foo <- function(x) .Call("foo", x, PACKAGE="foo")
print.foo <- function(x, ...) cat("<a foo>\n")
Some C code defines a C function compiled into DLL foo (with an
appropriate extension). The NAMESPACE file for this package is
useDynLib(foo)
export(f, foo)
S3method(print, foo)
The second package bar has code file bar.R
c <- function(...) sum(...)
g <- function(y) f(c(y, 7))
h <- function(y) y+9
and NAMESPACE file
import(foo)
export(g, h)
Calling library(bar) loads bar and attaches its exports to
the search path. Package foo is also loaded but not attached to
the search path. A call to g produces
> g(6)
[1] 1 13
This is consistent with the definitions of c in the two settings:
in bar the function c is defined to be equivalent to
sum, but in foo the variable c refers to the
standard function c in base.
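To summarize, converting an existing package to use a name space involves several simple steps:
Identify the public definitions and place them in export directives.
Identify S3-style method definitions and write corresponding S3method declarations.
Identify dependencies and replace any require calls by import directives.
Replace .First.lib functions with .onLoad functions or useDynLib directives.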
Some code analysis tools to aid in this process are currently under development.
Some additional steps are needed for packages which make use of formal
(S4-style) classes and methods (unless these are purely used
internally). There needs to be an .onLoad action to
ensure that the methods package is loaded and attached:
.onLoad <- function(lib, pkg) require(methods)
and any classes and methods which are to be exported need to be declared as such in the NAMESPACE file. For example, the now-defunct mle package had
importFrom(graphics, plot)
importFrom(stats, profile, confint)
exportClasses("mle", "profile.mle", "summary.mle")
exportMethods("confint", "plot", "profile", "summary", "show")
All formal classes need to be listed in an exportClasses
directive. All generics for which formal methods are defined need to be
declared in an exportMethods directive, and where the generics
are formed by taking over existing functions, those functions need to be
imported (explicitly unless they are defined in the base
name space).
In addition, a package using classes and methods defined in another package needs to import them, with directives
importClassesFrom(package, ...)
importMethodsFrom(package, ...)
listing the classes and functions with methods respectively. Suppose we
had two small packages A and B with B using A.
Then they could have NAMESPACE files
export(f1, ng1)
exportMethods("[")
exportClasses(c1)
and
importFrom(A, ng1)
importClassesFrom(A, c1)
importMethodsFrom(A, f1)
export(f4, f5)
exportMethods(f6, "[")
exportClasses(c1, c2)
respectively.
R CMD check provides a basic set of checks, but often further
problems emerge when people try to install and use packages submitted to
CRAN – many of these involve compiled code. Here are some
further checks that you can do to make your package more portable.
gcc can be used
with options -Wall -pedantic to alert you to potential
problems. Do not be tempted to assume that these are pure pedantry: for
example R is regularly used on platforms where the C compiler does
not accept C++ comments.
long in C will be 32-bit
on most R platforms (including those mostly used by the
CRAN maintainers), but 64-bit on many modern Unix and Linux
platforms. It is rather unlikely that the use of long in C code
has been thought through: if you need a longer type than int you
should use a configure test for a C99 type such as int_fast64_t
(and failing that, long long) and typedef your own type to be
long or long long, or use another suitable type (such as
size_t). Note that integer in FORTRAN
corresponds to int in C on all R platforms.
extern in all but one of the files.
nm -pg mypkg.so # or other extension such as .sl or .dylib
and checking if any of the symbols marked U is unexpected is a
good way to avoid this.
nm -pg), and to use unusual names, as
well as ensuring you have used the PACKAGE argument that R
CMD check checks for.
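Scanning nm output for undefined symbols is easy to automate. The following shell sketch filters nm-style output for entries marked U; the helper undefined_symbols is hypothetical and the sample lines are made up, so in real use the input would come from running nm -pg on the package's shared object.

```shell
#!/bin/sh
# Sketch only: list the undefined ("U") symbols found in nm-style output
# read on stdin. undefined_symbols is a hypothetical helper; in real use
# one would pipe in the output of: nm -pg mypkg.so
undefined_symbols() {
    awk '$1 == "U" { print $2 }'
}

# Canned sample resembling nm output (addresses and names are invented).
printf '%s\n' \
    '0000000000001139 T myRoutine' \
    '                 U Rf_error' \
    '                 U dgemm_' \
    '0000000000004010 D my_table' |
    undefined_symbols
```

For the sample this prints Rf_error and dgemm_; each name in such a list should be one that R or a library the package links against is expected to supply.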
Now that diagnostic messages can be made available for translation, it is important to write them in a consistent style. Using the tools described in the next section to extract all the messages can give a useful overview of your consistency (or lack of it).
Some guidelines follow.
In R error messages do not construct a message with paste (such
messages will not be translated) but via multiple arguments to
stop or warning, or via gettextf.
sQuote or dQuote except where the argument is a
variable.
Conventionally single quotation marks are used for quotations such as
'ord' must be a positive integer, at most the number of knots
and double quotation marks when referring to an R character string such as
'format' must be "normal" or "short" - using "normal"
Since ASCII does not contain directional quotation marks, it
is best to use `'' and let the translator (including automatic
translation) use directional quotations where available. The range of
quotation styles is immense: unfortunately we cannot reproduce them in a
portable texinfo document. But as a taster, some languages use
`up' and `down' (comma) quotes rather than left or right quotes, and
some use guillemets (and some use what Adobe calls `guillemotleft' to
start and others use it to end).
library
if((length(nopkgs) > 0) && !missing(lib.loc)) {
if(length(nopkgs) > 1)
warning("libraries ",
paste(sQuote(nopkgs), collapse = ", "),
" contain no packages")
else
warning("library ", paste(sQuote(nopkgs)),
" contains no package")
}
and was replaced by
if((length(nopkgs) > 0) && !missing(lib.loc)) {
    pkglist <- paste(sQuote(nopkgs), collapse = ", ")
    msg <- sprintf(ngettext(length(nopkgs),
                            "library %s contains no packages",
                            "libraries %s contain no packages"),
                   pkglist)
    warning(msg, domain=NA)
}
Note that it is much better to have complete clauses as here, since in another language one might need to say `There is no package in library %s' or `There are no packages in libraries %s'.
There are mechanisms to translate the R- and C-level error and warning messages. These are only available if R is compiled with NLS support (which is requested by configure option --enable-nls, the default).
The procedures make use of msgfmt and xgettext, which are
part of GNU gettext; this will need to be installed.
Windows users can find pre-compiled binaries at the GNU
archive mirrors and packaged with the poEdit package
(http://poedit.sourceforge.net/download.php#win32).
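The process of enabling translations is as follows. In a header file that is included by all the C files containing messages to be translated, declare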
#include <R.h> /* to include Rconfig.h */
#ifdef ENABLE_NLS
#include <libintl.h>
#define _(String) dgettext ("pkg", String)
/* replace pkg as appropriate */
#else
#define _(String) (String)
#endif
Wrap each message that should be translated in _(...),
for example
error(_("'ord' must be a positive integer"));
Then, in the package's src directory, run
xgettext --keyword=_ -o pkg.pot *.c
The file src/pkg.pot is the template file, and
conventionally this is shipped as po/pkg.pot. A translator
to another language makes a copy of this file and edits it (see the
gettext manual) to produce say ll.po, where ll
is the code for the language in which the translation is to be used.
(This file would be shipped in the po directory.) Next run
msgfmt on ll.po to produce ll.mo, and
copy that to inst/po/ll/LC_MESSAGES/pkg.mo. Now when
the package is loaded after installation it will look for translations
of its messages in the po/lang/LC_MESSAGES/pkg.mo file
for any language lang that matches the user's preferences (via the
setting of the LANGUAGE environment variable or from the locale
settings).
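The C-level steps above can be sketched in one small, self-contained fragment. It reuses the _() macro with its no-NLS fallback and formats a single message through it. The helper name format_ord_error, the `at most %d' wording and the use of snprintf instead of R's error are assumptions made purely for illustration, so that the fragment compiles without R.h; in a real package ENABLE_NLS would come from Rconfig.h.

```c
#include <stdio.h>   /* snprintf */
#include <stddef.h>  /* size_t */

/* In a package, ENABLE_NLS is defined (or not) by Rconfig.h via R.h;
   in this sketch it is left to the compiler command line. */
#ifdef ENABLE_NLS
#include <libintl.h>
#define _(String) dgettext("pkg", String) /* replace pkg as appropriate */
#else
#define _(String) (String)                /* no NLS: message stays in English */
#endif

/* Hypothetical helper (not part of the R API): writes the complete,
   translatable message into buf and returns snprintf's result.
   The whole sentence is one string, so a translator can reorder it. */
int format_ord_error(char *buf, size_t size, int nknots)
{
    return snprintf(buf, size,
                    _("'ord' must be a positive integer, at most %d"),
                    nknots);
}
```

Because the message is a single format string marked with _(...), xgettext --keyword=_ would extract it whole into the template file. If no catalog for the domain is found, dgettext returns the original English string.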
Mechanisms to support the automatic translation of R stop,
warning and message messages are in place, provided the
package has a name space. They make use of message catalogs in the same
way as C-level messages, but using domain R-pkg rather than
pkg. Translation of character strings inside stop,
warning and message calls is automatically enabled, as
well as other messages enclosed in calls to gettext or
gettextf. (To suppress this, use argument domain=NA.)
Tools to prepare the R-pkg.pot file are provided in package
tools: xgettext2pot will prepare a file from all strings
occurring inside gettext/gettextf, stop,
warning and message calls. Some of these are likely to be
spurious and so the file is likely to need manual editing.
The function xgettext in the same package extracts the actual calls
and so is more useful when tidying up error messages.
Translation of messages which might be singular or plural can be very
intricate: languages can have up to four different forms. The R
function ngettext provides an interface to the C function of the
same name, and will choose an appropriate singular or plural form for
the selected language depending on the value of its first argument
n.
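For C-level messages the same choice of singular or plural form can be sketched as follows. The macro name P_ and the helper format_nopkgs_warning are hypothetical names introduced for this illustration only; dngettext is the domain-specific variant of the GNU gettext function ngettext, and the fallback branch applies the English rule (singular only when the count is exactly 1). As in the previous sketches, ENABLE_NLS is assumed to come from Rconfig.h in a real package.

```c
#include <stdio.h>   /* snprintf */
#include <stddef.h>  /* size_t */

#ifdef ENABLE_NLS
#include <libintl.h>
/* dngettext picks the plural form required by the selected language */
#define P_(Singular, Plural, N) \
    dngettext("pkg", Singular, Plural, (unsigned long) (N))
#else
/* no NLS: fall back to the English rule */
#define P_(Singular, Plural, N) ((N) == 1 ? (Singular) : (Plural))
#endif

/* Hypothetical helper mirroring the warning from library shown earlier:
   both variants are complete clauses, and only the count n decides
   which of them is used. */
int format_nopkgs_warning(char *buf, size_t size, const char *pkglist, int n)
{
    return snprintf(buf, size,
                    P_("library %s contains no packages",
                       "libraries %s contain no packages", n),
                    pkglist);
}
```

Keeping both complete clauses inside one call lets a translator supply as many plural forms as the target language needs, rather than being tied to the English singular/plural split.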
Packages without name spaces will need to use domain="R-pkg"
explicitly in calls to stop, warning, message,
gettext/gettextf and ngettext.
The DESCRIPTION file has an optional field Type which if
missing is assumed to be Package, the sort of extension discussed
so far in this chapter. Currently two other types are recognized, both
of which need write permission in the R installation tree.
The Frontend type is a rather general mechanism, designed for adding new
front-ends such as the gnomeGUI package. If a configure file is found
in the top-level directory of the package it is executed, and then if a
Makefile is found (often generated by configure),
make is called. If R CMD INSTALL --clean is used,
make clean is called. No other action is taken.
R CMD build can package up this type of extension, but R
CMD check will check the type and skip it.
Conventionally, a translation package for language ll is called
Translation-ll and has Type: Translation. It needs
to contain the directories share/locale/ll and
library/pkgname/po/ll, or at least those for
which translations are available. The .mo files are installed in
the parallel places in the R installation tree.
For example, a package Translation-it might be prepared from an installed (and tested) version of R by
mkdir Translation-it
cd Translation-it
(cd $R_HOME; tar cf - share/locale/it library/*/po/it) | tar xf -
# the next step is not needed on Windows
msgfmt -c -o share/locale/it/LC_MESSAGES/RGui.mo $R_SRC_HOME/po/RGui-it.gmo
# create a DESCRIPTION file
cd ..
R CMD build Translation-it
It is probably appropriate to give the package a version number based on the version of R which has been translated. So the DESCRIPTION file might look like
Package: Translation-it
Type: Translation
Version: 2.2.1-1
Title: Italian Translations for R 2.2.1
Description: Italian Translations for R 2.2.1
Author: The translators
Maintainer: Some Body <somebody@some.where.net>
License: GPL Version 2 or later.