3. Revision history
This page summarizes some of the more important changes between releases.
3.1. Current version
3.1.1. v2.4.26
Released: March 14, 2017
- starchstrip
- New utility to efficiently filter a Starch archive, including or excluding records by specified chromosome names, without doing expensive extraction and recompression. This follows up on internal discussion on the Altius Slack channel.
- starch-diff
- Fixed testing logic in starch-diff for certain archives. Thanks to Shane Neph for the report.
- starchcat
- Fixed a possible condition where too many variables on the stack could cause a stack overflow on some platforms, leading to a fatal segmentation fault. Improved logic for updating v2.1 to v2.2 Starch archives.
- Starch C++ API
- Patched gzip-backed Starch archive extraction issue. Thanks to Matt Maurano for the bug report.
- update-sort-bed-migrate-candidates
- Added detailed logging via --debug option.
- Added --bedops-root-dir option to allow specifying where all BEDOPS binaries are stored. This setting can be overruled on a per-binary basis by adding --bedextract-path, --sort-bed-path, etc.
- Added --non-recursive-search option to restrict the search for BED and Starch candidates to the top level of the parent directory specified with the --parent-dir option.
- Further simplification and customization of parameters sent to the update-sort-bed-slurm and update-sort-bed-starch-slurm cluster scripts, as well as logging and variable name improvements to those two scripts.
- Thanks again to Matt Maurano for ongoing feedback and suggestions on functionality and fixes.
- gtf2bed
- Resolved segmentation fault with certain inputs, in follow-up to this BEDOPS Forum post. Thanks to zebasilio for the report and feedback.
3.2. Previous versions
3.2.1. v2.4.25
Released: February 15, 2017
- convert2bed
- Patch for RepeatMasker inputs with blank lines that have no spaces. This follows up on Issue 173. Thanks to saketkc for the bug report.
- update-sort-bed-migrate-candidates
- The update-sort-bed-migrate-candidates utility recursively searches into the specified directory for BED and Starch files which fail a sort-bed --check-sort test. Those files which fail this test can have their paths written to a text file for further downstream processing, or the end user can decide to apply an immediate resort on those files, either locally or via a SLURM-managed cluster. Grateful thanks to Matt Maurano for input and testing.
- See update-sort-bed-migrate-candidates --help for more information, or review the sort-bed documentation.
- update-sort-bed-starch-slurm
- This is an adjunct to the update-sort-bed-slurm utility, which resorts the provided Starch file and writes a new file. (The update-sort-bed-slurm utility only takes in BED files as input and writes BED as output.)
3.2.2. v2.4.24
Released: February 6, 2017
- starch-diff
- The starch-diff utility compares signatures of two or more v2.2+ Starch archives. This tool tests all chromosomes or one specified chromosome. It returns a zero exit code if the signature(s) are identical, or a non-zero error exit code if one or more signature(s) are dissimilar.
- update-sort-bed-slurm
- The update-sort-bed-slurm convenience utility provides a parallelized update of the sort order on BED files sorted with pre-v2.4.20 sort-bed, for users with a SLURM job scheduler and associated cluster. See update-sort-bed-slurm --help for more details.
- convert2bed
- Patched a memory leak in VCF conversion. Thanks to ehsueh for the bug report.
3.2.3. v2.4.23
Released: January 30, 2017
- unstarch
- Fixed bug where missing signature from pre-v2.2 Starch archives would cause a fatal metadata error. Thanks to Shane Neph and Eric Rynes for the bug report.
- Improved logic reporting signature mismatches when input v2.2 archive lacks signature (e.g., for a v2.2 archive made with --omit-signature).
- starch and starchcat
- Added --omit-signature option to compress without creating a per-chromosome data integrity signature. While this reduces compression time, it eliminates the verification benefits of the data integrity signature.
3.2.4. v2.4.22
Released: January 25, 2017
- convert2bed
- Fixed heap corruption in GFF conversion. Thanks to J. Miguel Mendez (ObjectiveTruth) for the bug report.
3.2.5. v2.4.21
Released: January 23, 2017
- bedmap
- New --wmean operation offers a weighted mean calculation. The “weight” is derived from the proportion of the reference element covered by overlapping map elements: i.e., a map element that covers more of the reference element has its signal given a larger weight or greater impact than another map element with a shorter overlap.
- Measurement values in bedmap did not allow + in the exponent (both - and no + for a positive value worked). Similarly, a + in front of the number was previously not allowed. Shane Neph posted the report and the fix.
- The --min-element and --max-element operations in bedmap now process elements in unambiguous order. Former behavior is moved to the operations --min-element-rand and --max-element-rand, respectively.
- Fixed issue with use of --echo-overlap-size with --multidelim (cf. Issue 165). Shane Neph posted the fix. Thanks to Jeff Vierstra for the bug report!
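As a rough illustration of the weighted mean idea described for --wmean, the following Python sketch weights each map element's score by the fraction of the reference element it covers. This is not the bedmap implementation: the function names, the half-open interval convention, and the normalization by the sum of weights are all assumptions made for the example.

```python
def overlap_length(ref_start, ref_stop, map_start, map_stop):
    """Number of bases shared by two half-open intervals [start, stop)."""
    return max(0, min(ref_stop, map_stop) - max(ref_start, map_start))


def weighted_mean(ref, map_elements):
    """Weighted mean of map scores over one reference interval.

    ref is (start, stop); map_elements is a list of (start, stop, score).
    Each weight is the fraction of the reference covered by a map element.
    Dividing by the sum of weights is an assumption of this sketch.
    """
    ref_start, ref_stop = ref
    ref_length = ref_stop - ref_start
    weighted_sum = 0.0
    total_weight = 0.0
    for map_start, map_stop, score in map_elements:
        weight = overlap_length(ref_start, ref_stop, map_start, map_stop) / ref_length
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        return None  # no map element overlaps the reference
    return weighted_sum / total_weight


# The first map element covers half of the reference, the second one fifth,
# so the first score contributes more to the result.
print(weighted_mean((100, 200), [(90, 150, 10.0), (180, 260, 40.0)]))
```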
- bedops
- Fixed issue with --chop where complement operation could potentially be included. Shane Neph posted the fix.
- The bedops --everything or bedops -u (union) operation now writes elements to standard output in unambiguous sort order. If any data are contained in fourth or subsequent fields, a lexicographical sort on that data is applied for resolving order of interval matches.
- sort-bed
- Improved sort times from replacing quicksort (std::qsort) with inlined C++ std::sort.
- Sorting of BED input now leads to an unambiguous result when two or more elements have the same genomic interval (chromosome name and start and stop position), but different content in remaining columns (ID, score, etc.). Formerly, elements with the same genomic interval that have different content in fourth and subsequent columns could be printed in a non-consistent ordering on repeated sorts. A deterministic sort order facilitates the use of data integrity functions on sorted BED and Starch data.
- starchcluster
- A SLURM-ready version of the starchcluster script was added to help SLURM job scheduler users with parallelizing the creation of Starch archives.
Parallel bam2bed and bam2starch
- SLURM-ready versions of these scripts were added to help parallelize the conversion of BAM to BED files (bam2bed_slurm) or to Starch archives (bam2starch_slurm).
- unstarch
- Added --signature option to report the Base64-encoded SHA-1 data integrity signature of the Starch-transformed bytes of a specified chromosome, or to report the signature of the metadata string as well as the signatures of all chromosomes, if unspecified.
- Added --verify-signature option to compare the “expected” Base64-encoded SHA-1 data integrity signature stored within the archive’s metadata with the “observed” data integrity signature generated from extracting the specified chromosome.
- If the observed and expected signatures differ, then this suggests that the chromosome record may be corrupted in some way; unstarch will exit with a non-zero error code. If the signatures agree, the archive data should be intact and unstarch will exit with a helpful notice and a zero error code.
- If no chromosome is specified, unstarch will loop through all chromosomes in the archive metadata, comparing observed and expected values for each chromosome record. Upon completion, error and progress messages will be reported to the standard error stream, and unstarch will exit with a zero error code, if all signatures match, or a non-zero exit state, if one or more signatures do not agree.
- The output from the --list option includes a signature column to report the data integrity signature of all Starch-transformed chromosome data.
- The output from the --list-json option includes a signature key in each chromosome record in the archive metadata, reporting the same information.
- The --is-starch option now quits with a non-zero exit code, if the specified input file is not a Starch archive.
- The --elements-max-string-length option reports the length of the longest string within the specified chromosome, or the longest string over all chromosomes (if no chromosome name is specified).
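The expected-versus-observed comparison described for --verify-signature can be sketched in Python with the standard hashlib and base64 modules. This only illustrates a Base64-encoded SHA-1 digest over some bytes; the function names are hypothetical, and the placeholder bytes below stand in for the Starch-transformed chromosome data, whose actual layout is internal to the Starch format.

```python
import base64
import hashlib


def data_integrity_signature(data: bytes) -> str:
    """Base64-encoded SHA-1 digest of a byte string."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def verify_signature(expected: str, data: bytes) -> bool:
    """Compare a stored ("expected") signature with the one observed from data."""
    return expected == data_integrity_signature(data)


# Placeholder bytes standing in for one chromosome's transformed data.
chromosome_bytes = b"chr1\t1000\t1234\n"
expected = data_integrity_signature(chromosome_bytes)

print(verify_signature(expected, chromosome_bytes))         # unchanged bytes
print(verify_signature(expected, chromosome_bytes + b"x"))  # altered bytes
```

A caller could map the boolean result onto a zero or non-zero exit code, in the spirit of the behavior described above.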
- starch
- Added --report-progress=N option to (optionally) report compression of the Nth element of the current chromosome to standard error stream.
- As a chromosome is compressed, the input Starch-transform bytes are continually run through a SHA-1 hash function. The resulting data integrity signature is stored as a Base64-encoded string in the output archive’s metadata. Signatures can be compared between and within archives to help better ensure the data integrity of the archive.
- Fixed --header transform bug reported in Issue 161. Thanks to Shane Neph for the bug report!
- Added chromosome name and “remainder” order tests to STARCH2_transformHeaderlessBEDInput and STARCH2_transformHeaderedBEDInput functions.
- Compression with starch ends with a fatal error, should any of the following comparison tests fail:
  - The chromosome names are not lexicographically ordered (e.g., chr1 records coming after chr2 records indicates the data are not correctly sorted).
  - The start position of an input element is less than the start position of a previous input element on the same chromosome (e.g., chr1:1000-1234 coming after chr1:2000-2345 is not correctly sorted).
  - The stop positions of two or more input elements are not in ascending order when their start positions are equal (e.g., chr1:1000-1234 coming after chr1:1000-2345 is not correctly sorted).
  - The start and stop positions of two or more input elements are equivalent, and their “remainders” (fourth and subsequent columns) are not in ascending order (e.g., chr1:1000-1234:id-0 coming after chr1:1000-1234:id-1 is not correctly sorted).
- If the sort order of the input data is unknown or uncertain, simply use sort-bed to generate the correct ordering and pipe the output from that to starch, e.g.: $ cat elements.bed | sort-bed - | starch - > elements.starch
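The four comparison tests listed above can be expressed as a small Python check over tab-delimited BED lines. This is a sketch for illustration only, not the logic inside starch: the function names and messages are hypothetical, and Python's default string comparison is assumed to stand in for the lexicographic ordering the tools use.

```python
def parse_bed(line):
    """Split a BED line into (chromosome, start, stop, remainder)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), "\t".join(fields[3:])


def first_sort_error(lines):
    """Return a message for the first ordering violation, or None if sorted."""
    previous = None
    for number, line in enumerate(lines, start=1):
        current = parse_bed(line)
        if previous is not None:
            p_chrom, p_start, p_stop, p_rest = previous
            chrom, start, stop, rest = current
            if chrom < p_chrom:
                return f"line {number}: chromosome names are not in lexicographic order"
            if chrom == p_chrom:
                if start < p_start:
                    return f"line {number}: start position is less than the previous start"
                if start == p_start and stop < p_stop:
                    return f"line {number}: stop positions are not ascending"
                if start == p_start and stop == p_stop and rest < p_rest:
                    return f"line {number}: remainders are not ascending"
        previous = current
    return None


# The second record repeats the interval but has a smaller remainder.
print(first_sort_error(["chr1\t1000\t1234\tid-1", "chr1\t1000\t1234\tid-0"]))
```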
- starchcat
- Added --report-progress=N option to (optionally) report compression of the Nth element of the current chromosome to standard error stream.
- As in starch, at the conclusion of compressing a chromosome made from one or more input Starch archives, the input Starch-transform bytes are continually run through a SHA-1 hash function. The resulting data integrity signature is stored as a Base64-encoded string in the chromosome’s entry in the new archive’s metadata.
- As in starch, if data should need to be extracted and recompressed, the output is written so that the order is unambiguous: ascending lexicographic ordering on chromosome names, numerical ordering on start positions, the same ordering on stop positions where start positions match, and ascending lexicographic ordering on the remainder of the BED element (fourth and subsequent columns, where present).
- convert2bed
- Improvements in support for BAM/SAM inputs with larger-sized reads, as would come from alignments made from data collected from third-generation sequencers. Simulated read datasets were generated using SimLoRD. Tests have been performed on simulated hg19 data up to 100kb read lengths.
- Improvements allow:
  - conversion of dynamic number of CIGAR operations (up to system memory)
  - conversion of dynamically-sized read fields (up to system memory and inter-thread buffer allocations)
- These patches follow up on bug reports in Issue 157.
- Improvements in support for VCF inputs, to allow arbitrary-sized fields (up to system memory and inter-thread buffer allocations), which should reduce or eliminate segmentation faults from buffer overruns on fields larger than former stack defaults.
- Improvements in support for GFF inputs, to allow arbitrary-sized fields (up to system memory and inter-thread buffer allocations), which should reduce or eliminate segmentation faults from buffer overruns on fields larger than former stack defaults.
- Improvements in support for GTF inputs, to allow arbitrary-sized fields (up to system memory and inter-thread buffer allocations), which should reduce or eliminate segmentation faults from buffer overruns on fields larger than former stack defaults.
Testing
- Our use of Travis CI to automate testing of builds now includes Clang on their OS X environment.
3.2.6. v2.4.20
Released: July 27, 2016
- convert2bed
- Increased memory allocation for maximum number of per-read CIGAR operations in BAM and SAM conversion to help improve stability. Thanks to Adam Freedman for the report!
- Improved reliability of gene ID parsing from GTF input, where gene_id field may be positioned at start, middle, or end of attributes string, or may be empty. Thanks to blaiseli for the report!
3.2.7. v2.4.19
Released: May 9, 2016
- convert2bed
- Fixed bug in BAM and SAM parallel conversion scripts (*_gnuParallel and *_sge) with inputs containing chromosome names without chr prefix. Thanks to Eric Haugen for the bug report!
- Starch C++ API
- Fixed bug with extraction of bzip2- and gzip-backed archives with all other non-primary Starch tools (all tools except starch, unstarch, starchcat, and sort-bed). Thanks to Eric Haugen for the bug report!
3.2.8. v2.4.18
Released: April 28, 2016
- convert2bed
- Fixed compile warnings.
- Fixed bug in BAM and SAM conversion with optional field line overflow. Thanks to Jemma Nelson for the bug report!
- General documentation improvements
- Updated OS X Installer and Github release instructions
- Added thank-you to Feng Tian for bug report
3.2.9. v2.4.17
Released: April 26, 2016
- bam2bed and sam2bed
- Improved parsing of non-split BAM and SAM inputs.
- Docker container build target added for Debian
- Thanks to Leo Comitale (Poldo) for writing a Makefile target and spec for creating a BEDOPS Docker container for the Debian target.
- Starch C++ API
- Fixed bug with extraction of bzip2- and gzip-backed archives with all other non-primary Starch tools (all tools except starch, unstarch, starchcat, and sort-bed). Thanks to Feng Tian for reports.
3.2.10. v2.4.16
Released: April 5, 2016
- bedmap
- Added new --echo-ref-row-id option to report reference row ID elements.
- Starch C++ API
- Fixed bug with extraction of archives made with starch --gzip (thanks to Brad Gulko for the bug report and Paul Verhoeven and Peter Weir for compile and testing assistance).
- General improvements
- Small improvements to build cleanup targets.
3.2.11. v2.4.15
Released: January 21, 2016
- Docker container build target added for CentOS 7
- Thanks to Leo Comitale (Poldo) for writing a Makefile target and spec for creating a BEDOPS Docker container for CentOS 7.
- convert2bed
- Fixed buffer overflows in convert2bed to improve conversion reliability for VCF files (thanks to Jared Andrews and Kousik Kundu for bug reports).
- General improvements
- Improved OS X 10.11 build process.
3.2.12. v2.4.14
Released: April 21, 2015
- convert2bed
- Fixed missing samtools variable references in cluster conversion scripts (thanks to Brad Gulko for the bug report).
- General suite-wide improvements
- Fixed exception error message for stdin check (thanks to Brad Gulko for the bug report).
3.2.13. v2.4.13
Released: April 20, 2015
- bedops
- Resolved issue in using --ec with bedops when reading from stdin (thanks to Brad Gulko for the bug report).
- General suite-wide improvements
- Addressed inconsistency with constants defined for the suite at the extreme end of the limits we allow for coordinate values (thanks again to Brad Gulko for the report).
3.2.14. v2.4.12
Released: March 13, 2015
- bedops
- Checks have been added to determine if an integer argument is a file in the current working directory, before interpreting that argument as an overlap criterion for -e and -n options.
- To reduce ambiguity, if an integer is used as a file input, bedops issues a warning of the interpretation and provides guidance on how to force that value to instead be used as an overlap specification, if desired (thanks to E. Rynes for the pointer).
- bedmap
- Added support for --prec/--sci with --min-element and --max-element operations (thanks to E. Rynes for the pointer).
bedops | bedmap | closest-features
- Added support for bash process substitution/named pipes with specification of --chrom and/or --ec options (thanks to B. Gulko for the bug report).
- Fixed code that extracts gzip-backed Starch archives from bedops and other core tools (thanks again to B. Gulko for the bug report).
- convert2bed
- Switched matches and qSize fields in order of psl2bed output. Refer to documentation for new field order.
- Added null sentinel to GTF ID value.
- To help reduce the chance of buffer overflows, the convert2bed tool increases the maximum field length from 8191 to 24575 characters to allow parsing of inputs with longer field length, such as very long attributes from mosquito GFF3 data (thanks to T. Karginov for the bug report).
3.2.15. v2.4.11
Released: February 24, 2015
- convert2bed
- Fixed bug in psl2bed where matches column value was truncated by one character. Updated unit tests. Thanks to M. Wirthlin for the bug report.
3.2.16. v2.4.10
Released: February 23, 2015
- starch
- In addition to checking chromosome interleaving, the starch tool now enforces sort-bed sort ordering on BED input and exits with an EINVAL POSIX error code if the data are not sorted correctly.
- convert2bed
- Added --zero-indexed option to wig2bed and wig2starch wrappers and convert2bed binary, which converts WIG data that are zero-indexed without any coordinate adjustments. This is useful for WIG data sourced from the UCSC Kent tool bigWigToWig, where the bigWig data can potentially be sourced from 0-indexed BAM- or bedGraph-formatted data.
- If the WIG input contains any element with a start coordinate of 0, the default use of wig2bed, wig2starch and convert2bed will exit early with an error condition, suggesting the use of --zero-indexed.
- Updated copyright date range of wrapper scripts
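The coordinate handling described for --zero-indexed can be sketched as follows, assuming the usual convention that WIG positions are one-based while BED starts are zero-based. The function name and its arguments are hypothetical and only illustrate the adjustment and the early error on a start coordinate of 0; they are not taken from convert2bed.

```python
def wig_position_to_bed(chrom, position, span=1, zero_indexed=False):
    """Convert one WIG position into a half-open BED interval.

    With zero_indexed=True the position is used without adjustment.
    Otherwise the position is treated as one-based and shifted down by one,
    and a position of 0 is rejected because it cannot be one-based.
    """
    if zero_indexed:
        start = position
    else:
        if position == 0:
            raise ValueError(
                "start coordinate of 0 found; input may be zero-indexed"
            )
        start = position - 1
    return (chrom, start, start + span)


print(wig_position_to_bed("chr1", 1))                     # one-based input
print(wig_position_to_bed("chr1", 0, zero_indexed=True))  # zero-based input
```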
3.2.17. v2.4.9
Released: February 17, 2015
- sort-bed
- Added support for --check-sort to report if input is sorted (or not)
- Starch
- Improved support for starch --header, where header contains tab-delimited fields
- Starch C++ API
- Fixed bug with starch --header functionality, such that BEDOPS core tools (bedops, etc.) would be unable to extract correct data from headered Starch archive
3.2.18. v2.4.8
Released: February 7, 2015
- Mac OS X packaging
- Installer signed with productsign to pass OS X Gatekeeper
- Linux packaging
- SHA1 hashes of each tarball are now part of the BEDOPS Releases description page, going forwards
- Updated copyright dates in source code
3.2.19. v2.4.7
Released: February 2, 2015
- convert2bed fixes and improvements
- Fixed --split support in psl2bed (thanks to Marco A.)
- Fixed compilation warning regarding comparison of signed and unsigned values
- Fixed corrupted psl2bed test inputs
3.2.20. v2.4.6
Released: January 30, 2015
- convert2bed fixes and improvements
- Added support for conversion of the GVF file format, including wrapper scripts and unit tests. Refer to the gvf2bed documentation for more information.
- Fixed bug in string copy of zero-length element attribute for gff2bed and gtf2bed (GFF and GTF) formats
- General fixes and improvements
- Fixed possibly corrupt bzip2, Jansson and zlib tarballs (thanks to rekado, Shane N. and Richard S.)
- Fixed typo in bedextract documentation
- Fixed broken image in Overview
- Removed 19 MB _build intermediate result directory (which should improve overall git clone time considerably!)
3.2.21. v2.4.5
Released: January 28, 2015
- convert2bed improvements
- Addition of RepeatMasker annotation output (.out) file conversion support, rmsk2bed and rmsk2starch wrappers, and unit tests
3.2.22. v2.4.4
Released: January 25, 2015
- Documentation improvements
- Implemented substantial style changes via A Better Sphinx Theme and various customizations. We also include responsive web style elements to help improve browsing on mobile devices.
- Fixes to typos in conversion and other documents.
3.2.23. v2.4.3
Released: December 18, 2014
- Compilation improvements
- Shane Neph put in a great deal of work to enable parallel builds (e.g., make -j N to build various targets in parallel). Depending on the end user’s environment, this can speed up compilation time by a factor of 2, 4 or more.
- Fixed numerous compilation warnings of debug builds of starch toolkit under RHEL6/GCC and OS X 10.10.1/LLVM.
- New bedops features
- Added --chop and --stagger options to “melt” inputs into contiguous or staggered disjoint regions of equivalent size.
- For less confusion, arguments for --element-of, --chop and other bedops operations that take numerical modifiers no longer require a leading hyphen character. For instance, --element-of 1 is now equivalent to the former usage of --element-of -1.
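To make the idea of chopping a region into fixed-size pieces concrete, here is a small Python sketch over a single, already merged interval. It is an illustration under stated assumptions rather than the bedops algorithm: the function name is hypothetical, the stagger value is treated as the step between successive piece starts, and the final piece is assumed to be truncated at the region boundary.

```python
def chop(start, stop, size, stagger=None):
    """Cut the half-open interval [start, stop) into pieces of a given size.

    Without a stagger, pieces are contiguous. With a stagger, each piece
    starts that many bases after the previous one (an assumed reading).
    The last piece is clipped at stop, so it may be shorter than size.
    """
    step = size if stagger is None else stagger
    if size <= 0 or step <= 0:
        raise ValueError("size and stagger must be positive")
    pieces = []
    position = start
    while position < stop:
        pieces.append((position, min(position + size, stop)))
        position += step
    return pieces


print(chop(0, 10, 4))             # contiguous pieces
print(chop(0, 10, 4, stagger=6))  # pieces separated by gaps
```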
- New bedmap features
- The --sweep-all option reads through the entire map file without early termination and can help deal with SIGPIPE errors. It adds to execution time, but the penalty is not as severe as with the use of --ec. Using --ec alone will enable error checking, but will now no longer read through the entire map file. The --ec option can be used in conjunction with --sweep-all, with the associated time penalties. (Another method for dealing with this issue is to override how SIGPIPE errors are caught by the interpreter (bash, Python, etc.) and retrap or ignore them. However, it may not be a good idea to do this, as other situations may arise in production pipelines where it is ideal to trap and handle all I/O errors in a default manner.)
- New --echo-ref-size and --echo-ref-name operations report genomic length of reference element, and rename the reference element in chrom:start-end (useful for labeling rows for input for matrix2png or R or other applications).
- bedextract
- Fixed upper bound bug that would cause incorrect output in some cases
- conversion scripts
- Brand new C99 binary called convert2bed, which wrapper scripts (bam2bed, etc.) now call. No more Python version dependencies, and the C-based rewrite offers massive performance improvements over old Python-based scripts.
- Added parallel_bam2starch script, which parallelizes creation of Starch archive from very large BAM files in SGE environments.
- Added bug fix for missing code in starchcluster.gnu_parallel script, where the final collation step was missing.
- The vcf2bed script now accepts the --do-not-split option, which prints one BED element for all alternate alleles.
- Starch archival format and compression/extraction tools
- Added duplicate- and nested-element flags in v2.1 of Starch metadata, which denote if a chromosome contains one or more duplicate and/or nested elements. BED files compressed with starch v2.5 or greater, or Starch archives updated with starchcat v2.5 or greater will include these values in the archive metadata. The unstarch extraction tool offers --has-duplicate and --has-nested options to retrieve these flag values for a specified chromosome (or for all chromosomes).
- Added --is-starch option to unstarch to test if specified input file is a Starch v1 or v2 archive.
- Added bug fix for compressing BED files with starch, where the archive would not include the last element of the BED input, if the BED input lacked a trailing newline. The compression tools now include a routine for capturing the last line, if there is no newline.
- Documentation improvements
- Remade some image assets throughout the documents to support Retina-grade displays
3.2.24. v2.4.2
Released: April 10, 2014
- conversion scripts
- Added support for sort-bed --tmpdir option to conversion scripts, to allow specification of alternative temporary directory for sorted results when used in conjunction with --max-mem option.
- Added support for GFF3 files which include a FASTA directive in gff2bed and gff2starch (thanks to Keith Hughitt).
- Extended support for Python-based conversion scripts to support use with Python v2.6.2 and forwards, except for sam2bed and sam2starch, which still require Python v2.7 or greater (but earlier than Python 3).
- Fixed --insertions option in vcf2bed to now report a single-base BED element (thanks to Matt Maurano).
3.2.25. v2.4.1
Released: February 26, 2014
- bedmap
- Added --fraction-both and --exact (--fraction-both 1) to list of compatible overlap options with --faster.
- Added 5% performance improvement with bedmap operations without --faster.
- Fixed scenario that can yield incorrect results (cf. Issue 43).
- sort-bed
- Added --tmpdir option to allow specification of an alternative temporary directory, when used in conjunction with --max-mem option. This is useful if the host operating system’s standard temporary directory (e.g., /tmp on Linux or OS X) does not have sufficient space to hold intermediate results.
- All conversion scripts
- Improvements to error handling in Python-based conversion scripts, in the case where no input is specified.
- Fixed typos in gff2bed and psl2bed documentation (cf. commit a091e18).
OS X compilation improvements
We have completed changes to the OS X build process for the remaining half of the BEDOPS binaries, which now allows direct, full compilation with Clang/LLVM (part of the Apple Xcode distribution).
All OS X BEDOPS binaries now use Apple’s system-level C++ library, instead of GNU’s libstdc++. It is no longer required (or recommended) to use GNU gcc to compile BEDOPS on OS X.
Compilation is faster and simpler, and we can reduce the size and complexity of Mac OS X builds and installer packages. By using Apple’s C++ library, we also eliminate the likelihood of missing library errors.
In the longer term, this gets us closer to moving BEDOPS to using the CMake build system, to further abstract and simplify the build process.
Cleaned up various compilation warnings found with clang/clang++ and GCC kits.
3.2.26. v2.4.0
Released: January 9, 2014
- bedmap
- Added new --echo-map-size and --echo-overlap-size options to calculate sizes of mapped elements and overlaps between mapped and reference elements.
- Improved performance for all --echo-map-* operations.
- Updated documentation.
- Major enhancements and fixes to sort-bed:
- Improved performance.
- Fixed memory leak.
- Added support for millions of distinct chromosomes.
- Improved internal estimation of memory usage with --max-mem option.
- Added support for compilation on Cygwin (64-bit). Refer to the installation documentation for build instructions.
- starchcat
- Fixed embarrassing buffer overflow condition that caused segmentation faults on Ubuntu 13.
- All conversion scripts
- Python-based scripts no longer use temporary files, which reduces file I/O and improves performance. This change also reduces the need for large amounts of free space in a user’s /tmp folder, particularly relevant for users converting multi-GB BAM files.
- We now test for ability to locate starch, sort-bed, wig2bed_bin and samtools in user environment, quitting with the appropriate error state if the dependencies cannot be found.
- Improved documentation. In particular, we have added descriptive tables to each script’s documentation page which describe how columns map from original data input to BED output.
- bam2bed and sam2bed
- Added --custom-tags <value> command-line option to support a comma-separated list of custom tags (cf. Biostars discussion), i.e., tags which are not part of the original SAMtools specification.
- Added --keep-header option to preserve header and metadata as BED elements that use _header as the chromosome name. This now makes these conversion scripts fully “non-lossy”.
- vcf2bed
- Added new --snvs, --insertions and --deletions options that filter VCF variants into three separate subcategories.
- Added --keep-header option to preserve header and metadata as BED elements that use _header as the chromosome name. This now makes these conversion scripts fully “non-lossy”.
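One simple way to picture the three subcategories is a length-based rule on the REF and ALT alleles, sketched below in Python. The rule and the function name are assumptions made for illustration; the criteria that vcf2bed itself applies are not described here and may differ.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT allele pair by comparing allele lengths.

    This length-based rule is an assumption of the sketch, not a
    description of how vcf2bed filters variants.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


for ref, alt in [("A", "G"), ("A", "ACT"), ("ACT", "A")]:
    print(ref, alt, classify_variant(ref, alt))
```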
- gff2bed
- Added --keep-header option to preserve header and metadata as BED elements that use _header as the chromosome name. This now makes these conversion scripts fully “non-lossy”.
- psl2bed
- Added --keep-header option to preserve header and metadata as BED elements that use _header as the chromosome name. This now makes these conversion scripts fully “non-lossy”.
- wig2bed
- Added --keep-header option to wig2bed binary and wig2bed/wig2starch wrapper scripts, to preserve header and metadata as BED elements that use _header as the chromosome name. This now makes these conversion scripts fully “non-lossy”.
- Added OS X uninstaller project to allow end user to more easily remove BEDOPS tools from this platform.
- Cleaned up various compilation warnings found with clang/clang++ and GCC kits.
3.2.27. v2.3.0
Released: October 2, 2013
Migration of BEDOPS code and documentation from Google Code to Github.
- Due to changes with Google Code hosting policies at the end of the year, we have decided to change our process for distributing code, packages and documentation. While most of the work is done, we appreciate feedback on any problems you may encounter. Please email us at bedops@stamlab.org with details.
- Migration to Github should facilitate requests for code by those who are familiar with git and want to fork our project to submit pull requests.
- bedops
- General --ec performance improvements.
- bedmap
- Adds support for the new --skip-unmapped option, which filters out reference elements which do not have mapped elements associated with them. See the end of the score operations section of the bedmap documentation for more detail.
- General --ec performance improvements.
- starch
- Fixed bug with starch where zero-byte BED input (i.e., an “empty set”) created a truncated and unusable archive. We now put in a “dummy” chromosome for zero-byte input, which unstarch can now unpack.
- This should simplify error handling with certain pipelines, specifically where set or other BEDOPS operations yield an “empty set” BED file that is subsequently compressed with starch.
- unstarch
- Can now unpack zero-byte (“empty set”) compressed starch archive (see above).
- Changed unstarch --list option to print to stdout stream (this was previously sent to stderr).
starch metadata library
- Fixed array overflow bug with BEDOPS tools that take starch archives as inputs, which affected use of archives as inputs to closest-features, bedops and bedmap.
-
- Python scripts require Python v2.7 or greater.
- Improved (more “Pythonic”) error code handling.
- Disabled support for --max-mem sort parameter until sort-bed issue is resolved. Scripts will continue to sort, but they will be limited to available system memory. If you are processing files larger than system memory, please contact us at bedops@stamlab.org for details of a temporary workaround.
gff2bed conversion script
- Resolved IndexError exceptions by fixing header support, bringing script in line with v1.21 GFF3 spec.
bam2bed and sam2bed conversion scripts
- Rewritten bam2* and sam2* scripts from bash into Python (v2.7+ support).
- Improved BAM and SAM input validation against the v1.4 SAM spec.
- New --split option prints reads with N CIGAR operations as separated BED elements.
- New --all-reads option prints all reads, mapped and unmapped.
-
- Fixed stdin bug with bedextract.
New documentation via readthedocs.org.
- Documentation is now part of the BEDOPS distribution, instead of being a separate download.
- We use readthedocs.org to host indexed and searchable HTML.
- PDF and eBook documents are also available for download.
- Documentation is refreshed and simplified, with new installation and compilation guides.
OS X compilation improvements
We have made changes to the OS X build process for half of the BEDOPS binaries, which allows direct compilation with Clang/LLVM (part of the Apple Xcode distribution). Those binaries now use Apple’s system-level C++ library, instead of GNU’s libstdc++.
This change means that we require Mac OS X 10.7 (“Lion”) or greater; we do not support 10.6 at this time.
Compilation is faster and simpler, and we can reduce the size and complexity of Mac OS X builds and installer packages. By using Apple’s C++ library, we also reduce the likelihood of missing library errors. When this process is completed for the remaining binaries, it will no longer be necessary to install GCC 4.7+ (by way of MacPorts or other package managers) in order to build BEDOPS on OS X, nor will we have to bundle libstdc++ with the installer.
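The --split behavior listed under the bam2bed and sam2bed conversion scripts for this release can be pictured with a small sketch. This is not the code used by those scripts: `split_on_skips` is a hypothetical helper, the start coordinate is assumed to be 0-based, and how the real scripts fill in the remaining BED columns is not shown here. It only illustrates the idea of cutting one alignment into separate reference blocks wherever the CIGAR string contains an N (skipped region) operation.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def split_on_skips(start, cigar):
    """Return half-open (start, end) reference blocks, split at each N operation."""
    blocks = []
    block_start = pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current block, then jump over the skipped region.
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        elif op in REF_CONSUMING:
            pos += length
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks
```

For example, `split_on_skips(100, "50M200N30M")` yields two blocks, one on each side of the 200-base skip, where a single unsplit element would span the whole region.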
3.2.28. v2.2.0b¶
- Fixed bug with OS X installer’s post-installation scripts.
3.2.29. v2.2.0¶
Released: May 22, 2013
- Updated packages
- Precompiled packages are now available for Linux (32- and 64-bit) and Mac OS X 10.6-10.8 (32- and 64-bit) hosts.
- Starch v2 test suite
- We have added a test suite for the Starch archive toolkit with the source download. Test inputs include randomized BED data generated from chromosome and bounds data stored on UCSC servers as well as static FIMO search results. Tests put starch, unstarch and starchcat through various usage scenarios. Please refer to the Starch-specific Makefiles and the test target and subfolder’s README doc for more information.
- starchcat
- Resolves bug with --gzip option, allowing updates of gzip-backed v1.2 and v1.5 archives to the v2 Starch format (either bzip2- or gzip-backed).
- unstarch
- Resolves bug with extraction of Starch archive made from BED files with four or more columns. A condition where the total length of additional columns exceeds a certain number of characters would result in extracted data in those columns being cut off. As an example, this could affect Starch archives made from the raw, uncut output of GTF- and GFF- conversion scripts.
- conversion scripts
- We have partially reverted wig2bed, providing a Bash shell wrapper to the original C binary. This preserves consistency of command-line options across the conversion suite, while making use of the C binary to recover performance lost from the Python-based v2.1 revision of wig2bed (which at this time is no longer supported). (Thanks to Matt Maurano for reporting this issue.)
3.2.30. v2.1.1¶
Released: May 3, 2013
- bedmap
- Major performance improvements made in v2.1.1, such that current bedmap now operates as fast as or faster than the v1.2.5 version of bedmap!
- bedops
- Resolves bug with --partition option.
- conversion scripts
- All v2.1.0 Python-based scripts now include fix for SIGPIPE handling, such that use of head or other common UNIX utilities to process buffered standard output no longer yields IOError exceptions. (Thanks to Matt Maurano for reporting this bug.)
- 32-bit Linux binary support
- Pre-built Linux binaries are now available for end users with 32-bit workstations.
Other issues fixed:
- Jansson tarball no longer includes already-compiled libraries that could potentially interfere with 32-bit builds.
- Minor changes to conversion script test suite to exit with useful error code on successful completion of test.
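The SIGPIPE issue noted for the conversion scripts in this release is a common one for Python command-line tools. The notes do not say how the fix was implemented, so the following is only a sketch of one widely used approach on POSIX systems: restoring the default SIGPIPE disposition so that the process ends quietly when a downstream reader such as head closes the pipe. `restore_default_sigpipe` is a hypothetical helper name.

```python
import signal


def restore_default_sigpipe():
    """Restore the OS default SIGPIPE action and return the previous handler.

    Python ignores SIGPIPE at startup, so writing to a closed pipe surfaces
    as an exception (IOError in Python 2). With the default action restored,
    the process is terminated by the signal instead. POSIX only.
    """
    previous = signal.getsignal(signal.SIGPIPE)
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
    return previous
```

A script would call this once before writing to standard output. The alternative is to catch the write exception and exit cleanly, which keeps control inside Python at the cost of wrapping each output path.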
3.2.31. v2.1.0¶
Released: April 22, 2013
- bedops
- New --partition operator efficiently generates disjoint segments made from genomic boundaries of all overlapping inputs.
- conversion scripts
- All scripts now use sort-bed behind the scenes to output sorted BED output, ready for use with BEDOPS utilities. It is no longer necessary to pipe data to or otherwise post-process converted data with sort-bed.
- New psl2bed conversion script, converting PSL-formatted UCSC BLAT output to BED.
- New wig2bed conversion script written in Python.
- New *2starch conversion scripts offered for all *2bed scripts, which output Starch v2 archives.
- closest-features
- Replaced --shortest option name with --closest, for clarity. (Old scripts which use --shortest will continue to work with the deprecated option name for now. We advise editing pipelines, as needed.)
- starch
- Improved error checking for interleaved records. This also makes use of *2starch conversion scripts with the --do-not-sort option safer.
- Improved Mac OS X support
- New Mac OS X package installer makes installation of BEDOPS binaries and scripts very easy for OS X 10.6 - 10.8 hosts.
- Installer resolves fatal library errors seen by some end users of older OS X BEDOPS releases.
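To make the --partition description in this release concrete, here is a minimal sketch of partitioning on a single chromosome. It is an illustration only, not the bedops implementation, and it uses a simple quadratic check rather than an efficient sweep: every interval boundary becomes a cut point, and each piece between neighboring cut points is kept if some input covers it. The function name and the half-open (start, end) tuples are assumptions made for the example.

```python
def partition(intervals):
    """Split half-open (start, end) intervals into disjoint covered segments."""
    # Every start and end coordinate is a potential segment boundary.
    bounds = sorted({p for start, end in intervals for p in (start, end)})
    segments = []
    for left, right in zip(bounds, bounds[1:]):
        # Keep a segment only if at least one input interval covers it.
        if any(start <= left and right <= end for start, end in intervals):
            segments.append((left, right))
    return segments
```

Two overlapping inputs such as (0, 10) and (5, 15) produce three segments: the part unique to the first, the shared part, and the part unique to the second. Gaps between inputs produce no segment.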
3.2.32. v2.0.0b¶
Released: February 19, 2013
- Added starchcluster script variant which supports task distribution with GNU Parallel.
- Fixed minor problem with bam2bed and sam2bed conversion scripts.
3.2.33. v2.0.0a¶
Released: February 7, 2013
- bedmap
- Takes in Starch-formatted archives as input, as well as raw BED (i.e., it is no longer required to extract a Starch archive to an intermediate, temporary file or named pipe before applying operations).
- New --chrom operator jumps to and operates on information for specified chromosome only.
- New --echo-map-id-uniq operator lists unique IDs from overlapping mapping elements.
- New --max-element and --min-element operators return the highest or lowest scoring overlapping map element.
- bedops
- Takes in Starch-formatted archives as input, as well as raw BED.
- New --chrom operator jumps to and operates on information for specified chromosome only.
- closest-features
- Takes in Starch-formatted archives as input, as well as raw BED.
- New --chrom operator jumps to and operates on information for specified chromosome only.
- sort-bed and bbms
- New --max-mem option to limit system memory on large BED inputs.
- Incorporated bbms functionality into sort-bed with use of --max-mem operator.
- starch, starchcat and unstarch
- New metadata enhancements to Starch-format archival and extraction, including: --note, --elements, --bases, --bases-uniq, --list-chromosomes, --archive-timestamp, --archive-type and --archive-version (see --help to starch, starchcat and unstarch binaries, or view the documentation for these applications for more detail).
- Adds 20-35% performance boost to creating Starch archives with starch utility.
- New documentation with technical overview of the Starch format specification.
- conversion scripts
- New gtf2bed conversion script, converting GTF (v2.2) to BED.
- Scripts are now part of main download; it is no longer necessary to download the BEDOPS companion separately.
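The --max-mem option for sort-bed in this release concerns sorting BED inputs that do not fit in memory. These notes do not describe how sort-bed does this internally, so the sketch below shows only the general external merge sort technique, not the sort-bed implementation: sort bounded chunks, spill each to a temporary file, then merge the sorted runs lazily. All function names are hypothetical, the limit is expressed in lines rather than bytes for simplicity, and the ordering is assumed to be chromosome name, then start, then end.

```python
import heapq
import tempfile


def bed_key(line):
    """Order by chromosome name, then numeric start, then numeric end."""
    chrom, start, end = line.split("\t")[:3]
    return (chrom, int(start), int(end))


def write_run(sorted_lines):
    """Spill one sorted chunk to a temporary file and rewind it for reading."""
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(line + "\n" for line in sorted_lines)
    run.seek(0)
    return run


def read_run(run):
    """Stream lines back from a spilled chunk without loading it whole."""
    for line in run:
        yield line.rstrip("\n")


def external_sort(lines, max_lines_in_memory):
    """Yield BED lines in sorted order, buffering roughly one chunk at a time."""
    runs, chunk = [], []
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) >= max_lines_in_memory:
            runs.append(write_run(sorted(chunk, key=bed_key)))
            chunk = []
    if chunk:
        runs.append(write_run(sorted(chunk, key=bed_key)))
    try:
        # heapq.merge pulls from the sorted runs lazily, one line per run.
        for line in heapq.merge(*(read_run(r) for r in runs), key=bed_key):
            yield line
    finally:
        for run in runs:
            run.close()
```

The trade-off is extra disk I/O in exchange for a memory footprint set by the chunk size rather than by the input size.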
3.2.34. v1.2.5b¶
Released: January 14, 2013
- Adds support for Apple 32- and 64-bit Intel hardware running OS X 10.5 through 10.8.
- Adds README for companion download.
- Removes some obsolete code.
3.2.35. v1.2.5¶
Released: October 13, 2012
- Fixed unusual bug with unstarch, where an extra (and incorrect) line of BED data can potentially be extracted from an archive.
- Updated companion download with updated bam2bed and sam2bed conversion scripts to address 0-indexing error with previous revisions.
3.2.36. v1.2.3¶
Released: August 17, 2012
- Added --indicator option to bedmap.
- Assorted changes to conversion scripts and associated companion download.