Clean export of the references in your Bibliography

 


Like many scientific writers, I use Mendeley to collect all my references in one single (and often messy) .bib file. For this reason, the entries need to be cleaned and customized for each document, which improves the readability and efficiency of your LaTeX projects. I found a simple solution, in a few steps, that might help many of you.

First of all, the following Python program, clean_bib.py, removes all the undesired fields from the BibTeX entries. It relies on the bibtexparser library and Python 3. The full documentation of the program can be found at https://github.com/zohannn/clean_bib.git.

import datetime
import sys

import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.bwriter import BibTexWriter
from bibtexparser.customization import *  # type, page_double_hyphen, convert_to_unicode, ...

input_b = "library.bib"
output_b = "lib_unused_fields.bib"

now = datetime.datetime.now()
print("{0} Cleaning duff bib records from {1} into {2}".format(now, input_b, output_b))

# Let's define a function to customize our entries.
# It takes a record and returns this record.
def customizations(record):
    """Use some functions delivered by the library
    :param record: a record
    :returns: -- customized record
    """
    record = type(record)  # bibtexparser's customization type(), not the built-in
    record = page_double_hyphen(record)
    record = convert_to_unicode(record)
    ## delete the following keys.
    unwanted = ["archivePrefix","arxivId","doi", "url", "abstract", "file", "gobbledegook", "isbn", "link", "keyword","keywords", "number","mendeley-tags", "annote", "pmid", "chapter", "institution", "issn", "month"]
    ## unwanted = ["url", "abstract", "file", "gobbledegook", "isbn", "link", "keyword","keywords", "number","mendeley-tags", "annote", "pmid", "chapter", "institution", "issn", "month"]
    # Compare in lower case, since the parser may store "archivePrefix" as "archiveprefix".
    unwanted = {val.lower() for val in unwanted}
    for key in list(record.keys()):
        if key.lower() in unwanted:
            del record[key]
    return record


bib_database = None
with open(input_b, encoding="utf-8") as bibtex_file:
    parser = BibTexParser()
    parser.customization = customizations
    parser.ignore_nonstandard_types = False
    bib_database = bibtexparser.load(bibtex_file, parser=parser)

if bib_database is not None and bib_database.entries:
    now = datetime.datetime.now()
    success = "{0} Loaded {1} found {2} entries".format(now, input_b, len(bib_database.entries))
    print(success)
else:
    now = datetime.datetime.now()
    errs = "{0} Failed to read {1}".format(now, input_b)
    print(errs)
    sys.exit(errs)

bibtex_str = None
if bib_database:
    writer = BibTexWriter()
    # bibtexparser stores the entry type (article, book, ...) under "ENTRYTYPE"
    writer.order_entries_by = ('author', 'year', 'ENTRYTYPE')
    bibtex_str = bibtexparser.dumps(bib_database, writer)
    # Python 3: write the text directly, with an explicit UTF-8 encoding
    with open(output_b, "w", encoding="utf-8") as text_file:
        text_file.write(bibtex_str)

if bibtex_str:
    now = datetime.datetime.now()
    success = "{0} Wrote to {1} with len {2}".format(now, output_b, len(bibtex_str))
    print(success)
else:
    now = datetime.datetime.now()
    errs = "{0} Failed to write {1}".format(now, output_b)
    print(errs)
    sys.exit(errs)
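To look at the field-removal step of customizations() in isolation, here is a minimal, library-free sketch. It assumes, as bibtexparser 1.x does, that each entry is a plain dict with ID and ENTRYTYPE keys; the function name strip_fields and the sample record are hypothetical. Field names are compared case-insensitively, since a parser may store "archivePrefix" as "archiveprefix".

```python
# Minimal sketch of the field-removal step, independent of bibtexparser.
# Assumption: each entry is a plain dict (as in bibtexparser 1.x).

UNWANTED = ["archivePrefix", "arxivId", "doi", "url", "abstract", "file",
            "gobbledegook", "isbn", "link", "keyword", "keywords", "number",
            "mendeley-tags", "annote", "pmid", "chapter", "institution",
            "issn", "month"]


def strip_fields(entry, unwanted=UNWANTED):
    """Return a copy of entry without the unwanted fields.

    Names are compared in lower case, so "archivePrefix" also
    matches "archiveprefix". The input dict is not modified.
    """
    drop = {name.lower() for name in unwanted}
    return {key: value for key, value in entry.items()
            if key.lower() not in drop}


if __name__ == "__main__":
    # Hypothetical record, for illustration only
    record = {
        "ID": "doe2020",
        "ENTRYTYPE": "article",
        "author": "Doe, Jane",
        "title": "An Example",
        "archiveprefix": "arXiv",
        "doi": "placeholder",
    }
    print(strip_fields(record))
```

Returning a copy rather than popping keys in place makes the function easy to test; inside a bibtexparser customization the in-place version works just as well.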

Assuming that your .bib file is named library.bib and is located in the same folder as the script, you can edit the unwanted list of fields to match what your LaTeX project requires. I personally run clean_bib.py through the following Bash script, which also runs bibclean with its -German-style option on the result.

#!/bin/bash
python clean_bib.py
bibclean -German-style lib_unused_fields.bib > library_clean.bib

Finally, your LaTeX project most likely does not cite all the references in library_clean.bib. Therefore, the non-cited entries should be removed to create a custom bibliography. Assuming that your main LaTeX file is named main.tex and that your library is located in the bib/ folder, the following script exports only the references cited in your project and places them in bib/exported_refs.bib. Since bibexport reads the cited keys from main.aux, compile main.tex first.

#!/bin/bash
file="bib/exported_refs.bib"
if [ -f "$file" ]; then
    rm "$file"
fi
bibexport -o "$file" main.aux
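The idea behind this last step can also be sketched in Python: LaTeX records every cited key in main.aux as a \citation{...} line, so collecting those keys and filtering the entries yields the cited subset. This is a minimal sketch, not how bibexport works internally. It assumes the BibTeX workflow (biblatex writes different commands to the .aux file) and entries as plain dicts with an ID key, as in bibtexparser 1.x; the function names cited_keys and keep_cited and the sample data are hypothetical.

```python
import re

# Matches the \citation{...} lines that LaTeX writes to the .aux file
# in the BibTeX workflow (one line may hold several comma-separated keys).
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text):
    """Return the set of citation keys found in the text of an .aux file."""
    keys = set()
    for match in CITATION_RE.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys


def keep_cited(entries, keys):
    """Keep only entries whose ID was cited; \\nocite{*} (key "*") keeps all."""
    if "*" in keys:
        return list(entries)
    return [entry for entry in entries if entry["ID"] in keys]


if __name__ == "__main__":
    # Hypothetical .aux content and entries, for illustration only
    aux = ("\\relax\n"
           "\\citation{doe2020,roe2019}\n"
           "\\citation{doe2020}\n"
           "\\bibdata{bib/library_clean}\n")
    entries = [{"ID": "doe2020"}, {"ID": "roe2019"}, {"ID": "unused2018"}]
    keys = cited_keys(aux)
    print(sorted(keys))
    print([entry["ID"] for entry in keep_cited(entries, keys)])
```

The filtered list can then be written back with bibtexparser's writer, as in clean_bib.py; bibexport remains the simpler choice when it is installed.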

I hope that this post is useful for those of you who, like me, have struggled with formatting the bibliographies of different LaTeX projects. Please feel free to comment and share your own solutions; I would be happy to learn new and more efficient techniques.
