Re: Extracting metadata from documents

Medardo Rodriguez (Merchise Group) med.swl at gmail.com
Fri Sep 12 16:06:01 CEST 2008


2008/9/12 Chema Cortes <pych3m4 at gmail.com>:
> Do you know of any library that handles several file formats? I don't
> need it to be cross-platform.

== Package: python-hachoir-metadata ==
Description: Program to extract metadata using Hachoir library
 hachoir-metadata extracts metadata from multimedia files: music,
picture, video, but also archives. It supports most common file
formats:
 * Archives: bzip2, gzip, zip, tar
 * Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG),
MIDI, AIFF, AIFC, Real audio (RA)
 * Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
 * Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime
(MOV), Ogg/Theora, Real media (RM)

 It tries to give as much information as possible. For some file
 formats it gives considerably more information than libextractor, for
 example. The RIFF parser is particularly good: it can extract the
 creation date, the software used to generate the file, etc. But
 hachoir-metadata cannot guess information. The most complex operation
 is just computing the duration of a music file from the frame size and
 the file size.

 hachoir-metadata has three modes:
 * classic mode: extract metadata; you can use --level=LEVEL to limit
   the amount of information displayed (not the amount extracted)
 * --type: show the file format and the most important information on
   one line
 * --mime: just display file MIME type

 The command 'hachoir-metadata --mime' works like 'file --mime', and
 'hachoir-metadata --type' like 'file'. But today the file command
 supports more file formats than hachoir-metadata.
Homepage: http://hachoir.org/wiki/hachoir-metadata
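
The description above mentions computing the duration of a music file from the frame size and the file size. A minimal sketch of that arithmetic follows. It assumes constant-bitrate MPEG-1 Layer III audio with no tags and no padding, and it is not taken from hachoir's source; the function names are made up for illustration.

```python
# Sketch only: assumes constant-bitrate MPEG-1 Layer III audio,
# ignoring ID3 tags and per-frame padding. Names are hypothetical.
SAMPLES_PER_FRAME = 1152  # samples carried by one MPEG-1 Layer III frame


def frame_size_bytes(bitrate_bps, sample_rate_hz):
    """Average frame size in bytes for the given bitrate and sample rate."""
    return 144 * bitrate_bps / sample_rate_hz


def estimate_duration_seconds(file_size_bytes, bitrate_bps, sample_rate_hz):
    """Estimate the playing time from the file size and the frame size."""
    frames = file_size_bytes / frame_size_bytes(bitrate_bps, sample_rate_hz)
    return frames * SAMPLES_PER_FRAME / sample_rate_hz


# A 4,000,000-byte file at 128 kbit/s and 44.1 kHz lasts about 250 seconds.
duration = estimate_duration_seconds(4_000_000, 128_000, 44_100)
print(round(duration))
```

For variable-bitrate files this estimate is unreliable, which fits the remark that the tool cannot guess information that is not in the file.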


== Package: python-kaa-metadata ==
Description: Media Metadata for Python
 Kaa Metadata is a media metadata retrieval framework. It retrieves
 metadata from mp3, ogg, avi, jpg, tiff and other file formats. Among
 others, it parses ID3v2, ID3v1, EXIF, IPTC and Vorbis data into an
 object-oriented structure.

 The Kaa Media Repository is a set of Python modules related to media.
Homepage: http://freevo.org/kaa


== Package: python-pypdf ==
Description: PDF toolkit implemented solely in Python
 A PDF toolkit implemented solely in Python.  It is capable of:
 * extracting document information (title, author, ...),
 * splitting documents page by page,
 * merging documents page by page,
 * cropping pages,
 * merging multiple pages into a single page,
 * encrypting and decrypting PDF files.
 Being pure Python, it should run on any Python platform without any
 dependencies on external libraries. It can also work entirely on
 StringIO objects rather than file streams, allowing for PDF
 manipulation in memory. It is therefore a useful tool for websites
 that manage or manipulate PDFs.
Homepage: http://pybrary.net/pyPdf/
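
As a rough illustration of the "extracting document information" item, the sketch below reads a few standard fields through the `PdfFileReader.getDocumentInfo()` call of the legacy pyPdf API. It assumes PyPDF2 older than 3.0, which kept that API; with the original pyPdf the import would be `from pyPdf import PdfFileReader`. The helper names `summarize_info` and `read_pdf_info` are hypothetical.

```python
def summarize_info(reader):
    """Collect a few standard fields from a PdfFileReader-like object."""
    # Depending on the library version, this may be None when the PDF
    # has no document information dictionary.
    info = reader.getDocumentInfo()
    if info is None:
        return {}
    fields = ("title", "author", "subject", "creator", "producer")
    return {name: getattr(info, name, None) for name in fields}


def read_pdf_info(path):
    """Open a PDF file and return its document information as a dict."""
    # Imported lazily so summarize_info() works without the library.
    # Assumption: PyPDF2 < 3.0 (legacy pyPdf-style API).
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as stream:
        return summarize_info(PdfFileReader(stream))
```

Calling `read_pdf_info("some.pdf")` (the path is a placeholder) would return a dictionary such as `{"title": ..., "author": ...}`, with `None` for fields the file does not define.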


== Package: python-uno ==
Description: Python interface for OpenOffice.org
 The Python-UNO bridge allows use of the standard OpenOffice.org API
 with the Python scripting language. It additionally allows others to
 develop UNO components in Python, thus Python UNO components may be
 run within the OpenOffice.org process and can be called from C++ or
 the built-in StarBasic scripting language.

 You can find more information about PyUNO at
 http://udk.openoffice.org/python/python-bridge.html
Homepage: http://udk.openoffice.org/python/python-bridge.html

Tags: devel::lang:c++, devel::lang:python, implemented-in::python,
role::shared-lib, suite::openoffice
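
For OpenOffice.org documents specifically, metadata can also be read without going through UNO at all. This is a different technique from the bridge described above: an OpenDocument file is a zip archive whose `meta.xml` member holds fields such as the title and creator. The sketch below uses only the standard library; the function name `read_odf_metadata` and the choice of fields are assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside meta.xml of OpenDocument files.
DC = "{http://purl.org/dc/elements/1.1/}"
META = "{urn:oasis:names:tc:opendocument:xmlns:meta:1.0}"


def read_odf_metadata(source):
    """Return selected metadata from an ODF file (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    wanted = {
        "title": DC + "title",
        "creator": DC + "creator",
        "date": DC + "date",
        "generator": META + "generator",
    }
    result = {}
    for key, tag in wanted.items():
        element = root.find(".//" + tag)
        # Skip fields that are absent or empty in this document.
        if element is not None and element.text:
            result[key] = element.text
    return result
```

Something like `read_odf_metadata("report.odt")` (a placeholder path) would then return a plain dictionary. This avoids needing a running OpenOffice.org process, at the cost of only covering OpenDocument formats.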


Regards