Re: Extracting metadata from documents
Medardo Rodriguez (Merchise Group)
med.swl at gmail.com
Fri Sep 12 16:06:01 CEST 2008
2008/9/12 Chema Cortes <pych3m4 at gmail.com>:
> Do you know of any library that handles several file formats? It
> doesn't need to be cross-platform.
== Package: python-hachoir-metadata ==
Description: Program to extract metadata using Hachoir library
hachoir-metadata extracts metadata from multimedia files: music,
picture, video, but also archives. It supports most common file
formats:
* Archives: bzip2, gzip, zip, tar
* Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG),
MIDI, AIFF, AIFC, Real audio (RA)
* Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
* Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime
(MOV), Ogg/Theora, Real media (RM)
It tries to give as much information as possible. For some file
formats, it gives considerably more information than libextractor.
The RIFF parser, for example, is quite good: it can extract the
creation date, the software used to generate the file, etc. But
hachoir-metadata cannot guess information. The most complex operation
it performs is computing the duration of a music file from the frame
size and the file size.
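The duration computation mentioned above can be sketched roughly as
follows. This is not hachoir-metadata's actual code, only an
illustration that assumes a constant-bitrate file where every frame
has the same size; the function name and the default values (1152
samples per frame, as in MPEG-1 Layer III, and a 44100 Hz sample rate)
are assumptions made for the example.

```python
def estimate_duration_seconds(file_size_bytes, frame_size_bytes,
                              samples_per_frame=1152, sample_rate_hz=44100):
    """Estimate audio duration from file size and a fixed frame size.

    Assumes every frame has the same size (constant bitrate) and ignores
    headers or tags stored in the file, so the result is approximate.
    """
    if frame_size_bytes <= 0 or sample_rate_hz <= 0:
        raise ValueError("frame size and sample rate must be positive")
    # Number of whole frames that fit in the file.
    frame_count = file_size_bytes // frame_size_bytes
    # Each frame carries a fixed number of samples.
    return frame_count * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical file: 4,180,000 bytes made of 418-byte frames.
    print(estimate_duration_seconds(4180000, 418))
```

Variable-bitrate files break the fixed-frame-size assumption, so a
real tool would have to read the frame headers instead.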
hachoir-metadata has three modes:
* classic mode: extract metadata; you can use --level=LEVEL to limit
the amount of information displayed (not the amount extracted)
* --type: show the file format and the most important information on
one line
* --mime: just display the file's MIME type
The command 'hachoir-metadata --mime' works like 'file --mime', and
'hachoir-metadata --type' like 'file'. But today the file command
supports more file formats than hachoir-metadata.
Homepage: http://hachoir.org/wiki/hachoir-metadata
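If you want to call the command-line tool from Python, something like
the following sketch could work. It assumes the hachoir-metadata
executable is installed and on the PATH; the helper names
(build_command, run_hachoir_metadata) and the mode keys are made up
for this example, and only the --type and --mime flags from the
description above are used.

```python
import subprocess

# Flags taken from the package description; "classic" passes no flag.
MODE_FLAGS = {"classic": [], "type": ["--type"], "mime": ["--mime"]}


def build_command(path, mode="classic"):
    """Build the argument list for one of the three modes."""
    if mode not in MODE_FLAGS:
        raise ValueError("unknown mode: %r" % (mode,))
    return ["hachoir-metadata"] + MODE_FLAGS[mode] + [path]


def run_hachoir_metadata(path, mode="classic"):
    """Run the tool and return its standard output as text.

    Requires the hachoir-metadata executable to be available on PATH.
    """
    result = subprocess.run(build_command(path, mode),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_hachoir_metadata("song.mp3", mode="mime") would run
'hachoir-metadata --mime song.mp3' and return whatever it prints.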
== Package: python-kaa-metadata ==
Description: Media Metadata for Python
Kaa Metadata is a media metadata retrieval framework. It retrieves
metadata from mp3, ogg, avi, jpg, tiff and other file formats. Among
others, it parses ID3v2, ID3v1, EXIF, IPTC and Vorbis data into an
object-oriented structure.
The Kaa Media Repository is a set of Python modules related to media.
Homepage: http://freevo.org/kaa
== Package: python-pypdf ==
Description: PDF toolkit implemented solely in Python
A PDF toolkit implemented solely in Python. It is capable of:
* extracting document information (title, author, ...),
* splitting documents page by page,
* merging documents page by page,
* cropping pages,
* merging multiple pages into a single page,
* encrypting and decrypting PDF files.
By being Pure-Python, it should run on any Python platform without
any dependencies on external libraries. It can also work entirely on
StringIO objects rather than file streams, allowing for PDF
manipulation in memory. It is therefore a useful tool for websites
that manage or manipulate PDFs.
Homepage: http://pybrary.net/pyPdf/
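For the "extracting document information" part, a minimal sketch
might look like this. The helper summarize_pdf_info is invented for
the example; it only assumes a reader object that offers
getDocumentInfo() and getNumPages(), as the legacy pyPdf PdfFileReader
does. The commented usage assumes pyPdf is installed and that
"document.pdf" is a placeholder path.

```python
def summarize_pdf_info(reader):
    """Collect title, author and page count from a pyPdf-style reader."""
    # getDocumentInfo() may return None when the PDF has no info dictionary.
    info = reader.getDocumentInfo()
    return {
        "title": info.title if info is not None else None,
        "author": info.author if info is not None else None,
        "pages": reader.getNumPages(),
    }


# Usage with the legacy pyPdf package (assumed to be installed):
#   from pyPdf import PdfFileReader
#   with open("document.pdf", "rb") as stream:
#       print(summarize_pdf_info(PdfFileReader(stream)))
```

Since the reader accepts any file-like object, the same helper should
also work on an in-memory stream instead of a file on disk.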
== Package: python-uno ==
Description: Python interface for OpenOffice.org
The Python-UNO bridge allows use of the standard OpenOffice.org API
with the Python scripting language. It additionally allows others to
develop UNO components in Python; thus Python UNO components may be
run within the OpenOffice.org process and can be called from C++ or
the built-in StarBasic scripting language.
You can find more information about PyUNO at
http://udk.openoffice.org/python/python-bridge.html
Homepage: http://udk.openoffice.org/python/python-bridge.html
Tags: devel::lang:c++, devel::lang:python, implemented-in::python,
role::shared-lib, suite::openoffice
Regards