minifold package#
Submodules#
minifold.binary_predicate module#
- class BinaryPredicate(left, operator, right)[source]#
Bases:
object
The BinaryPredicate class represents a binary predicate used to define minifold queries. See also the Query class (especially the filters attribute).
Constructor.
Example
>>> from minifold import BinaryPredicate
>>> bp = BinaryPredicate("a", "<=", 0)
>>> entry = {"a": 1, "b": 2}
>>> bp(entry)
False
>>> entry = {"a": -1, "b": 2}
>>> bp(entry)
True
- Parameters:
left (str) – The left operand. May be the key of an entry.
operator (str) – The binary operator, see also OPERATORS.
right (str) – The right operand. Cannot be the key of an entry.
- property left: str#
Retrieves the left operand of this BinaryPredicate.
- Returns:
The left operand of this BinaryPredicate.
- match(entry: dict) bool [source]#
Matches an entry against this BinaryPredicate.
- Parameters:
entry (dict) – An entry. One of its keys must correspond to self.left.
- Raises:
KeyError – if the entry does not have a key matching self.left.
- Returns:
True if this BinaryPredicate is satisfied by entry, False otherwise.
- property operator: str#
Retrieves the binary operator of this BinaryPredicate.
- Returns:
The binary operator of this BinaryPredicate.
- property right#
Retrieves the right operand of this BinaryPredicate.
- Returns:
The right operand of this BinaryPredicate.
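The matching behaviour described above can be sketched without minifold itself. The class `SketchPredicate` and the table `SKETCH_OPERATORS` below are hypothetical stand-ins written for illustration; the operators actually supported by minifold's `OPERATORS` may differ.

```python
import operator

# Hypothetical operator table; minifold's own OPERATORS mapping may differ.
SKETCH_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


class SketchPredicate:
    """Illustrative stand-in for a binary predicate (not minifold's class)."""

    def __init__(self, left: str, op: str, right):
        self.left = left    # key looked up in the entry
        self.op = op        # textual operator, e.g. "<="
        self.right = right  # constant compared against the entry value

    def match(self, entry: dict) -> bool:
        # Raises KeyError if the entry has no key matching self.left.
        return SKETCH_OPERATORS[self.op](entry[self.left], self.right)

    # Make the predicate callable, as in the example above.
    __call__ = match


bp = SketchPredicate("a", "<=", 0)
print(bp({"a": 1, "b": 2}))   # False
print(bp({"a": -1, "b": 2}))  # True
```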
minifold.cache module#
- class CacheConnector(child)[source]#
Bases:
Connector
CacheConnector is an abstract Connector used to cache results in the middle of a minifold query plan. If a query reaches this CacheEntriesConnector and is in cache, it is not forwarded to the underlying connector, hence accelerating the query plan execution.
See specializations:
StorageCacheConnector (caching using a local file)
JsonCacheConnector (caching using a local JSON file)
PickleCacheConnector (caching using a local pickle file)
Possible improvements:
For the moment, the cache is class-name based. It should rather be identified by the connector setup and the underlying connectors (if any).
For the moment, the cache is only used if the exact same Query was issued in the past. It should also be possible to reuse the cache when a less strict query has been issued, and to apply Connector.reshape_entries afterwards.
Constructor.
- attributes(object: str) set [source]#
Retrieves the keys supported by this CacheEntriesConnector instance.
- Returns:
The keys of the underlying entries.
- callback_read(query: Query) object [source]#
Callback triggered when data must be fetched in this CacheEntriesConnector.
- Parameters:
query (Query) – The handled Query instance.
- Raises:
RuntimeError – if not overloaded.
- Returns:
The fetched data.
- callback_write(query: Query, data: object)[source]#
Callback triggered when data must be saved in this CacheEntriesConnector instance.
- Parameters:
query (Query) – The handled Query instance.
data (object) – The data to be saved.
- Raises:
RuntimeError – if not overloaded.
- clear_cache()[source]#
Clears this CacheEntriesConnector instance entirely. This method should be overloaded.
- clear_query(query: Query)[source]#
Removes a Query result from this CacheEntriesConnector instance. This method should be overloaded.
- Parameters:
query (Query) – The handled Query instance.
- is_cachable(query: Query, data: object) bool [source]#
Checks whether data may be saved in this CacheEntriesConnector instance.
- Parameters:
query (Query) – The handled Query instance.
data (object) – The data fetched by this Query that must be saved to this CacheEntriesConnector instance.
- Returns:
True if query may be cached in this CacheEntriesConnector instance, False otherwise.
- is_cached(query: Query) bool [source]#
Checks whether a Query instance is already cached in this CacheEntriesConnector instance.
- Parameters:
query (Query) – The handled Query instance.
- Returns:
True if query is cached in this CacheEntriesConnector instance, False otherwise.
- query(query: Query) list [source]#
Handles an incoming Query instance.
The CacheEntriesConnector checks whether it is already cached. If query is cached in this CacheEntriesConnector, it is not forwarded to self.child and the results are returned directly from the cache. Otherwise, it is forwarded to self.child.
- Parameters:
query (Query) – The handled Query instance.
- Returns:
The corresponding entries.
- read(query: Query) object [source]#
Fetches from this CacheEntriesConnector instance the corresponding data.
- Returns:
The corresponding cached object.
- write(query: Query, data: object) bool [source]#
Writes data to this CacheEntriesConnector instance.
- Parameters:
query (Query) – The handled Query instance.
data (object) – The data fetched by this Query that must be saved to this CacheEntriesConnector instance.
- Returns:
The corresponding cached object.
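The cache-or-forward behaviour of `query()` described above can be illustrated with a minimal in-memory sketch. `SketchCache` is a hypothetical class, not part of minifold; it assumes the query is hashable and that the child is a plain callable, which is a simplification of a real child `Connector`.

```python
class SketchCache:
    """Illustrative in-memory cache wrapping a child callable."""

    def __init__(self, child):
        self.child = child    # callable taking a query and returning entries
        self.store = {}       # maps a query to its cached entries
        self.child_calls = 0  # counts how often the child is reached

    def is_cached(self, query) -> bool:
        return query in self.store

    def query(self, query) -> list:
        if self.is_cached(query):
            return self.store[query]  # served from the cache
        self.child_calls += 1
        entries = self.child(query)   # forwarded to the child
        self.store[query] = entries   # saved for later identical queries
        return entries


cache = SketchCache(lambda q: [{"q": q}])
first = cache.query("SELECT *")
second = cache.query("SELECT *")  # does not reach the child again
```

Only an identical query hits the cache in this sketch, which mirrors the limitation listed under "Possible improvements".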
- class JsonCacheConnector(child: Connector, lifetime: timedelta = datetime.timedelta(days=3), cache_dir: str = None)[source]#
Bases:
StorageCacheConnector
JsonCacheConnector overloads StorageCacheConnector to cache results in JSON files.
Constructor.
- Parameters:
child (Connector) – The child Connector instance.
lifetime (datetime.timedelta) – The lifetime of the cached objects. Pass None to use the default lifetime.
cache_dir (str) – The path to the cache directory. Pass None to use the default minifold cache directory.
- class PickleCacheConnector(child: Connector, lifetime: timedelta = datetime.timedelta(days=3), cache_dir: str = None)[source]#
Bases:
StorageCacheConnector
PickleCacheConnector overloads StorageCacheConnector to cache results in pickle files.
Constructor.
- Parameters:
child (Connector) – The child Connector instance.
lifetime (datetime.timedelta) – The lifetime of the cached objects. Pass None to use the default lifetime.
cache_dir (str) – The path to the cache directory. Pass None to use the default minifold cache directory.
- class StorageCacheConnector(child: Connector, callback_load: callable = None, callback_dump: callable = None, lifetime: timedelta = None, cache_dir: str = None, read_mode: str = 'r', write_mode: str = 'w', extension: str = '')[source]#
Bases:
CacheConnector
StorageCacheConnector is an abstract class that specializes CacheConnector to manage files on the local storage.
See specializations:
JsonCacheConnector (caching using a local JSON file)
PickleCacheConnector (caching using a local pickle file)
Constructor.
- Parameters:
child (Connector) – The child Connector instance.
callback_load (callable) – A function callback_load(f) where f is the read file descriptor of the cache and which returns the cached object.
callback_dump (callable) – A function callback_dump(data, f) where f is the write file descriptor of the cache and data is the object to be cached.
lifetime (datetime.timedelta) – The lifetime of the cached objects. Pass None to use the default lifetime.
cache_dir (str) – The path to the cache directory. Pass None to use the default minifold cache directory.
read_mode (str) – The string specifying how to open the file descriptor to read the cache. Possible values are "r" (text cache) and "rb" (binary cache).
write_mode (str) – The string specifying how to open the file descriptor to write the cache. Possible values are "w" (text cache) and "wb" (binary cache).
extension (str) – The extension of the cache filename.
- base_dir = '/home/docs/.minifold/cache'#
- callback_read(query: Query) object [source]#
Callback triggered when data must be fetched in this StorageCacheConnector.
- Parameters:
query (Query) – The handled Query instance.
- Returns:
The fetched data.
- callback_write(query: Query, data: object)[source]#
Callback triggered when data must be saved in this StorageCacheConnector instance.
- Parameters:
query (Query) – The handled Query instance.
data (object) – The data to be saved.
- clear_query(query: Query)[source]#
Removes a Query result from this StorageCacheConnector instance.
- Parameters:
query (Query) – The handled Query instance.
- is_cached(query: Query) bool [source]#
Checks whether a Query instance is already cached in this StorageCacheConnector instance.
- Parameters:
query (Query) – The handled Query instance.
- Returns:
True if query is cached in this StorageCacheConnector instance, False otherwise.
- static is_fresh_cache(cache_filename: str, lifetime: timedelta) bool [source]#
Checks whether a minifold cache file is fresh enough to be relevant.
To do so, StorageCacheConnector.is_fresh_cache fetches the date of the cache file from the filesystem and matches it against lifetime.
- Parameters:
cache_filename (str) – The path to the cache.
lifetime (datetime.timedelta) – The lifetime of the cache.
- Returns:
True if and only if the cache is fresh enough, False otherwise.
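A freshness check of this kind can be sketched with the standard library alone. The function `is_fresh` below is hypothetical and assumes that the modification time of the file is the relevant date; the date actually used by `is_fresh_cache` is not stated here.

```python
import os
import tempfile
from datetime import datetime, timedelta


def is_fresh(cache_filename: str, lifetime: timedelta) -> bool:
    """Return True if the file was modified less than `lifetime` ago."""
    # Assumption: the modification time is used as the date of the cache.
    modified = datetime.fromtimestamp(os.path.getmtime(cache_filename))
    return datetime.now() - modified < lifetime


# Create an empty temporary file standing in for a cache file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

fresh = is_fresh(path, timedelta(days=3))      # just created: fresh
stale = is_fresh(path, timedelta(seconds=-1))  # negative lifetime: never fresh
os.remove(path)
```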
- lifetime = datetime.timedelta(days=3)#
- make_cache_dir(base_dir: str, sub_dir: str = '')[source]#
Crafts the path to a minifold cache file.
- Parameters:
base_dir (str) – The directory storing the minifold caches. See also DEFAULT_CACHE_STORAGE_BASE_DIR.
sub_dir (str) – The subdirectory of base_dir storing the caches. This is often the name assigned to a connector pulling remote data.
minifold.cached module#
- class CachedEntriesConnector(load_entries: callable, cache_filename: str, load_cache: callable, save_cache: callable, read_mode: str, write_mode: str, with_cache: bool = True)[source]#
Bases:
EntriesConnector
CachedEntriesConnector is an abstract Connector used to fetch data from a cache saved on the local storage.
See specializations:
JsonCachedConnector (caching using a JSON file)
PickleCachedConnector (caching using pickles)
Constructor.
- Parameters:
load_entries (callable) – A function called to populate this CachedEntriesConnector.
cache_filename (str) – The path to the file used to save the cache on the local storage.
load_cache (callable) – A function entries = load_cache(f) where: entries is a list of dictionaries; f is the (read) file descriptor of the cache.
save_cache (callable) – A function save_cache(cache_filename, f) where: entries is a list of dictionaries; f is the (write) file descriptor of the cache.
read_mode (str) – A string specifying how the read file descriptor of the cache must be created. Possible values are "r" (text-based cache) and "rb" (binary cache).
write_mode (str) – A string specifying how the write file descriptor of the cache must be created. Possible values are "w" (text-based cache) and "wb" (binary cache).
- class JsonCachedConnector(load_entries: callable, cache_filename: str, **kwargs)[source]#
Bases:
CachedEntriesConnector
The JsonCachedConnector class implements the CachedEntriesConnector using a third-party JSON file.
Constructor. See also CachedEntriesConnector.__init__
- Parameters:
load_entries (callable) – A function called to populate this CachedEntriesConnector.
cache_filename (str) – The path to the file used to save the cache on the local storage.
- class PickleCachedConnector(load_entries: callable, cache_filename: str, **kwargs)[source]#
Bases:
CachedEntriesConnector
The PickleCachedConnector class implements the CachedEntriesConnector using a third-party pickle file.
Constructor. See also CachedEntriesConnector.__init__
- Parameters:
load_entries (callable) – A function called to populate this CachedEntriesConnector.
cache_filename (str) – The path to the file used to save the cache on the local storage.
minifold.closure module#
- closure(key: object, fds: dict) set [source]#
Compute the closure of a key given a set of functional dependencies.
- Parameters:
key (object) – A key. Possible types are str, or a set, frozenset or list of strings.
fds (dict) – A dictionary where each key-value pair represents a functional dependency (by mapping a key with another key).
- Returns:
The “reachable” attributes from key.
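The idea of a closure under functional dependencies can be sketched as a simple graph traversal. `closure_sketch` is a hypothetical re-implementation for illustration; it assumes each dependency maps one key to exactly one other key and that the starting key belongs to its own closure, which may differ from what minifold's `closure` returns.

```python
def closure_sketch(key, fds: dict) -> set:
    """Return every attribute reachable from `key` by following `fds`."""
    # Accept a single string or a collection of strings as starting point.
    reached = {key} if isinstance(key, str) else set(key)
    pending = list(reached)
    while pending:
        current = pending.pop()
        target = fds.get(current)  # attribute determined by `current`, if any
        if target is not None and target not in reached:
            reached.add(target)
            pending.append(target)
    return reached


fds = {"isbn": "title", "title": "publisher", "author": "country"}
from_isbn = closure_sketch("isbn", fds)
from_author = closure_sketch(["author"], fds)
```

Here "isbn" determines "title", which in turn determines "publisher", so all three end up in the closure of "isbn".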
minifold.config module#
- class Config(*args, **kwargs)[source]#
Bases:
dict
Config is used to centralize the minifold configuration, possibly fetched from multiple files.
Note this class inherits Singleton, meaning that any instance of this class corresponds to the same instance.
Each piece of configuration must be a dictionary whose keys are arbitrary (minifold gateway) names. Each gateway is mapped with a dictionary which maps "key" with the appropriate Connector type and "args" with the parameters to be passed to the corresponding constructor.
These configuration pieces are typically stored in configuration files that reside in ~/.minifold/conf. By convention, each JSON file stored in this directory is named “gw_type:gw_name.json” where gw_type helps to understand the underlying minifold connector and gw_name identifies the nature of the data source.
See DEFAULT_MINIFOLD_CONFIG.
Example
>>> from minifold.dblp import DblpConnector
>>> config = Config()
>>> config.loads(DEFAULT_MINIFOLD_CONFIG)
>>> dblp1 = config.make_connector("dblp:dagstuhl")
>>> dblp2 = config.make_connector("dblp:uni-trier")
- load(stream)[source]#
Populates the Config instance from an input stream (e.g., a read file descriptor) storing JSON data.
- Parameters:
stream – The input JSON stream.
- load_file(filename: str)[source]#
Populates the Config instance from a JSON file.
- Parameters:
filename (str) – The path to the input JSON file. Example: ~/.minifold/conf/gw_type:gw_name.json.
minifold.connector module#
- class Connector[source]#
Bases:
object
The Connector class is the base class of most of the classes involved in minifold.
A minifold query plan is a hierarchy of Connector instances. This hierarchy forms a pipeline, where leaves correspond to gateways to some data sources and internal nodes to SQL-like operators or intermediate caches.
Running a query consists in sending a Query instance to the root Connector instance of the query plan. Then, each Connector (possibly alters and) forwards the query to its children. The leaves of the query plan are Connector instances acting as gateways that fetch data from a remote or local data source. A read query returns the entries matched by the input minifold query (if any) to the parent node. Iteratively, the entries reach the root node of the query plan.
As a result, once the hierarchy of the query plan is ready, the only relevant entry point is the root Connector instance.
Constructor.
- answer(query: Query, ret: list)[source]#
Method traversed when this Connector is ready to answer to a given Query.
This method is used in the child classes to trace entries in the (complex) query plans.
- Parameters:
query (Query) – The related Query instance.
ret (list) – The corresponding results.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this Connector instance.
- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- query(query: Query) list [source]#
Handles an input Query instance.
- Parameters:
query (Query) – The handled query.
- Returns:
The list of entries matching the input query.
- reshape_entries(query: Query, entries: list) list [source]#
Reshapes entries returned by self.query() before calling self.answer().
This method should only be called if the Connector only supports a subset of the query operators among {SELECT, WHERE, LIMIT, OFFSET} in SQL.
- Parameters:
query (Query) – The handled Query instance.
entries (list) – The list of raw entries fetched so far, corresponding to SELECT * FROM foo LIMIT n, where n >= query.limit.
- Returns:
The reshaped entries.
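Reshaping raw entries amounts to applying the missing SELECT, WHERE, LIMIT and OFFSET operators on the client side. The function `reshape_sketch` below is a hypothetical illustration that takes these operators as plain arguments rather than a `Query` instance; the order of operations (filter, then offset and limit, then projection) is an assumption.

```python
def reshape_sketch(entries, attributes=None, where=None, offset=0, limit=None):
    """Apply WHERE, OFFSET, LIMIT, then SELECT to a list of dict entries."""
    # WHERE: keep the entries accepted by the predicate (if any).
    kept = [e for e in entries if where is None or where(e)]
    # OFFSET / LIMIT: slice the remaining entries.
    end = None if limit is None else offset + limit
    kept = kept[offset:end]
    # SELECT: project each entry onto the requested attributes (if any).
    if attributes is None:
        return kept
    return [{k: e.get(k) for k in attributes} for e in kept]


raw = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
reshaped = reshape_sketch(raw, attributes=["a"], where=lambda e: e["a"] >= 2, limit=1)
```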
- subclasses = {'minifold.cache.CacheConnector': <class 'minifold.cache.CacheConnector'>, 'minifold.cache.JsonCacheConnector': <class 'minifold.cache.JsonCacheConnector'>, 'minifold.cache.PickleCacheConnector': <class 'minifold.cache.PickleCacheConnector'>, 'minifold.cache.StorageCacheConnector': <class 'minifold.cache.StorageCacheConnector'>, 'minifold.cached.CachedEntriesConnector': <class 'minifold.cached.CachedEntriesConnector'>, 'minifold.cached.JsonCachedConnector': <class 'minifold.cached.JsonCachedConnector'>, 'minifold.cached.PickleCachedConnector': <class 'minifold.cached.PickleCachedConnector'>, 'minifold.count.CountConnector': <class 'minifold.count.CountConnector'>, 'minifold.csv.CsvConnector': <class 'minifold.csv.CsvConnector'>, 'minifold.dblp.DblpConnector': <class 'minifold.dblp.DblpConnector'>, 'minifold.download.DownloadConnector': <class 'minifold.download.DownloadConnector'>, 'minifold.entries_connector.EntriesConnector': <class 'minifold.entries_connector.EntriesConnector'>, 'minifold.google_scholar.GoogleScholarConnector': <class 'minifold.google_scholar.GoogleScholarConnector'>, 'minifold.group_by.GroupByConnector': <class 'minifold.group_by.GroupByConnector'>, 'minifold.hal.HalConnector': <class 'minifold.hal.HalConnector'>, 'minifold.html_table.HtmlTableConnector': <class 'minifold.html_table.HtmlTableConnector'>, 'minifold.join_if.JoinIfConnector': <class 'minifold.join_if.JoinIfConnector'>, 'minifold.json.JsonConnector': <class 'minifold.json.JsonConnector'>, 'minifold.json.JsonFileConnector': <class 'minifold.json.JsonFileConnector'>, 'minifold.lambdas.LambdasConnector': <class 'minifold.lambdas.LambdasConnector'>, 'minifold.ldap.LdapConnector': <class 'minifold.ldap.LdapConnector'>, 'minifold.limit.LimitConnector': <class 'minifold.limit.LimitConnector'>, 'minifold.mongo.MongoConnector': <class 'minifold.mongo.MongoConnector'>, 'minifold.natural_join.NaturalJoinConnector': <class 'minifold.natural_join.NaturalJoinConnector'>, 
'minifold.rename.RenameConnector': <class 'minifold.rename.RenameConnector'>, 'minifold.select.SelectConnector': <class 'minifold.select.SelectConnector'>, 'minifold.sort_by.SortByConnector': <class 'minifold.sort_by.SortByConnector'>, 'minifold.twitter.TwitterConnector': <class 'minifold.twitter.TwitterConnector'>, 'minifold.union.UnionConnector': <class 'minifold.union.UnionConnector'>, 'minifold.unique.UniqueConnector': <class 'minifold.unique.UniqueConnector'>, 'minifold.unnest.UnnestConnector': <class 'minifold.unnest.UnnestConnector'>, 'minifold.where.WhereConnector': <class 'minifold.where.WhereConnector'>}#
- trace_entries = False#
- trace_only_keys = False#
- trace_queries = False#
minifold.connector_util module#
This file gathers some useful functions to get and display
some entries from an arbitrary Connector instance.
- get_values(connector: Connector, attribute: str) list [source]#
Retrieves distinct values mapped with a given attribute stored in the entries of a given Connector instance.
- Parameters:
connector (Connector) – The queried Connector instance.
attribute (str) – The queried attribute.
- Raises:
KeyError – if attribute is not a valid attribute.
- Returns:
The corresponding values.
minifold.count module#
- class CountConnector(child: Connector)[source]#
Bases:
Connector
The CountConnector class is used to implement the COUNT statement in a minifold query plan. As it is one of the rare connectors returning an integer (instead of a list of entries), it is often the root connector in the tree modeling the minifold query plan.
Constructor.
- Parameters:
child (Connector) – The child Connector instance.
minifold.country module#
This file provides utilities to wrap the pycountry
python module.
minifold.csv module#
- class CsvConnector(data: str, delimiter: chr = ' ', quotechar: chr = '"', mode: CsvModeEnum = CsvModeEnum.FILENAME)[source]#
Bases:
Connector
The CsvConnector is a minifold gateway used to manipulate data stored in a CSV file.
Constructor.
- Parameters:
data (str) – Depending on the nature of the CSV data source, this string either contains the path to the CSV file; or the CSV data itself; or the TextIOBase instance. See the mode parameter.
delimiter (str) – The string that delimits each column of the input CSV. Example: ";", "|", " ". Defaults to ' '.
quotechar (str) – The character used to delimit values that may contain delimiter. Defaults to '"'.
mode (CsvModeEnum) – The nature of the input CSV source. See also the CsvModeEnum enumeration.
- attributes(object: str)[source]#
Lists the attributes of the collection of objects stored in this CsvConnector instance.
- Parameters:
object (str) – The name of the minifold object. As a CsvConnector instance stores a single collection, object is not relevant and you may pass None.
- Returns:
The set of available object’s attributes
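The roles of `delimiter` and `quotechar` can be illustrated with the standard `csv` module. This is only a sketch of how CSV rows become dictionary entries and how the attribute set follows from the header; it does not use `CsvConnector`, and the sample data is made up.

```python
import csv
import io

# Made-up CSV data using ";" as delimiter; the quoted value contains a ";".
raw = 'name;city\n"Doe; Jane";Paris\nJohn;Lyon\n'

reader = csv.DictReader(io.StringIO(raw), delimiter=";", quotechar='"')
entries = list(reader)               # one dict per CSV row
attributes = set(reader.fieldnames)  # column names taken from the header
```

The quote character keeps "Doe; Jane" as a single value even though it contains the delimiter.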
minifold.dblp module#
The DBLP Computer science bibliography provides open bibliographic information on major computer science journals and proceedings.
Some query examples:
Retrieves articles authored by Fabien Mathieu, in JSON format <https://dblp.dagstuhl.de/search/publ/api?q=fabien-mathieu&h=500&format=json>
Retrieves articles authored by Fabien Mathieu published in 2014, in JSON format <https://dblp.dagstuhl.de/search/publ/api?q=fabien-mathieu%20year:2014&h=500&format=json>
By default DBLP only returns up to 30 records (see this link).
The default limit in DblpConnector is set to 9999.
It is possible to query a specific researcher using its DBLP-ID. The ID can be found by browsing the page related to a researcher.
Example:
- Query using the DBLP name <https://dblp.uni-trier.de/pers/hd/c/Chen:Chung_Shue>
- Query using the DBLP ID <https://dblp.org/pid/30/1446>
- The DBLP ID can be obtained by clicking on the export bibliography icon.
For the moment, the only result format supported by DBLP is XML (see this link).
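The query examples above follow a regular URL pattern that can be rebuilt with the standard library. The helper `dblp_search_url` is hypothetical (it is not part of minifold) and only assembles the URL; it performs no network request. The parameter names `q`, `h` and `format` are taken from the example URLs above.

```python
from urllib.parse import quote, urlencode


def dblp_search_url(keywords: str, limit: int = 500, fmt: str = "json",
                    base: str = "https://dblp.dagstuhl.de") -> str:
    """Build a DBLP publication search URL like the examples above."""
    params = {"q": keywords, "h": limit, "format": fmt}
    # quote() encodes spaces as %20; ":" is kept as-is, as in "year:2014".
    return f"{base}/search/publ/api?" + urlencode(params, quote_via=quote, safe=":")


url = dblp_search_url("fabien-mathieu year:2014")
```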
- class DblpConnector(map_dblp_id: dict = None, map_dblp_name: dict = None, dblp_api_url: str = 'https://dblp.dagstuhl.de')[source]#
Bases:
Connector
The DblpConnector class is a minifold gateway used to fetch data from DBLP (repository of scientific articles).
See also:
HalConnector
GoogleScholarConnector
Constructor.
- Parameters:
map_dblp_id (dict) – Maps an author full name with his/her DBLP ID.
map_dblp_name (dict) – Maps an author obsolete full name with his/her current full name.
dblp_api_url (str) – The URL of the DBLP server. Defaults to DBLP_API_URL.
- property api_url: str#
Retrieves the URL of the remote DBLP server managed by this DblpConnector instance.
- Returns:
The URL of the remote DBLP server.
- attributes(object: str) set [source]#
Lists the attributes of the collection of objects stored in this DblpConnector instance.
- Parameters:
object (str) – The name of the minifold object. As a DblpConnector instance stores a single collection, object is not relevant and you may pass None.
- Returns:
The set of available object’s attributes
- binary_predicate_to_dblp(p: BinaryPredicate, result: dict)[source]#
Converts a minifold predicate to a DBLP predicate.
- Parameters:
p (BinaryPredicate) – A BinaryPredicate instance.
result (dict) – The output dictionary.
- extract_entries(query: Query, results: list) list [source]#
Extracts the minifold entries from a DBLP query result.
- Parameters:
query (Query) – A Query instance.
results – The DBLP results.
- Returns:
A list of dict instances (the minifold entries).
- property format: str#
Retrieves the format of data retrieved from the DBLP server.
- Returns:
A value in {"xml", "json", "jsonp"}.
- get_dblp_id(s: str) str [source]#
Retrieves the current DBLP ID of an author.
- Parameters:
s (str) – The input fullname.
- Returns:
The corresponding DBLP ID or s.
- get_dblp_name(s: str) str [source]#
Retrieves the current DBLP fullname of an author.
- Parameters:
s (str) – The input fullname.
- Returns:
The (possibly updated) fullname.
- property map_dblp_id: dict#
Retrieves the dictionary that maps an author full name with his/her DBLP ID.
- Returns:
The queried dictionary.
- property map_dblp_name: dict#
Retrieves the dictionary that maps an author obsolete full name with his/her current full name.
- Returns:
The queried dictionary.
- query(query: Query) list [source]#
Handles an input Query instance.
- Parameters:
query (Query) – The handled query.
- Returns:
The list of entries matching query.
- reshape_entries(query: Query, entries: list) list [source]#
Applies DblpConnector.reshape_entry() to a list of entries.
- Parameters:
query (Query) – A Query instance.
entries (list) – A list of minifold entries.
- Returns:
The reshaped entries.
- reshape_entry(query: Query, entry: dict) dict [source]#
Reshapes a DBLP entry to make it compliant with a given Query instance.
- Parameters:
query (Query) – A Query instance.
entry (dict) – The entry to reshape.
- Returns:
The reshaped entry.
minifold.dict_util module#
minifold.doc_type module#
- class DocType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases:
IntEnum
The DocType class enumerates the different kinds of scientific publications.
See also:
HalConnector
DblpConnector
- ARTICLE = 1#
- BOOKS_AND_THESES = 0#
- CHAPTER = 7#
- COMMUNICATION = 2#
- HDR = 6#
- JOURNAL = 3#
- PATENT = 8#
- POSTER = 5#
- REPORT = 4#
- UNKNOWN = 9999#
minifold.download module#
- class DownloadConnector(map_url_out: dict, child: ~minifold.connector.Connector, downloads: callable = <function downloads>, extract_response: callable = <function extract_response>)[source]#
Bases:
Connector
The DownloadConnector is used to fetch data over HTTP.
Constructor.
- Parameters:
map_url_out (dict) – A dict(str: str) mapping an entry attribute containing a URL to another entry attribute that will store the contents provided by this URL. Example: {"url": "html_content"}
child (Connector) – The child Connector.
downloads (callable) – Callback(urls) -> dict(url: content) where urls is an iterable of URLs and where the returned dict maps each URL to the corresponding response. Note: You could pass partial(download, ...) to customize the timeouts.
extract_response (callable) – Callback(response) -> str callback used to extract data from an HTTP query.
- attributes(object: str) set [source]#
Lists available attributes related to a given collection of objects stored in this DownloadConnector instance.
- Parameters:
object (str) – The name of the collection of entries.
- Returns:
The set of available attributes for object.
- download(url: str, timeout: tuple = (1.0, 2.0), cache_filename: str = None)[source]#
Downloads the content related to a given URL. See also request_cache() to enable caching.
- Parameters:
url (str) – A string containing the target URL.
timeout (tuple) – A (float, float) tuple corresponding to the (connect timeout, read timeout).
cache_filename (str) – A string containing the path to the cache to use. Pass None to use the default cache.
- Raises:
requests.exceptions.ConnectionError –
requests.exceptions.ConnectTimeout –
requests.exceptions.ContentDecodingError –
requests.exceptions.ReadTimeout –
requests.exceptions.SSLError –
- Returns:
The corresponding response.
- downloads(*args) dict [source]#
Performs multiple downloads in parallel. See downloads_async for further details. See also request_cache() to enable caching.
- Parameters:
urls (list) – An iterable over strings, each of them corresponding to a URL.
timeout (tuple) – A tuple (float, float) corresponding to the (connect timeout, read timeout).
return_exceptions (bool) – Pass True if this function is allowed to raise exceptions or must be quiet, False otherwise. Defaults to True.
- Returns:
A dict({str : ?}) mapping each queried URL to the corresponding contents (if successful), or to the corresponding Exception otherwise.
- async downloads_async(urls: list, timeout: tuple = (1.0, 2.0), return_exceptions: bool = True)[source]#
Asynchronous download procedure.
- Parameters:
urls (list) – An iterable over strings, each of them corresponding to a URL.
timeout (tuple) – A tuple (float, float) corresponding to the (connect timeout, read timeout).
return_exceptions (bool) – Pass True if this function is allowed to raise exceptions or must be quiet, False otherwise. Defaults to True.
- Raises:
See the download() function for the list of possible exceptions.
- Returns:
A dict({str : ?}) mapping each queried URL to the corresponding contents (if successful), or to the corresponding Exception otherwise.
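The shape of the returned dictionary (contents on success, an Exception otherwise) can be sketched with `asyncio.gather`. Everything below is hypothetical: `fake_fetch` stands in for a real HTTP request so that the sketch runs without network access, and `downloads_sketch` is not minifold's implementation.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request; fails for URLs containing 'bad'."""
    await asyncio.sleep(0)  # yield control, as a real request would
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return f"content of {url}"


async def downloads_sketch(urls: list, return_exceptions: bool = True) -> dict:
    """Run every fetch concurrently and map each URL to its outcome."""
    outcomes = await asyncio.gather(
        *(fake_fetch(url) for url in urls),
        # With True, a failure is returned as a value instead of being raised.
        return_exceptions=return_exceptions,
    )
    return dict(zip(urls, outcomes))


results = asyncio.run(
    downloads_sketch(["http://ok.example", "http://bad.example"])
)
```

In this sketch one failing URL does not prevent the other contents from being returned.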
- extract_response(response: object, extract_text: bool = True) object [source]#
Extracts from a response the corresponding contents or Exception.
- Parameters:
response (object) – A requests.Response or an Exception instance.
extract_text (bool) – A bool indicating if text must be extracted from HTML.
- Returns:
The corresponding Exception or str.
minifold.entries_connector module#
- class EntriesConnector(entries: list)[source]#
Bases:
Connector
EntriesConnector wraps a list of minifold entries (list of dictionaries).
Constructor.
- Parameters:
entries (list) – A list of minifold entries.
- attributes(obj: str = None) set [source]#
Lists available attributes related to a given collection of objects stored in this EntriesConnector instance.
- Parameters:
obj (str) – The name of the collection of entries. As EntriesConnector manages a single connection you may pass None.
- Returns:
The set of available attributes.
- property entries: list#
Accessor to the entries nested in this EntriesConnector instance.
- Returns:
The nested entries.
minifold.filesystem module#
This file gathers useful functions to interact with the filesystem of the local storage.
- check_writable_directory(directory: str)[source]#
Tests whether a directory is writable. If not, an exception is raised.
- Parameters:
directory (str) – A string containing an absolute path.
- Raises:
RuntimeError – If the directory does not exist or isn’t writable.
- ctime(path: str) datetime [source]#
Retrieves the creation date of a file.
- Parameters:
path (str) – A string containing the path of the file.
- Returns:
The corresponding datetime.
- find(dir_name: str) list [source]#
Lists the regular files stored in a given directory or one of its subdirectories (in shell: find dir_name -type f).
- Parameters:
dir_name (str) – A string corresponding to an existing directory.
- Returns:
A list of strings, each of them corresponding to a file.
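A recursive file listing of this kind can be sketched with `os.walk`. The function `find_sketch` is a hypothetical equivalent written for illustration; the sorting of the result is an added convenience, not a documented property of `find`.

```python
import os
import tempfile


def find_sketch(dir_name: str) -> list:
    """List the regular files under dir_name, recursively."""
    found = []
    for root, _dirs, files in os.walk(dir_name):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):  # skip anything that is not a regular file
                found.append(path)
    return sorted(found)


# Build a small temporary tree and list it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"), exist_ok=True)  # like `mkdir -p`
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("x")
    listed = [os.path.relpath(p, tmp) for p in find_sketch(tmp)]
```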
- mkdir(directory: str)[source]#
Creates a directory (in shell: mkdir -p).
- Parameters:
directory (str) – A string containing an absolute path.
- Raises:
OSError – If the directory cannot be created.
minifold.for_each module#
- class ForEachFilter(attribute_name: str, map_filter: dict)[source]#
Bases:
object
The ForEachFilter class is useful when a minifold entry (i.e., a dictionary) maps a key with a value which is a list of sub-entries.
It filters out each sub-entry that does not match the set of provided filters.
Constructor.
- Parameters:
attribute_name (str) – The key of the minifold entry attribute mapped with a list of sub-entries.
map_filter (dict) – A dict(sub_key : filter) where sub_key corresponds to an (optional) sub-entry key and filter is a callable(object) -> bool returning True if and only if the value mapped to sub_key indicates that the sub-entry must be filtered, False otherwise.
- match_filters(sub_entry: dict) bool [source]#
Checks whether a sub-entry is filtered according to this ForEachFilter instance.
- Parameters:
sub_entry (dict) – The sub-entry to be checked.
- Returns:
True if sub_entry is not filtered, False otherwise.
- for_each_sub_entry(entry, attribute: str, map_lambda: dict) dict [source]#
Applies a set of lambda functions to the sub-entries carried by a minifold entry (key, value) pair.
- Parameters:
attribute (str) – The key of entry carrying the sub-entries to be processed.
map_lambda (dict) – A dict(sub_key : transform) where: sub_key is an (optional) key involved in a sub-entry; transform is a function transforming the value mapped to sub_key to its new value.
- Returns:
The modified entry.
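Applying transformations to sub-entries can be sketched as two nested loops. `for_each_sub_entry_sketch` is a hypothetical re-implementation for illustration; it assumes the entry is modified in place and that a missing `sub_key` is simply skipped.

```python
def for_each_sub_entry_sketch(entry: dict, attribute: str, map_lambda: dict) -> dict:
    """Transform, in place, the values of the sub-entries under `attribute`."""
    for sub_entry in entry.get(attribute, []):
        for sub_key, transform in map_lambda.items():
            if sub_key in sub_entry:  # sub_key is optional in a sub-entry
                sub_entry[sub_key] = transform(sub_entry[sub_key])
    return entry


entry = {
    "title": "A paper",
    "authors": [{"name": "ada lovelace"}, {"name": "alan turing", "id": 7}],
}
# Capitalize each author name; other keys are left untouched.
for_each_sub_entry_sketch(entry, "authors", {"name": str.title})
```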
minifold.google_scholar module#
This file integrates the classes provided in scholar.py into minifold.
It allows querying Google Scholar.
- class GoogleScholarConnector(citation_format: str = 4)[source]#
Bases:
Connector
The GoogleScholarConnector class is a minifold gateway used to query Google Scholar <https://scholar.google.com/>.
See also:
DblpConnector
GoogleScholarConnector
Constructor.
- Parameters:
citation_format (str) – The citation format.
- attributes(object: str) set [source]#
Lists available attributes related to a given collection of objects stored in this GoogleScholarConnector instance.
- Parameters:
object (str) – The name of the collection of entries. The only supported objects are "publication" and "cluster".
- Returns:
The set of available attributes for object.
- static filter_to_scholar(p: BinaryPredicate, gs_query: SearchScholarQuery, authors: list)[source]#
Converts a minifold predicate (applying to authors) to the corresponding scholar filter (recursive function).
- Parameters:
p (BinaryPredicate) – The minifold filter.
gs_query (SearchScholarQuery) – The Google Scholar query.
authors (list) – Pass an empty list.
- query(query: Query) list [source]#
Handles an input
Query
instance.- Parameters:
query (Query) – The handled query.
- Returns:
The list of entries matching the input Query.
- static sanitize_author(authors: list, author: str) str [source]#
Fixes an author name by finding, in an input list of strings, the string closest to the input string.
- Parameters:
authors (list) – The list of strings.
author (str) – The reference string.
- Returns:
The string in authors closest to author if any, else author.
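A closest-string lookup of this kind can be sketched with the standard difflib module. The source does not say which similarity measure sanitize_author uses, so the function name, the use of difflib and the cutoff value below are all assumptions made for illustration.

```python
import difflib


def sanitize_author_sketch(authors: list, author: str, cutoff: float = 0.6) -> str:
    """Return the string of `authors` closest to `author`, or `author` if none is close enough."""
    # get_close_matches returns at most n candidates, best match first.
    matches = difflib.get_close_matches(author, authors, n=1, cutoff=cutoff)
    return matches[0] if matches else author


known_authors = ["Fabien Mathieu", "Marc-Olivier Buob"]
fixed = sanitize_author_sketch(known_authors, "Fabien Mathie")
```

Such a lookup is handy when a name scraped from a result page is truncated, as the parse_article documentation warns may happen for the last author.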
- class MinifoldScholarQuerier[source]#
Bases:
ScholarQuerier
MinifoldScholarQuerier overloads ScholarQuerier to fetch more attributes.
Constructor.
- parse(s_html: str)[source]#
Populates self.articles using the HTML Google Scholar page.
- Parameters:
s_html (str) – An HTML Google Scholar page content.
- Raises:
RuntimeError – if the result can't be fetched from Google Scholar.
- send_query(gs_query: ScholarQuery)[source]#
Sends a query to Google scholar.
- Parameters:
gs_query (ScholarQuery) – A Google scholar query.
- Raises:
RuntimeError – if the result can't be fetched from Google Scholar.
- parse_article(s_html: str) dict [source]#
Parses a "gs_res_ccl_mid" div (wrapping each article) returned by Google Scholar.
- Parameters:
s_html (str) – The HTML string containing the div.
- Returns:
The dict describing the article, structured as follows:
"authors" : list, # Only contains the first authors. The last author name may be incomplete.
"cluster_id" : int,
"conference" : str, # May be unset
"editor" : str,
"excerpt" : str, # May be unset
"num_citations" : int,
"num_versions" : int,
"title" : str,
"url_citations" : str,
"url_versions" : str,
"url_pdf" : str,
"url_title" : str,
"year" : int,
An incomplete string may start with '…' and end with '…'. URLs are absolute.
minifold.group_by module#
- class GroupByConnector(attributes: list, child: Connector)[source]#
Bases:
Connector
The
GroupByConnector
class implements the GROUP BY statement in a minifold pipeline.Constructor.
- Parameters:
attributes (list) – The list of entry keys used to form the aggregates.
child (Connector) – The child minifold
Connector
instance.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
GroupByConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- group_by(attributes: list, entries: list) dict [source]#
Implements the GROUP BY statement for a list of minifold entries.
- Parameters:
attributes (list) – The list of entry keys used to form the aggregates.
entries (list) – A list of minifold entries.
- Returns:
A dictionary where each key identifies an aggregate and is mapped to the corresponding entries.
- group_by_impl(functor: ValuesFromDictFonctor, entries: list) dict [source]#
Implementation details of
group_by()
.- Parameters:
functor (ValuesFromDictFonctor) – The functor allowing to extract the values used to form the aggregates.
entries (list) – A list of minifold entries.
- Returns:
A dictionary where each key identifies an aggregate and is mapped to the corresponding entries.
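The grouping described above can be sketched in a few lines. This is an illustration rather than minifold's implementation: the name group_by_sketch is hypothetical, and identifying each aggregate by the tuple of grouped values (with None for a missing key) is an assumption, since the source only says that each key "identifies an aggregate".

```python
from collections import defaultdict


def group_by_sketch(attributes: list, entries: list) -> dict:
    """Group entries by the values they carry for `attributes`."""
    aggregates = defaultdict(list)
    for entry in entries:
        # Assumption: the aggregate key is the tuple of grouped values.
        key = tuple(entry.get(attribute) for attribute in attributes)
        aggregates[key].append(entry)
    return dict(aggregates)


entries = [
    {"year": 2020, "type": "article", "title": "A"},
    {"year": 2020, "type": "article", "title": "B"},
    {"year": 2021, "type": "thesis", "title": "C"},
]
groups = group_by_sketch(["year", "type"], entries)
```

Here groups has two keys, (2020, "article") and (2021, "thesis"), mapped to two entries and one entry respectively.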
minifold.hal module#
HAL is an open archive of scholarly documents (articles, theses, etc.) that can be queried through a search API.
Some query examples:
Retrieves LINCS publications since 2012 <https://api.archives-ouvertes.fr/search/?q=structId_i:(160294)&fq=producedDateY_i:[2012%20TO%20*]>
Retrieves articles authored by Fabien Mathieu since 2012 <https://api.archives-ouvertes.fr/search/?q=*:*&fq=authFullName_s:(%22Fabien%20Mathieu%22)&fq=producedDateY_i:[2012%20TO%20*]&&fl=title_s&sort=submittedDate_tdate+desc&wt=json>
Retrieves LINCS publications in JSON for a subset of attributes, sorted from the newest one to the oldest one <https://api.archives-ouvertes.fr/search/?q=structId_i:(160294)&fq=&rows=999&fl=keyword_s,producedDateY_i,authFullName_s,*itle_s,abstract_s,docType_s&sort=submittedDate_tdate+desc&wt=json>
It is possible to query a specific researcher using their HAL-ID.
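URLs of this shape can be assembled with the standard urllib.parse module. The sketch below only mirrors the parameters visible in the examples (q, fq, fl, rows, wt); the helper name make_hal_url is hypothetical, and this is not how HalConnector.query_to_hal is necessarily implemented.

```python
from urllib.parse import urlencode

HAL_API_URL = "https://api.archives-ouvertes.fr/search"


def make_hal_url(query: str, filters=(), fields=(), rows=None, fmt: str = "json") -> str:
    """Build a HAL search URL from a main query, filter queries and returned fields."""
    params = [("q", query)]
    params += [("fq", f) for f in filters]      # one fq parameter per filter
    if fields:
        params.append(("fl", ",".join(fields)))  # comma-separated field list
    if rows is not None:
        params.append(("rows", str(rows)))
    params.append(("wt", fmt))
    # urlencode percent-encodes reserved characters such as ':' and '['.
    return f"{HAL_API_URL}/?{urlencode(params)}"


url = make_hal_url(
    "structId_i:(160294)",
    filters=["producedDateY_i:[2012 TO *]"],
    fields=["title_s", "authFullName_s"],
)
```

The resulting URL is the percent-encoded counterpart of the first example query, restricted to two fields and returned as JSON.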
- class HalConnector(map_hal_id: dict = None, map_hal_name: dict = None, hal_api_url: str = 'https://api.archives-ouvertes.fr/search')[source]#
Bases:
Connector
The HalConnector class is a minifold gateway allowing to fetch data from HAL (repository of scientific articles).
See also: DblpConnector, GoogleScholarConnector.
Constructor.
- Parameters:
map_hal_id (dict) – A dictionary that maps some researcher names to their corresponding HAL-ID.
map_hal_name (dict) – A dictionary that maps some researcher names to their name in HAL.
- property api_url: str#
Retrieves the HAL repository URL of this
HalConnector
instance.- Returns:
The HAL repository URL.
- attributes(object: str) set [source]#
Lists available attributes related to a given collection of minifold entries exposed by this
Connector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- static binary_predicate_to_hal(p: BinaryPredicate) str [source]#
Converts a minifold predicate to the corresponding HAL URL predicate.
- Parameters:
p (BinaryPredicate) – The minifold predicate.
- Returns:
The corresponding HAL URL predicate.
- property format: str#
Retrieves the format of the HAL results of this
HalConnector
instance.- Returns:
The format of the HAL results (e.g.,
"json"
).
- property map_hal_id: dict#
Retrieves the dictionary mapping researcher names to the corresponding HAL ID of this
HalConnector
instance.- Returns:
The dictionary mapping researcher names to the corresponding HAL ID of this
HalConnector
instance.
- property map_hal_name: dict#
Retrieves the dictionary mapping researcher names to the corresponding HAL name of this
HalConnector
instance.- Returns:
The dictionary mapping researcher names to the corresponding HAL name of this
HalConnector
instance.
- query(q: Query) list [source]#
Handles an input Query instance.
- Parameters:
q (Query) – The handled query.
- Returns:
The list of entries matching the input query.
- query_to_hal(q: Query) str [source]#
Converts a minifold query to a HAL URL query.
- Parameters:
q (Query) – The minifold query.
- Returns:
The corresponding HAL URL.
- static quote(s: str) str [source]#
Quotes a string to encode it in a HAL URL.
- Parameters:
s (str) – The string to be quoted.
- Returns:
The quoted string.
- static sanitize_dict(d: dict) dict [source]#
Reshapes a dictionary obtained from a JSON HAL result.
- Parameters:
d (dict) – The JSON HAL dictionary.
- Returns:
The reshaped dictionary.
- sanitize_entries(entries: list) list [source]#
Reshapes a collection of raw minifold entries related to HAL.
- Parameters:
entries (list) – The input minifold entries.
- Returns:
The reshaped minifold entries.
- sanitize_entry(entry: dict) dict [source]#
Reshapes a raw minifold entry related to HAL.
- Parameters:
entry (dict) – The input minifold entry.
- Returns:
The reshaped minifold entry.
minifold.hash module#
Internals to build hashable counterparts of objects that are not hashable.
This is useful, e.g., in the UniqueConnector class.
minifold.html module#
This file gathers some utilities to process, render, or make HTML strings in minifold.
- entries_to_html(entries: list, map_attribute_label: dict = None, attributes: list = None, keep_entry_if: callable = None) str [source]#
Exports a list of dicts to HTML.
- Parameters:
entries – A list of dicts.
map_attribute_label – A dict{str: str} which maps each entry key to the column header to display.
attributes – The subset of keys to display.
keep_entry_if – A callback allowing to filter some entries.
- Returns:
The corresponding HTML string.
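The export can be pictured with the following sketch, which uses only the standard library. It is not minifold's implementation: the function name is hypothetical, and it assumes that keep_entry_if takes an entry and returns a bool, that missing keys render as empty cells, and that the output is a bare table element.

```python
from html import escape


def entries_to_html_sketch(entries, map_attribute_label=None, attributes=None, keep_entry_if=None) -> str:
    """Render a list of dicts as an HTML table (one row per kept entry)."""
    if attributes is None:
        # Default to every key met in the entries, in order of first appearance.
        attributes = []
        for entry in entries:
            for key in entry:
                if key not in attributes:
                    attributes.append(key)
    labels = map_attribute_label or {}
    kept = [e for e in entries if keep_entry_if is None or keep_entry_if(e)]
    header = "".join(f"<th>{escape(str(labels.get(a, a)))}</th>" for a in attributes)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(e.get(a, '')))}</td>" for a in attributes) + "</tr>"
        for e in kept
    )
    return f"<table><tr>{header}</tr>{body}</table>"


table = entries_to_html_sketch(
    [{"title": "A & B", "year": 2020}, {"title": "C", "year": 2010}],
    map_attribute_label={"title": "Title", "year": "Year"},
    keep_entry_if=lambda e: e["year"] >= 2015,
)
```

Escaping every header and cell keeps characters such as "&" or "<" in the data from being interpreted as markup.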
- html(s: str)[source]#
Evaluates HTML code in a Jupyter Notebook.
- Parameters:
s – A str containing HTML code.
- html_to_text(s_html: str, blacklist: set = None) str [source]#
Converts an HTML page to text, discarding the JavaScript and CSS of the page.
- Parameters:
s_html (str) – A str containing HTML.
blacklist (set) – A set of strings (lowercase) corresponding to HTML tags that must be ignored.
- Returns:
The corresponding text.
- print_error(x: object)[source]#
Prints an error in a Jupyter Notebook.
- Parameters:
x – An exception.
- remove_all_attrs_except_saving(soup: BeautifulSoup, whitelist: dict = None)[source]#
Removes all attributes except some.
- Parameters:
soup (BeautifulSoup) – A BeautifulSoup instance, modified in place.
whitelist – A
dict{tag: list(attr)}
where tag is an HTML tag and attr an HTML attribute.
- remove_tags(soup: BeautifulSoup, blacklist: set = None)[source]#
Removes some HTML tags.
- Parameters:
soup (BeautifulSoup) – A BeautifulSoup instance, modified in place.
blacklist (set) – A set of str, where each str is an HTML tag.
- sanitize_html(s_html: str, blacklist: set = None, remove_attrs: bool = True) str [source]#
Removes irrelevant HTML blocks and attributes from an HTML string. Warning: this function is SLOW, so do not use it on a large corpus!
- Parameters:
s_html (str) – A str instance containing HTML.
blacklist (set) – The set of blacklisted HTML tags.
remove_attrs (bool) – Pass True to remove HTML tag attributes.
- Returns:
The sanitized string.
minifold.html_table module#
- class HtmlTableConnector(filename: str, columns: list, keep_entry: callable = None)[source]#
Bases:
EntriesConnector
The HtmlTableConnector class is a minifold gateway allowing to fetch data stored in an HTML table.
Constructor.
- Parameters:
filename (str) – Input HTML filename.
columns (list) – A list of strings mapping each column index to the corresponding attribute name. If data is fetched for a column whose index is out of range for columns, columns[-1] is used, and this key may store a list of string values instead of a single string. This allows storing data spread over several columns in a single attribute.
keep_entry (callable) – A callback which determines whether an entry must be kept or discarded. Pass None to filter nothing. This is the opportunity to discard a header or irrelevant rows.
- attributes(object: str = None) set [source]#
Lists available attributes related to a given collection of minifold entries exposed by this
Connector
instance.- Parameters:
object (str) – The name of the collection. As this connector stores a single collection, you may pass
None
. Defaults toNone
.- Returns:
The set of corresponding attributes.
- class HtmlTableParser(columns: list, output_list: list, keep_entry: callable = None)[source]#
Bases:
HTMLParser
The HtmlTableParser class extracts the values from an HTML table.
Constructor.
- Parameters:
columns (list) – A list of strings mapping each column index to the corresponding attribute name. If data is fetched for a column whose index is out of range for columns, columns[-1] is used, and this key may store a list of string values instead of a single string. This allows storing data spread over several columns in a single attribute.
output_list (list) – A reference to an output list where the data will be output (one dict per row, one key/value per column).
keep_entry (callable) – A callback which determines whether an entry must be kept or discarded. Pass None to filter nothing. This is the opportunity to discard a header or irrelevant rows.
- handle_data(data: str)[source]#
Callback that handles HTML data.
- Parameters:
data (str) – The HTML data (here, stored in a table cell).
- html_table(filename: str, columns: list, keep_entry: bool = None) list [source]#
Loads an HTML table from an input file.
- Parameters:
filename (str) – The path to the input HTML file.
columns (list) – A list of strings mapping each column index to the corresponding attribute name. If data is fetched for a column whose index is out of range for columns, columns[-1] is used, and this key may store a list of string values instead of a single string. This allows storing data spread over several columns in a single attribute.
keep_entry (callable) – A callback which determines whether an entry must be kept or discarded. Pass None to filter nothing. This is the opportunity to discard a header or irrelevant rows.
- Returns:
The corresponding list of minifold entries.
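The role of columns and keep_entry can be illustrated with a small parser built on the standard html.parser module. This is a sketch of the documented behaviour, not the HtmlTableParser source: the class name, its attributes and the choice to ignore whitespace-only cells are assumptions.

```python
from html.parser import HTMLParser


class TableSketchParser(HTMLParser):
    """Collect one dict per table row, naming cells after `columns`."""

    def __init__(self, columns: list, keep_entry=None):
        super().__init__()
        self.columns = columns
        self.keep_entry = keep_entry
        self.entries = []
        self.row = None       # dict of the row being parsed, if any
        self.index = -1       # index of the current cell in the row
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row, self.index = {}, -1
        elif tag in ("td", "th") and self.row is not None:
            self.index += 1
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.row is not None:
            if self.keep_entry is None or self.keep_entry(self.row):
                self.entries.append(self.row)
            self.row = None

    def handle_data(self, data):
        value = data.strip()
        if not self.in_cell or self.row is None or not value:
            return
        if self.index < len(self.columns):
            self.row[self.columns[self.index]] = value
        else:
            # Extra columns are gathered as a list under the last attribute.
            last = self.columns[-1]
            current = self.row.get(last)
            if isinstance(current, list):
                current.append(value)
            else:
                self.row[last] = [value] if current is None else [current, value]


parser = TableSketchParser(["name", "tags"], keep_entry=lambda e: e.get("name") != "Name")
parser.feed(
    "<table><tr><th>Name</th><th>Tag</th></tr>"
    "<tr><td>Ada</td><td>x</td><td>y</td></tr></table>"
)
rows = parser.entries
```

The header row is discarded by keep_entry, and the third cell of the data row is folded into the "tags" list because only two column names were given.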
minifold.ipynb module#
This file gathers utilities related to IPython and Jupyter notebooks.
minifold.join_if module#
- class JoinIfConnector(left: Connector, right: Connector, join_if: callable, mode: int = 1)[source]#
Bases:
Connector
The
JoinIfConnector
is a minifold connector that implements the INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL OUTER JOIN statements in a minifold pipeline.Constructor.
- Parameters:
- attributes(object: str)[source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
JoinIfConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property join_if: callable#
Retrieves the functor used to join entries in this JoinIfConnector instance.
- Returns:
The functor used to join entries.
- property left: Connector#
Retrieves the left
Connector
child in thisJoinIfConnector
instance.- Returns:
The left
Connector
child.
- property mode: int#
Retrieves the join mode used in this JoinIfConnector instance.
- Returns:
A value among INNER_JOIN, LEFT_JOIN, RIGHT_JOIN, FULL_OUTER_JOIN.
- query(query: Query) list [source]#
Handles an input
Query
instance.- Parameters:
query (Query) – The handled query.
- Raises:
ValueError –
- Returns:
The list of entries matching the input query.
- property right: Connector#
Retrieves the right Connector child in this JoinIfConnector instance.
- Returns:
The right Connector child.
- are_joined_if(l_entry: dict, r_entry: dict, f: callable) bool [source]#
Internal function, used to check whether two dictionaries can be joined according to a functor.
- Parameters:
l_entry (dict) – The dictionary corresponding to the left operand.
r_entry (dict) – The dictionary corresponding to the right operand.
f (callable) – A functor such that f(l, r) returns True if and only if l and r can be joined, False otherwise.
- Raises:
RuntimeError – on keys that are missing in l or r.
- Returns:
True if and only if l and r can be joined, False otherwise.
- full_outer_join_if(l_entries: list, r_entries: list, f: callable, match_once: bool = True) list [source]#
Computes the FULL OUTER JOIN of two lists of minifold entries.
- Parameters:
l_entries (list) – The minifold entries corresponding to the left operand.
r_entries (list) – The minifold entries corresponding to the right operand.
f (callable) – A functor such that f(l, r) returns True if and only if l and r can be joined (where l and r are two minifold entries), False otherwise.
match_once (bool) – Pass True if a left entry must be matched at most once.
- Returns:
The corresponding list of entries.
- inner_join_if(l_entries: list, r_entries: list, f: callable, match_once: bool = True, merge: callable = <function merge_dict>) list [source]#
Computes the INNER JOIN of two lists of minifold entries.
- Parameters:
l_entries (list) – The minifold entries corresponding to the left operand.
r_entries (list) – The minifold entries corresponding to the right operand.
f (callable) – A functor such that f(l, r) returns True if and only if l and r can be joined (where l and r are two minifold entries), False otherwise.
match_once (bool) – Pass True if a left entry must be matched at most once.
merge (callable) – A function that merges two input dictionaries. Defaults to merge_dict().
- Returns:
The corresponding list of entries.
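An inner join driven by a functor can be sketched as a nested loop. The code below is an illustration under stated assumptions, not minifold's implementation: the _sketch names are hypothetical, and the merge step assumes that right-hand values win on shared keys, as in the merge_dict example of this module.

```python
def merge_dict_sketch(l_entry: dict, r_entry: dict) -> dict:
    """Merge two dicts without altering them; r_entry wins on shared keys."""
    merged = dict(l_entry)
    merged.update(r_entry)
    return merged


def inner_join_if_sketch(l_entries, r_entries, f, match_once=True, merge=merge_dict_sketch) -> list:
    """Join every pair (l, r) such that f(l, r) holds."""
    joined = []
    for l in l_entries:
        for r in r_entries:
            if f(l, r):
                joined.append(merge(l, r))
                if match_once:
                    break  # a left entry is matched at most once
    return joined


people = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
papers = [{"id": 1, "title": "P"}, {"id": 1, "title": "Q"}]
same_id = lambda l, r: l["id"] == r["id"]
once = inner_join_if_sketch(people, papers, same_id)
every = inner_join_if_sketch(people, papers, same_id, match_once=False)
```

With match_once=True only the first matching paper is attached to Ada; with match_once=False both are, and Alan is dropped in either case because no right entry matches.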
- join_mode_to_string(mode: int) str [source]#
Converts a *_JOIN constant to the corresponding string representation.
- left_join_if(l_entries: list, r_entries: list, f: callable, match_once: bool = True, merge: callable = <function merge_dict>) list [source]#
Computes the LEFT JOIN of two lists of minifold entries.
- Parameters:
l_entries (list) – The minifold entries corresponding to the left operand.
r_entries (list) – The minifold entries corresponding to the right operand.
f (callable) – A functor such that f(l, r) returns True if and only if l and r can be joined (where l and r are two minifold entries), False otherwise.
match_once (bool) – Pass True if a left entry must be matched at most once.
merge (callable) – A function that merges two input dictionaries. Defaults to merge_dict().
- Returns:
The corresponding list of entries.
- merge_dict(l_entry: dict, r_entry: dict) dict [source]#
Merges two dictionaries. The input dictionaries are not altered.
Example
>>> merge_dict({"a": 1, "b": 2}, {"b": 3, "c": 4})
{'a': 1, 'b': 3, 'c': 4}
- Parameters:
l_entry (dict) – A dictionary.
r_entry (dict) – Another dictionary.
- Returns:
The dictionary obtained by merging l_entry and r_entry.
- right_join_if(l_entries: list, r_entries: list, f: callable, match_once: bool = True) list [source]#
Computes the RIGHT JOIN of two lists of minifold entries.
- Parameters:
l_entries (list) – The minifold entries corresponding to the left operand.
r_entries (list) – The minifold entries corresponding to the right operand.
f (callable) – A functor such that f(l, r) returns True if and only if l and r can be joined (where l and r are two minifold entries), False otherwise.
match_once (bool) – Pass True if a left entry must be matched at most once.
- Returns:
The corresponding list of entries.
minifold.json module#
- class JsonConnector(json_data: str, extract_json: callable = <function identity>)[source]#
Bases:
EntriesConnector
The
JsonConnector
class is a gateway to a JSON string.Constructor.
- Parameters:
json_data – A JSON string.
extract_json – A function that converts the JSON data to a list of minifold entries. Defaults to
identity()
.
- class JsonFileConnector(json_filename: str, extract_json: callable = <function identity>)[source]#
Bases:
JsonConnector
The JsonFileConnector class is a gateway to a JSON file.
Constructor.
- Parameters:
json_filename – The path of the input JSON file.
extract_json – A function that converts the JSON data to a list of minifold entries. Defaults to
identity()
.
- identity(x: object) list [source]#
Identity function.
- Parameters:
x (object) – An arbitrary object.
- Returns:
The
x
object.
- load_json_from_file(json_filename: str, extract_json: callable = <function identity>) list [source]#
Loads minifold entries from a JSON file.
- Parameters:
json_filename (str) – The path to a JSON file.
extract_json (callable) – A function that reprocesses the python structure resulting from the JSON to produce a list of minifold entries. Defaults to
identity()
.
- Returns:
A list of minifold entries.
- load_json_from_str(json_data: str, extract_json: callable = <function identity>) list [source]#
Loads minifold entries from a JSON string.
- Parameters:
json_data (str) – The JSON data.
extract_json (callable) – A function that reprocesses the python structure resulting from the JSON to produce a list of minifold entries. Defaults to
identity()
.
- Returns:
A list of minifold entries.
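The purpose of extract_json is easiest to see on a JSON document whose entries are nested below the root. The sketch below relies only on the standard json module; the _sketch names and the sample payload are made up for illustration.

```python
import json


def identity_sketch(x):
    """Return x unchanged (the default extractor)."""
    return x


def load_json_from_str_sketch(json_data: str, extract_json=identity_sketch) -> list:
    """Parse a JSON string, then let extract_json pull out the list of entries."""
    return extract_json(json.loads(json_data))


# Hypothetical payload where the entries live under "response" -> "docs".
payload = '{"response": {"docs": [{"title": "A"}, {"title": "B"}]}}'
entries = load_json_from_str_sketch(payload, lambda d: d["response"]["docs"])
```

When the JSON root is already a list of dictionaries, the default identity extractor is enough.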
minifold.lambdas module#
- class LambdasConnector(map_lambdas: dict, child: Connector, map_dependencies: dict = None)[source]#
Bases:
Connector
The LambdasConnector class is used to apply lambda functions in the middle of a minifold pipeline. It allows crafting or reshaping a flow of minifold entries on the fly.
Constructor.
- Parameters:
map_lambdas (dict) – A dictionary that maps each (existing or new) key attribute to a function processing an input entry.
child (Connector) – The child minifold
Connector
instance.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
LambdasConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child: Connector#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- find_lambda_dependencies(func: callable) set [source]#
Infers the keys needed by a function processing a dictionary so that it does not trigger a KeyError exception.
- Parameters:
func (callable) – A function taking a dictionary in parameter.
- Returns:
The keys needed by
func
to process a dictionary.
- find_lambdas_dependencies(map_lambdas: dict) dict [source]#
Infers the keys needed by several functions, each outputting a specific entry key when processing an input entry, so that no KeyError exception is raised.
- Parameters:
map_lambdas (dict) – A dictionary that maps each (existing or new) key attribute to a function processing an input entry.
- Returns:
The keys needed by the functions involved in map_lambdas.values() to process a minifold entry.
- lambdas(map_lambdas: dict, entries: list, attributes: set = None) list [source]#
Make sure that the result is deterministic regardless of the order in which the lambdas are processed.
Examples
>>> map_lambdas = {"x": lambda e: 10 + e["x"]}  # OK
>>> map_lambdas = {"x": lambda e: 10 + e["x"] + e["y"]}  # OK
>>> map_lambdas = {
...     "x": lambda e: 10 + e["x"] + e["y"],
...     "y": lambda e: 10 + e["y"]
... }  # not OK because e["y"] is ambiguous.
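The ambiguity of the last example can be made concrete by applying its two lambdas one after the other in both possible orders. The helper below is hypothetical, and it assumes that each lambda updates the entry in place before the next one runs, which the source does not confirm.

```python
def apply_in_order(map_lambdas: dict, entry: dict, order: list) -> dict:
    """Apply the lambdas one after the other, each seeing the previous updates."""
    result = dict(entry)  # work on a copy to keep the input intact
    for key in order:
        result[key] = map_lambdas[key](result)
    return result


map_lambdas = {
    "x": lambda e: 10 + e["x"] + e["y"],
    "y": lambda e: 10 + e["y"],
}
entry = {"x": 1, "y": 2}
x_first = apply_in_order(map_lambdas, entry, ["x", "y"])  # {'x': 13, 'y': 12}
y_first = apply_in_order(map_lambdas, entry, ["y", "x"])  # {'x': 23, 'y': 12}
```

The value of "x" depends on whether "y" was rewritten first, which is why a lambda should not read a key that another lambda of the same map overwrites.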
minifold.ldap module#
- class LdapConnector(ldap_host: str, ldap_user: str = None, ldap_password: str = None, ldap_use_ssl: bool = None)[source]#
Bases:
Connector
Constructor.
- Parameters:
ldap_host (str) – The FQDN or the IP of the server (e.g., “my-ldap.firm.com”).
ldap_user (str) – The LDAP login used to connect to the server.
ldap_password (str) – The LDAP password of
ldap_user
.ldap_use_ssl (bool) – Pass
True
if the connection to the server must be established using SSL,False
orNone
otherwise.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
LdapConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- static binary_predicate_to_ldap(p: BinaryPredicate) str [source]#
Converts a
BinaryPredicate
to the corresponding LDAP predicate.- Parameters:
p (BinaryPredicate) – A
BinaryPredicate
instance.- Returns:
The corresponding LDAP predicate.
- static literal_from_ldap(b: bytes) str [source]#
Converts an LDAP literal to the corresponding string.
- Parameters:
b (bytes) – The input literal.
- Returns:
The corresponding string.
- static operand_to_ldap(operand: str) str [source]#
Converts a minifold operand to the corresponding LDAP operand.
- Parameters:
operand (str) – The operand to be converted.
- Returns:
The corresponding string.
- static operator_to_ldap(op) str [source]#
Converts a minifold operator to the corresponding LDAP operator.
- Parameters:
op – The operator to be converted.
- Returns:
The corresponding LDAP string.
minifold.lexical_cast module#
- cast_bool(s: str) bool [source]#
Casts a string to a bool if possible, raises an exception otherwise.
- Parameters:
s (str) – The string to be cast.
- Raises:
ValueError – if the cast cannot be achieved.
- Returns:
The boolean corresponding to s if successful.
- cast_none(s: str) None [source]#
Casts a string to None if possible, raises an exception otherwise.
- Parameters:
s (str) – The string to be cast.
- Raises:
ValueError – if the cast cannot be achieved.
- Returns:
None if successful.
- lexical_cast(s: str, cast: callable) object [source]#
Casts a string according to an operator. See also the lexical_casts() function.
- Parameters:
s (str) – The string to be cast.
cast (callable) – A single cast operator. Examples: cast_bool(), cast_none(), int(), float(), etc.
- Raises:
ValueError – if the cast cannot be achieved.
- Returns:
The corresponding value.
- lexical_casts(s: str, cast_operators: list = None) object [source]#
Casts a string according to several cast operators. See also the lexical_cast() function.
- Parameters:
s (str) – The string to be cast.
cast_operators – A list of cast operators. Operators must be ordered by decreasing strictness (e.g. int should precede float). Pass None to use the default list of cast operators.
- Returns:
The original string if no cast worked, the corresponding cast value otherwise.
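The "try each operator, strictest first" strategy can be sketched as follows. The _sketch helpers are hypothetical stand-ins: the strings they accept ("True", "true", "None", ...) and the default operator list are assumptions, since the source does not spell out either.

```python
def cast_none_sketch(s: str):
    """Cast the string "None" to None (accepted spelling is an assumption)."""
    if s == "None":
        return None
    raise ValueError(f"cannot cast {s!r} to None")


def cast_bool_sketch(s: str) -> bool:
    """Cast "True"/"true"/"False"/"false" to a bool (accepted spellings are assumptions)."""
    if s in ("True", "true"):
        return True
    if s in ("False", "false"):
        return False
    raise ValueError(f"cannot cast {s!r} to bool")


def lexical_casts_sketch(s: str, cast_operators: list = None):
    """Return the result of the first cast that succeeds, or s itself."""
    if cast_operators is None:
        # Assumed default list, ordered by decreasing strictness.
        cast_operators = [cast_none_sketch, cast_bool_sketch, int, float]
    for cast in cast_operators:
        try:
            return cast(s)
        except ValueError:
            continue
    return s


values = [lexical_casts_sketch(s) for s in ["42", "4.2", "true", "None", "abc"]]
```

Placing int before float matters: with the reverse order, "42" would come back as the float 42.0.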
minifold.limit module#
- class LimitConnector(child, lim: int)[source]#
Bases:
Connector
The
LimitConnector
class implements the LIMIT statement in a minifold pipeline.Constructor.
- Parameters:
child (Connector) – The child minifold
Connector
instance.lim (int) – A positive integer, limiting the number of entries to return. Pass
None
if there is no limit.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
LimitConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- property limit: int#
Accessor to the limit size.
- Returns:
The limit size or
None
(no limit).
- limit(entries: list, lim: int) list [source]#
Implements the LIMIT statement for a list of minifold entries.
Example
>>> limit([{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 3}], 2)
[{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
- Parameters:
entries (list) – A list of minifold entries.
lim (int) – A positive integer, limiting the number of entries to return. Pass
None
if there is no limit.
- Returns:
The kept entries.
minifold.log module#
- class Log[source]#
Bases:
object
The Log class enables logging in minifold.
Example
>>> Log.enable_print = True
>>> Log.info("hello")
- enable_print = False#
- log_level = 0#
- message_color = {0: 6, 1: 2, 2: 3, 3: 1}#
- message_header = {0: 'DEBUG', 1: 'INFO', 2: 'WARNING', 3: 'ERROR'}#
- classmethod print(message_type: int, message: str, file=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>)[source]#
Internal method, used to print a message with a custom header and style.
- Parameters:
message_type (int) – A value in INFO, DEBUG, WARNING, ERROR.
message (str) – The message to log.
file – The output stream. Defaults to sys.stderr.
- static start_style(fg_color: int = None, bg_color: int = None, styles: list = None) str [source]#
Crafts a shell escape sequence to start a style.
- Parameters:
fg_color (int) – The integer identifying the foreground color to be used. Example: GREEN.
bg_color (int) – The integer identifying the background color to be used. Example: PINK.
styles (list) – Example: [UNDERLINED, BLINKING].
- Returns:
The corresponding shell string.
- with_color = True#
minifold.mongo module#
- class MongoConnector(mongo_url: str, db_name: str)[source]#
Bases:
Connector
The
MongoConnector
is a minifold gateway allowing to manipulate data stored in a Mongo database.Constructor.
- Parameters:
mongo_url (str) – The URL of the mongo database.
db_name (str) – The name of the queried database.
- attributes(obj: str = None) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
MongoConnector
instance.- Parameters:
obj (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- connect(mongo_url: str) MongoClient [source]#
Connects to a Mongo database.
- Parameters:
mongo_url (str) – The URL of the mongo database.
- Returns:
The corresponding MongoClient instance.
minifold.natural_join module#
- class NaturalJoinConnector(left: Connector, right: Connector)[source]#
Bases:
Connector
The
NaturalJoinConnector
is a minifold connector that implements the NATURAL JOIN statement in a minifold pipeline.Constructor.
- Parameters:
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this NaturalJoinConnector instance.
- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property left: Connector#
Retrieves the left Connector child in this NaturalJoinConnector instance.
- Returns:
The left Connector child.
- are_naturally_joined(l_entry: dict, r_entry: dict) bool [source]#
Internal function, used to check whether two dictionaries can be joined using a NATURAL JOIN statement.
- Parameters:
l_entry (dict) – The dictionary corresponding to the left operand.
r_entry (dict) – The dictionary corresponding to the right operand.
- Returns:
True
if and only ifl
andr
can be joined,False
otherwise (i.e.,l
andr
have no common key).
- natural_join(l_entries: list, r_entries: list) list [source]#
Computes the NATURAL JOIN of two lists of minifold entries.
- Parameters:
l_entries (list) – The minifold entries corresponding to the left operand.
r_entries (list) – The minifold entries corresponding to the right operand.
- Returns:
The corresponding list of entries.
minifold.proxy module#
- class Proxy(*args, **kwargs)[source]#
Bases:
dict
The Proxy class is a singleton dictionary storing proxy settings. It typically maps each protocol (e.g., “http”, “https”) to the corresponding proxy URL (e.g., “http://localhost:8080”).
>>> proxy = Proxy()
>>> proxy_enable("localhost", 8080)
>>> proxy_disable()
- make_session() Session [source]#
Creates a
requests.Session
instance according to theProxy
singleton.- Returns:
The corresponding
requests.Session
instance.
- proxy_enable(host: str, port: int, protocols: list = None)[source]#
Enables the
Proxy
singleton.- Parameters:
host (str) – The proxy FQDN or IP address.
port (int) – The proxy port.
protocols (list) – The list of protocols supported by the proxy. Passing
None
is equivalent to passing["http", "https"]
. Defaults toNone
.
minifold.query module#
- class Query(action: int = 1, object: str = '', attributes: list = None, filters: object = None, offset: int = None, limit: int = None, sort_by: dict = None)[source]#
Bases:
object
Constructor.
- Parameters:
action (int) – A value in ACTION_CREATE (in SQL, INSERT queries), ACTION_READ (in SQL, SELECT queries), ACTION_UPDATE (in SQL, UPDATE queries), ACTION_DELETE (in SQL, DELETE queries) or None. Passing None is equivalent to ACTION_CREATE. Defaults to None.
object (str) – The queried object collection (some minifold gateways may host several collections of entries, each identified by a name). Defaults to "".
attributes (list) – A list of attributes. In SQL, this corresponds to the attributes passed to SELECT.
filters (object) – A minifold filter. In SQL, this corresponds to the WHERE clause. See also the BinaryPredicate and the SearchFilter classes.
offset (int) – A positive integer or None if not needed. In SQL, this corresponds to the OFFSET clause.
limit (int) – A positive integer or None if not needed. In SQL, this corresponds to the LIMIT clause.
sort_by (dict) – A dictionary characterizing how to sort the results. It maps each attribute to be sorted to the corresponding sorting order (SORT_ASC or SORT_DESC). In SQL, this corresponds to the ORDER BY clause.
- property action: int#
- property attributes: list#
- property filters#
- property limit: int#
- property object: str#
- property offset: int#
- property sort_by: list#
minifold.rename module#
- class RenameConnector(mapping: dict = None, child: Connector = None)[source]#
Bases:
Connector
The RenameConnector class wraps the rename() function to exploit it in a minifold query plan.
Constructor.
- Parameters:
mapping (dict) – A dictionary mapping each key to be replaced to the new corresponding key (overlying to underlying ontology). Note that the mapping must be reversible, i.e., reverse_dict(reverse_dict(d)) == d.
child (Connector) – The child minifold Connector instance.
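The reversibility requirement on the mapping can be checked with a few lines. The sketch below assumes that `reverse_dict` simply swaps keys and values; `reverse_dict_sketch` and `is_reversible` are hypothetical helpers written for this illustration, not minifold functions.

```python
def reverse_dict_sketch(d: dict) -> dict:
    """Swaps keys and values (assumed behavior of reverse_dict)."""
    return {value: key for key, value in d.items()}


def is_reversible(d: dict) -> bool:
    """Checks the condition reverse(reverse(d)) == d."""
    return reverse_dict_sketch(reverse_dict_sketch(d)) == d


# Each new key is used once: the mapping can be inverted without loss.
ok = is_reversible({"title": "name", "year": "date"})

# Two keys map to the same new key: reversing loses one of them.
ko = is_reversible({"title": "name", "label": "name"})
```

Under this assumption, a mapping is reversible exactly when no two keys share the same new key.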
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
RenameConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- property map_qr#
Accessor to the renaming mapping (overlying to underlying).
- Returns:
The corresponding mapping.
- property map_rq#
Accessor to the renaming mapping (underlying to overlying).
- Returns:
The corresponding mapping.
- rename(mapping: dict, entries: list) list [source]#
Replaces several keys (possibly) involved in a list of minifold entries.
Example
>>> rename({"a": "A", "b": "B"}, [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}])
[{'c': 3, 'A': 1, 'B': 2}, {'c': 30, 'A': 10, 'B': 20}]
- Parameters:
mapping (dict) – A dictionary mapping each key to be replaced by the new corresponding key.
entries (list) – A list of minifold entries, updated in place.
- Returns:
The updated entries.
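The in-place renaming described above can be sketched as follows. `rename_sketch` is a hypothetical stand-in, not the minifold implementation, and it assumes that the old keys and the new keys do not overlap.

```python
def rename_sketch(mapping: dict, entries: list) -> list:
    """Renames keys in place in every entry and returns the same list."""
    for entry in entries:
        for old_key, new_key in mapping.items():
            if old_key in entry:
                # pop() removes the old key; the value is stored under the new one.
                entry[new_key] = entry.pop(old_key)
    return entries


entries = [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}]
renamed = rename_sketch({"a": "A", "b": "B"}, entries)
```

Because the entries are updated in place, `renamed` and `entries` refer to the same list. Keys that do not appear in the mapping (here `"c"`) are left untouched.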
- rename_entry(d: dict, mapping: dict) dict [source]#
Replaces several keys (possibly) involved in a dictionary.
Example
>>> rename_entry({"a": 1, "b": 2, "c": 3}, {"a": "A", "b": "B"})
{'c': 3, 'A': 1, 'B': 2}
- Parameters:
d (dict) – The input dictionary, updated in place.
mapping (dict) – A dictionary mapping each key to be replaced by the new corresponding key.
- Returns:
The updated dictionary.
- rename_filters(filters: object, mapping: dict)[source]#
Renames the keys involved in a list of minifold filters (see BinaryPredicate), typically the WHERE part of a Query instance. This function is recursive. See the rename_query() function.
- Parameters:
filters (object) – The minifold filters, updated in place.
mapping (dict) – A dictionary mapping each key to be replaced by the new corresponding key.
- Returns:
The updated filters.
- rename_key(d: dict, old_key: str, new_key: str) dict [source]#
Replaces a key (possibly) involved in a dictionary.
Example
>>> rename_key({"a": 1, "b": 2, "c": 3}, "a", "A")
{'b': 2, 'c': 3, 'A': 1}
- Parameters:
d (dict) – The input dictionary, updated in place.
old_key (str) – The key to be updated.
new_key (str) – The new key.
- Returns:
The updated dictionary.
- rename_list(values: list, mapping: dict) list [source]#
Replaces some values involved in a list.
Example
>>> rename_list(["a", "b", "a", "c"], {"a": "A", "b": "B"})
['A', 'B', 'A', 'c']
- Parameters:
values (list) – The input list, modified in place.
mapping (dict) – A dictionary mapping each value to be replaced by the new corresponding value.
- Returns:
The updated list.
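A minimal sketch of this value replacement, assuming the list values are hashable; `rename_list_sketch` is a hypothetical name used for illustration only.

```python
def rename_list_sketch(values: list, mapping: dict) -> list:
    """Replaces, in place, each value that appears as a key of the mapping."""
    for i, value in enumerate(values):
        # Values without a replacement are kept as they are.
        values[i] = mapping.get(value, value)
    return values


values = ["a", "b", "a", "c"]
renamed_values = rename_list_sketch(values, {"a": "A", "b": "B"})
```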
- rename_query(q: Query, mapping: dict) Query [source]#
Renames some attributes involved in a Query instance. This is especially useful when a minifold pipeline involves several data sources using different naming conventions for the same attribute.
- Parameters:
q (Query) – The Query instance to be renamed.
mapping (dict) – A dictionary mapping each key to be replaced by the new corresponding key.
- Returns:
The renamed minifold query.
- rename_sort_by(sort_by: dict, mapping: dict) dict [source]#
Renames the SORT BY part of a Query instance. See the rename_query() function.
- Parameters:
sort_by (dict) – The SORT BY part of a Query instance.
mapping (dict) – A dictionary mapping each key to be replaced by the new corresponding key.
- Returns:
The dictionary corresponding to sort_by after renaming.
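Renaming the sort specification amounts to renaming the keys of a dictionary while keeping each sorting order. The sketch below is a hypothetical stand-in that builds a new dictionary; whether minifold updates the input in place is not stated here, and the order values are placeholder strings.

```python
def rename_sort_by_sketch(sort_by: dict, mapping: dict) -> dict:
    """Returns a copy of sort_by whose keys are renamed according to mapping."""
    return {mapping.get(key, key): order for key, order in sort_by.items()}


# "asc" and "desc" are placeholders for the minifold sorting constants.
renamed_sort_by = rename_sort_by_sketch(
    {"year": "desc", "title": "asc"},
    {"year": "date"},
)
```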
minifold.request_cache module#
- install_cache(cache_filename: str = None)[source]#
Enables requests_cache for minifold, so that the HTTP queries issued by minifold are cached.
- Parameters:
cache_filename (str) – The path to the minifold cache. You may pass None to use the default path (i.e., ~/.minifold/cache/requests_cache under Linux).
minifold.scholar module#
This module provides classes for querying Google Scholar and parsing returned results. It currently only processes the first results page. It is not a recursive crawler.
- class ClusterScholarQuery(cluster=None)[source]#
Bases:
ScholarQuery
This version just pulls up an article cluster whose ID we already know about.
- SCHOLAR_CLUSTER_URL = 'http://scholar.google.com/scholar?cluster=%(cluster)s%(num)s'#
- exception QueryArgumentError[source]#
Bases:
Error
A query did not have a suitable set of arguments.
- class ScholarArticle[source]#
Bases:
object
A class representing articles listed on Google Scholar. The class provides basic dictionary-like behavior.
- class ScholarArticleParser(site=None)[source]#
Bases:
object
ScholarArticleParser can parse HTML document strings obtained from Google Scholar. This is a base class; concrete implementations adapting to tweaks made by Google over time follow below.
- handle_article(art)[source]#
The parser invokes this callback on each article parsed successfully. In this base class, the callback does nothing.
- class ScholarArticleParser120726(site=None)[source]#
Bases:
ScholarArticleParser
This class reflects the update to the Scholar results page layout that Google made on 07/26/12.
- class ScholarConf[source]#
Bases:
object
Helper class for global settings.
- COOKIE_JAR_FILE = None#
- LOG_LEVEL = 1#
- MAX_PAGE_RESULTS = 10#
- SCHOLAR_SITE = 'http://scholar.google.com'#
- USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0'#
- VERSION = '2.10'#
- class ScholarQuerier[source]#
Bases:
object
ScholarQuerier instances can conduct a search on Google Scholar with subsequent parsing of the resulting HTML content. The articles found are collected in the articles member, a list of ScholarArticle instances.
- GET_SETTINGS_URL = 'http://scholar.google.com/scholar_settings?sciifh=1&hl=en&as_sdt=0,5'#
- class Parser(querier)[source]#
Bases:
ScholarArticleParser120726
- SET_SETTINGS_URL = 'http://scholar.google.com/scholar_setprefs?q=&scisig=%(scisig)s&inststart=0&as_sdt=1,5&as_sdtp=&num=%(num)s&scis=%(scis)s%(scisf)s&hl=en&lang=all&instq=&inst=569367360547434339&save='#
- get_citation_data(article)[source]#
Given an article, retrieves its citation link. Note that this requires that you adjusted the settings to tell Google Scholar to actually provide this information, prior to retrieving the article.
- class ScholarQuery[source]#
Bases:
object
The base class for any kind of results query we send to Scholar.
- class ScholarSettings[source]#
Bases:
object
This class lets you adjust the Scholar settings for your session. It’s intended to mirror the features tunable in the Scholar Settings pane, but right now it’s a bit basic.
- CITFORM_BIBTEX = 4#
- CITFORM_ENDNOTE = 3#
- CITFORM_NONE = 0#
- CITFORM_REFMAN = 2#
- CITFORM_REFWORKS = 1#
- class ScholarUtils[source]#
Bases:
object
A wrapper for various utensils that come in handy.
- LOG_LEVELS = {'debug': 4, 'error': 1, 'info': 3, 'warn': 2}#
- class SearchScholarQuery[source]#
Bases:
ScholarQuery
This version represents the search query parameters the user can configure on the Scholar website, in the advanced search options.
- SCHOLAR_QUERY_URL = 'http://scholar.google.com/scholar?as_q=%(words)s&as_epq=%(phrase)s&as_oq=%(words_some)s&as_eq=%(words_none)s&as_occt=%(scope)s&as_sauthors=%(authors)s&as_publication=%(pub)s&as_ylo=%(ylo)s&as_yhi=%(yhi)s&as_vis=%(citations)s&btnG=&hl=en%(num)s&as_sdt=%(patents)s%%2C5'#
- get_url()[source]#
Returns a complete, submittable URL string for this particular query instance. The URL and its arguments will vary depending on the query.
- set_scope(title_only)[source]#
Sets Boolean indicating whether to search entire article or title only.
minifold.search module#
- class SearchFilter(search_values: list, attributes: list, match: callable)[source]#
Bases:
object
The SearchFilter class implements a minifold filter that keeps the minifold entries matching a search predicate.
Constructor.
- Parameters:
search_values (list) – The searched values.
attributes (list) – The attributes of interest.
match (callable) – The search strategy, e.g., equals(), contains(). If the values are strings, you could also consider: lower_case_contains(), lower_case_equals(), contains_words().
- contains(x: object, y: object) bool [source]#
Tests whether an object is contained in another one (wraps the
in
operator).
Example
>>> contains("bar", "foobar")
True
- Parameters:
x (object) – An object.
y (object) – An object.
- Returns:
True
if and only ifx in y
,False
otherwise.
- contains_words(word: str, sentence: str, ignore_case: bool = True)[source]#
Checks whether a word is contained in a string (e.g., a sentence).
Examples
>>> contains_words("earth", "Earth is a planet.")
True
>>> contains_words("earth", "A terrible earthquake.")
False
- Parameters:
word (str) – The searched word.
sentence (str) – The queried string.
ignore_case (bool) – Pass True if the search is not case sensitive, False otherwise. Defaults to True.
- Returns:
True if word has been found in sentence, False otherwise.
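The whole-word behavior shown in the examples can be reproduced with a regular expression using word boundaries. This is a sketch under that assumption; `contains_words_sketch` is a hypothetical name and the actual minifold implementation may differ.

```python
import re


def contains_words_sketch(word: str, sentence: str, ignore_case: bool = True) -> bool:
    """Checks whether word occurs in sentence as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    # \b matches at word boundaries, so "earth" does not match inside "earthquake".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence, flags) is not None


found = contains_words_sketch("earth", "Earth is a planet.")
not_found = contains_words_sketch("earth", "A terrible earthquake.")
case_sensitive = contains_words_sketch("earth", "Earth is a planet.", ignore_case=False)
```

`re.escape` prevents characters of `word` that are special in regular expressions from being interpreted as pattern syntax.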
- equals(x: object, y: object) bool [source]#
Tests whether two objects are equal (wraps the
==
operator).Examples
>>> equals("foo", "foo")
True
>>> equals("foo", "FOO")
False
- Parameters:
x (object) – An object.
y (object) – An object.
- Returns:
True
if and only ifx == y
,False
otherwise.
- lower_case_contains(x: str, y: str) bool [source]#
Checks whether a string is included in another one without considering the case.
Example
>>> lower_case_contains("foo", "barFOObar")
True
- Parameters:
x (str) – The search string.
y (str) – The queried string.
- Returns:
True if and only if x is contained in y without considering the case, False otherwise.
- lower_case_equals(x: str, y: str) bool [source]#
Checks whether two strings are equal without considering the case.
Example
>>> lower_case_equals("foo", "FOO")
True
- Parameters:
x (str) – A string.
y (str) – A string.
- Returns:
True if and only if both strings are equal without considering the case, False otherwise.
- search(entries: list, attributes: list, search_values: list, match: callable = <function equals>) list [source]#
Searches the entries for which at least one attribute of interest matches a searched value.
- Parameters:
entries (list) – A list of minifold entries.
attributes (list) – The attributes of interest.
search_values (list) – The searched values.
match (callable) – The search strategy, e.g., equals(), contains(). If the values are strings, you could also consider: lower_case_contains(), lower_case_equals(), contains_words().
- Returns:
The subset of entries matched by the search.
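The search logic can be sketched as a filter that keeps an entry as soon as one searched value matches one attribute of interest. `search_sketch` and `equals_sketch` are hypothetical stand-ins; the sketch assumes that the match function receives the searched value first and the entry value second, and that attributes missing from an entry are skipped.

```python
def equals_sketch(x: object, y: object) -> bool:
    """Default strategy: plain equality."""
    return x == y


def search_sketch(entries: list, attributes: list, search_values: list,
                  match=equals_sketch) -> list:
    """Keeps the entries with at least one attribute matching a searched value."""
    return [
        entry
        for entry in entries
        if any(
            match(value, entry[attribute])
            for attribute in attributes
            if attribute in entry
            for value in search_values
        )
    ]


entries = [
    {"title": "foo", "venue": "bar"},
    {"title": "baz", "venue": "qux"},
]
exact = search_sketch(entries, ["title"], ["foo", "zzz"])
partial = search_sketch(entries, ["title", "venue"], ["ba"], match=lambda x, y: x in y)
```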
minifold.select module#
- class SelectConnector(child: Connector, attributes: list)[source]#
Bases:
Connector
The SelectConnector class implements the SELECT statement in a minifold pipeline.
Constructor.
- Parameters:
child (Connector) – The child minifold Connector instance.
attributes (list) – The selected keys.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
SelectConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- select(entries: list, attributes: list) list [source]#
Implements the SELECT statement for a list of minifold entries.
Example
>>> select([{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}], ["a", "b"])
[{'a': 1, 'b': 2}, {'a': 10, 'b': 20}]
- Parameters:
entries (list) – A list of minifold entries.
attributes (list) – The selected keys.
- Returns:
The input entries, restricted to the keys of interest.
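The projection performed by a SELECT can be sketched with a dictionary comprehension. `select_sketch` is a hypothetical stand-in; it assumes that a selected key missing from an entry is simply skipped, which may not be what minifold does.

```python
def select_sketch(entries: list, attributes: list) -> list:
    """Returns new entries restricted to the selected keys."""
    return [
        {key: entry[key] for key in attributes if key in entry}
        for entry in entries
    ]


selected = select_sketch(
    [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}],
    ["a", "b"],
)
```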
minifold.singleton module#
minifold.sort_by module#
- class SortByConnector(attributes: list, child: Connector, desc: bool = False)[source]#
Bases:
Connector
The SortByConnector class implements the SORT BY statement (in SQL, ORDER BY) in a minifold pipeline.
Constructor.
- Parameters:
attributes (list) – The list of entry keys used to sort.
child (Connector) – The child minifold Connector instance.
desc (bool) – Pass True to sort by descending order, False otherwise.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
SortByConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- property desc: bool#
Checks whether this
SortByConnector
sorts the entries by descending order.- Returns:
True
if thisSortByConnector
sorts the entries by descending order,False
otherwise.
- sort_by(attributes: list, entries: list, desc: bool = False) list [source]#
Sorts a list of minifold entries.
- Parameters:
attributes (list) – The list of entry keys used to sort.
entries (list) – A list of minifold entries.
desc (bool) – Pass True to sort by descending order, False otherwise.
- Returns:
The sorted entries, with respect to attributes.
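Sorting on several keys can be sketched by building a tuple of the values of interest for each entry, which is presumably the role of the functor passed to sort_by_impl(). `sort_by_sketch` is a hypothetical stand-in; it assumes that every entry contains every sorting key, that the values are mutually comparable, and that `desc=True` means descending order, as stated by the desc property.

```python
def sort_by_sketch(attributes: list, entries: list, desc: bool = False) -> list:
    """Returns the entries sorted by the tuple of values found at attributes."""
    return sorted(
        entries,
        key=lambda entry: tuple(entry[attribute] for attribute in attributes),
        reverse=desc,
    )


entries = [{"a": 2, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 1}]
ascending = sort_by_sketch(["a", "b"], entries)
descending = sort_by_sketch(["a", "b"], entries, desc=True)
```

Tuples are compared element by element, so entries are ordered by the first attribute, and ties are broken by the following ones.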
- sort_by_impl(functor: ValuesFromDictFonctor, entries: list, desc: bool = True) list [source]#
Implementation details of
sort_by()
.- Parameters:
functor (ValuesFromDictFonctor) – The functor allowing to extract the values used to sort the entries.
entries (list) – A list of minifold entries.
desc (bool) – Pass True to sort by descending order, False otherwise.
- Returns:
The sorted entries, with respect to
functor
.
minifold.strings module#
- remove_html_escape_sequences(s: str) str [source]#
Removes the HTML escape sequences.
Example
>>> remove_html_escape_sequences("cha&icirc;ne")
'chane'
- Parameters:
s (str) – The input string.
- Returns:
The converted string.
- remove_html_tags(s: str) str [source]#
Removes the HTML tags from an HTML string.
Example
>>> remove_html_tags("<a href='#'>A link</a> <b>bold</b> <u><i>italic underlined</i></u>")
'A link bold italic underlined'
- Parameters:
s (str) – The input string.
- Returns:
The converted string.
- remove_latex_escape_sequence(s: str) str [source]#
Removes the LaTeX escape sequences.
Example
>>> remove_latex_escape_sequence("n\oe ud")
'nud'
- Parameters:
s (str) – The input string.
- Returns:
The converted string.
- remove_punctuation(s: str) str [source]#
Replaces the punctuation characters by spaces.
Example
>>> remove_punctuation("Example: a sentence, with punctuation.")
'Example a sentence with punctuation '
- Parameters:
s (str) – The input string.
- Returns:
The converted string.
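Replacing punctuation by spaces can be sketched with a translation table. `remove_punctuation_sketch` is a hypothetical stand-in that only handles the ASCII punctuation listed in `string.punctuation`, and it does not collapse consecutive spaces, so a punctuation character followed by a space yields two spaces.

```python
import string

# Maps every ASCII punctuation character to a space.
_PUNCTUATION_TO_SPACE = str.maketrans(
    string.punctuation, " " * len(string.punctuation)
)


def remove_punctuation_sketch(s: str) -> str:
    """Replaces each ASCII punctuation character by a single space."""
    return s.translate(_PUNCTUATION_TO_SPACE)


cleaned = remove_punctuation_sketch("Example: a sentence, with punctuation.")
```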
- to_canonic_fullname(s: str) str [source]#
Canonizes a fullname.
- Parameters:
s (str) – The input fullname.
- Returns:
The converted fullname.
- to_canonic_string(s: str) str [source]#
Canonizes a string.
- Parameters:
s (str) – The input string.
- Returns:
The converted string.
- to_international_chr(c: str) str [source]#
Converts international characters to the corresponding character(s) in
a-zA-Z
.Examples
>>> to_international_chr("ß")
'ss'
>>> to_international_chr("é")
'e'
- Parameters:
c (str) – The input character.
- Returns:
The corresponding character(s).
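One possible way to obtain such a conversion combines a small table for characters that have no Unicode decomposition with NFKD normalization for accented letters. This is a sketch, not the minifold implementation: `to_international_chr_sketch` and the `_SPECIAL_CASES` table are assumptions, and characters covered by neither mechanism are returned unchanged.

```python
import unicodedata

# Hypothetical table for characters that NFKD does not decompose.
_SPECIAL_CASES = {"ß": "ss", "æ": "ae", "œ": "oe", "ø": "o"}


def to_international_chr_sketch(c: str) -> str:
    """Maps a character to its closest unaccented equivalent."""
    if c in _SPECIAL_CASES:
        return _SPECIAL_CASES[c]
    # NFKD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", c)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sharp_s = to_international_chr_sketch("ß")
e_acute = to_international_chr_sketch("é")
```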
minifold.twitter module#
- class TwitterConnector(twitter_id: str, consumer_key: str, consumer_secret: str, access_token: str, access_token_secret: str)[source]#
Bases:
Connector
The TwitterConnector is a minifold gateway allowing to fetch tweets from Twitter.
Constructor.
- Parameters:
twitter_id (str) – The Twitter identifier.
consumer_key (str) – The Twitter API Key.
consumer_secret (str) – The Twitter secret.
access_token (str) – The Twitter access token.
access_token_secret (str) – The Twitter access token secret.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
TwitterConnector
instance.- Parameters:
object (str) – The name of the collection. Valid collections are
"self"
and"feed"
.- Returns:
The set of corresponding attributes.
minifold.union module#
- class UnionConnector(children: list)[source]#
Bases:
Connector
The
UnionConnector
class implements the UNION statement in a minifold pipeline.Constructor.
- Parameters:
children (list) – The list of children minifold Connector instances.
minifold.unique module#
- class UniqueConnector(attributes: list, child)[source]#
Bases:
Connector
The UniqueConnector class implements the UNIQUE statement (in SQL, SELECT DISTINCT) in a minifold pipeline.
Constructor.
- Parameters:
attributes (list) – The list of entry keys used to determine the uniqueness.
child (Connector) – The child minifold
Connector
instance.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
UniqueConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- unique(attributes: list, entries: list) list [source]#
Implements the UNIQUE statement for a list of minifold entries.
- Parameters:
attributes (list) – The list of entry keys used to determine the uniqueness.
entries (list) – A list of minifold entries.
- Returns:
The remaining entries once the UNIQUE filtering has been applied.
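The UNIQUE filtering can be sketched by remembering the tuples of values already seen. `unique_sketch` is a hypothetical stand-in; it assumes that the first entry of each group of duplicates is the one kept and that the values at the given keys are hashable.

```python
def unique_sketch(attributes: list, entries: list) -> list:
    """Keeps the first entry for each distinct tuple of values at attributes."""
    seen = set()
    kept = []
    for entry in entries:
        key = tuple(entry.get(attribute) for attribute in attributes)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


entries = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 3}]
distinct = unique_sketch(["a"], entries)
```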
- unique_impl(functor: ValuesFromDictFonctor, entries: list) list [source]#
Implementation details of
unique()
.- Parameters:
functor (ValuesFromDictFonctor) – The functor allowing to extract the values used to form the aggregates.
entries (list) – A list of minifold entries.
- Returns:
The entries that are unique with respect to functor.
minifold.unnest module#
- class UnnestConnector(map_key_unnestedkey: dict, child: Connector)[source]#
Bases:
Connector
The UnnestConnector class implements the unnest function (in PostgreSQL) in a minifold pipeline.
Constructor.
- Parameters:
map_key_unnestedkey (dict) – A dictionary mapping each key to its corresponding unnested key.
child (Connector) – The child minifold Connector instance.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
UnnestConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child: Connector#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- unnest(map_key_unnestedkey: dict, entries: list) list [source]#
Implements the unnest PostgreSQL function for a list of minifold entries.
Example
>>> unnest({"a": "A", "b": "B"}, [{"a": [1, 2, 3], "b": [10, 20, 30]}])
[{'A': 1}, {'A': 2}, {'A': 3}, {'B': 10}, {'B': 20}, {'B': 30}]
- Parameters:
map_key_unnestedkey (dict) – A dictionary mapping each key to its corresponding unnested key.
entries (list) – A list of minifold entries.
- Returns:
The list of unnested entries. Each entry is a dictionary with a single key-value pair, where the key is an unnested key.
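The unnesting can be sketched with three nested loops: over the entries, over the mapped keys, and over the values of each list. `unnest_sketch` is a hypothetical stand-in; the output order for several input entries and the handling of missing keys are assumptions.

```python
def unnest_sketch(map_key_unnestedkey: dict, entries: list) -> list:
    """Emits one single-key entry per value found in the mapped lists."""
    unnested = []
    for entry in entries:
        for key, unnested_key in map_key_unnestedkey.items():
            # A key missing from the entry contributes no output.
            for value in entry.get(key, []):
                unnested.append({unnested_key: value})
    return unnested


flat = unnest_sketch(
    {"a": "A", "b": "B"},
    [{"a": [1, 2, 3], "b": [10, 20, 30]}],
)
```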
minifold.values_from_dict module#
- class ValuesFromDictFonctor(attributes: list)[source]#
Bases:
object
The ValuesFromDictFonctor class is an internal minifold functor used to extract a subset of values from a dictionary.
It is used by several minifold Connector classes, including:
GroupByConnector;
SortByConnector;
UniqueConnector.
Constructor.
- Parameters:
attributes (list) – The keys of the values to be extracted.
- property attributes: list#
Retrieves the keys of interest related to this
ValuesFromDictFonctor
instance.- Returns:
The attributes of interest related to this
ValuesFromDictFonctor
instance.
minifold.where module#
- class WhereConnector(child: Connector, keep_if: callable)[source]#
Bases:
Connector
The WhereConnector class implements the WHERE statement in a minifold pipeline.
Constructor.
- Parameters:
child (Connector) – The child minifold Connector instance.
keep_if (callable) – A function such that f(entry) returns True if entry must be kept, False otherwise.
- attributes(object: str) set [source]#
Lists the available attributes related to a given collection of minifold entries exposed by this
WhereConnector
instance.- Parameters:
object (str) – The name of the collection.
- Returns:
The set of corresponding attributes.
- property child#
Accessor to the child minifold
Connector
instance.- Returns:
The child minifold
Connector
instance.
- property keep_if#
Accessor to the filtering function used by this
WhereConnector
instance.- Returns:
The filtering function.
- where(entries: list, f: callable) list [source]#
Implements the WHERE statement for a list of minifold entries.
Example
>>> where([{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 3}], lambda e: e["a"] <= 2)
[{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
- Parameters:
entries (list) – A list of minifold entries.
f (callable) – A function such that
f(entry)
returnsTrue
ifentry
must be kept,False
otherwise.
- Returns:
The kept entries.
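The WHERE filtering reduces to keeping the entries accepted by the callable. The sketch below uses a hypothetical `where_sketch` stand-in and shows that any callable taking an entry can serve as the filter, including a named function.

```python
def where_sketch(entries: list, f) -> list:
    """Returns the entries for which f(entry) is true."""
    return [entry for entry in entries if f(entry)]


def a_is_at_most_two(entry: dict) -> bool:
    """Example filter: keeps the entries whose "a" value is at most 2."""
    return entry["a"] <= 2


entries = [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 3}]
kept = where_sketch(entries, a_is_at_most_two)
```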
Module contents#
Top-level package.