package Lingua::Norms::USF;
use 5.10.0;
use Carp qw(carp croak);
use feature qw(say switch);
use File::Slurp;
use File::Spec;
use Text::CSV::Separator qw(get_separator);
use Scalar::Util qw(looks_like_number);
use String::Util qw(crunch hascontent nocontent repeat trim);
use List::MoreUtils qw{any firstidx mesh none uniq};
use warnings;
use strict;
use File::Basename;# basename, dirname
use lib ( File::Spec->catdir( dirname(__FILE__), 'USF' ),  File::Spec->catdir( dirname(__FILE__), 'USF', 'excludes') );

=pod

=head1 NAME

Lingua::Norms::USF - search the University of South Florida word association norms

=head1 VERSION

Version 0.01

=cut

our $VERSION = '0.01';

=head1 SYNOPSIS

 use Lingua::Norms::USF;
 $usf = Lingua::Norms::USF->new();
 $val = $usf->frequency($word); # or any of 10 word statistics (incl. setsize, concreteness, connectivity & orthographic neighbors)
 $vals = $usf->word_stat($word, ['FRQ', 'HMG']); # get more than one stat directly in a single call
 $vals = $usf->forward_strength($w1exp, $w2exp); # or any of 9 word-pair statistics (incl. mediator strength, net strength
 $vals = $usf->assoc_stat($w1exp, $w2exp, ['FSG', 'BSG']); # get more than one association-stat in a single call
 $val = $usf->net_strength($w1exp, $w2exp); # one of several statistics calculated in realtime
 $are_related = $usf->are_related($w1exp, $w2exp); # alias for 'assoc_depth', ret. undef (no $w1exp), 0 (no assoc), 1 or >1 for depth of association
 $first_associate = $usf->find_associate($word, depth => 1); # return the word (and maybe its data) that is 1st (or n'th) related to $word
 $associates_ref = $usf->list_associates($word); # similar methods for listing cues, orthographic neighbours, mediators and overlaps
 $random_word = $usf->random_word(); # customizable for looking up cues only (or associates also), first letter, and how many random words to return

=head1 DESCRIPTION

This module provides an interface to the database of word associations produced by Douglas Nelson and associates at the University of South Florida (Nelson et al., 2004) (see L</"REFERENCES">). Documentation for this database itself - how the words were normed, the sample-sizes, the meaning of its statistics, etc. - as well as related databases can be obtained from L<http://w3.usf.edu/FreeAssociation/>. 

This module requires that the USF norms are accessible as a .csv database within a local directory, typically under this module itself, in a directory named 'db'. The files have firstly to be fetched from the USF-norms website, and then given some slight editing. See the install script.

=head1 METHODS

=head2 new

 $usf = Lingua::Norms::USF->new(dbdir => 'files/are/in/here/');

Initialises and returns the class object. One named parameter should be passed: B<dbdir>, a path to the directory containing the 'Cue-to-Target' files. Attempts to identify the path to the USF norms cue database. This should be a readable directory containing at least 26 files, one named by each letter of the alphabet, and which are simple textfiles suffixed with ".csv" and having commas as separators. Tests are run to ensure all these conditions are met.

If no value for the B<dbdir> is given, then the enviroment variable B<USFNORMS_DB> is tested. If this is not defined, or does not pass the tests, and there are values for B<dbdir>, these are tested, and, if these fail, then there's a search for a folder named "db" under a "Lingua/Norms/USF/" path, itself under the Perl install path (as given by the L<Config|Config::installsitelib> module's B<installsitelib>), or (last and least) the user's "my documents" folder.

If any of these tests are successful, the full path is returned; otherwise the method returns C<undef>. Assuming that the latter means there is no local store of the DB, download of the DB is attempted from the USF site: L<http://w3.usf.edu/FreeAssociation/AppendixA/>. This URL indexes files that list the norms for each cue, sorted alphabetically, and separated into 8 files of about 1.1 MB each, each for one or more first-letters of the cues, e.g., A-B, C, D-F. In order to be used by this module, the files need some basic transformation, including HTML-tag stripping, trimming of whitespace, and organization as 26 A-Z .csv files, one for each letter of the alphabet. Note that, once downloaded, the files take about 7-8 minutes to be tansformed into a trim database. This leaves behind a set of CSV textfiles, and the interface (at present) uses the most basic means to access these: by direct file-reading and line-splitting. The method prints info to STDOUT about the different steps being taken in downloading and installing.

Key lists to be used for various other methods are also slurped into the class object; see L</"Excluding words and word-pairs">. This is based on files that should be in the module's "USF/excludes" folder. If these files aren't present, you can install them directly from the cpan.org distribution.

=cut

sub new {
	my ($class, %args) = @_;
	my $self = {};
	bless $self, $class;
    
    # Attempt to identify the DB location, or download it:
    require 'install_norms_usf.pl';
    _run_install() if !($self->{'dbdir'} = _identify_db(dbdir => delete $args{'dbdir'})); # find/set path to database
    croak __PACKAGE__, ' Could not determine the database location, or download and install it' if ! $self->{'dbdir'};
    _init_ex_words($self, \%args); # init lists of fields and words/associations to ignore
    
    $self->{'_field_aref'} = [qw/FSG BSG MSG OSG MCT/, '', qw/OCT/, '', qw/SSZ FRQ CCN HMG PSP CNM RSM RSS/];
    $self->{'_first_letter_w'} = [qw/0.11602 0.04702 0.03511 0.0267 0.02 0.03779 0.0195 0.07232 0.06286 0.00631 0.0069 0.02705 0.04374 0.02365 0.06264 0.02545 0.00173 0.01653 0.07755 0.16671 0.01487 0.00619 0.06661 0.00005 0.0162 0.0005/];
	srand();
	return $self;
}

=head2 Statistics for single words

These methods return a numerical or string value corresponding to a statistic for a single word. Most of the statistics are already pre-calculated from the data, and the method simply isolates and returns them. These are the measures of:

 SSZ Set-size
 CNM Connectivity-mean
 RSM Resonance-mean
 RSS Resonance-strength

Additionally, the database provides the following for each word it includes, based on other sources:

 FRQ Frequency
 CCN Concreteness
 HMG Homograph
 PSP Part-of-speech

The module also provides some statistics that are calculated upon each call, that are not actually listed in the DB. Naturally, these will take a little while longer to retrieve than the former statistics. Presently, these are:

 CCT Count of cues
 ONC Count of orthographic neighbours

Each method firstly looks for the word as a cue. If it is not found, it is looked up among the associates to the cue. If the word is still not found, the method returns C<undef>. Then, even if the word is found, there might be no value for the requested statistic - perhaps the word hasn't been given a frequency or concreteness rating, or it did not appear as a cue and so cannot have a set-size, etc. Each method returns C<undef> if this is the case.

=head3 setsize

 $val = $usf->setsize($word); # returns no. of forward associates to $word, as listed in the DB
 $val = $usf->setsize($word, calc => 1); # calculates and returns no. of forward associates to $word, from the total list of all associates

I<Alias>: C<ssz>, C<assoc_count>, C<act>, C<set_size>

Returns, as listed, the word's set-size. This is a measure of how many strong associates the cue has, based on the number of associations given by two or more participants in the norming sample. 

The values of set-size are listed in the USF files. However, the statistic can also be calculated upon calling this method by sending B<calc> => 1. This is simply the number of elements returned by L<list_associates|list_associates>. This number is usually the same as listed in the database, but it can be larger given that it is not restricted to only "strong associates" (given by at least 2 respondents). Also, if you have instantiated the class object with one of the "L<exclusion|Excluding words and word-pairs>" files, or done the same via L<set_exclusions|set_exclusions>, then the results might also be I<lower> than listed (for a very tiny proportion of cues). For example, with all default conditions, the method returns - even where B<calc> => 1 - 10 for the number of associates to BOY; but when the file C<ex_cultural.csv> has been loaded, there are only 9 associates returned by this method; when B<calc> => 1, the association BOY->GEORGE (listed in C<ex_cultural.csv>) is excluded; but the method with B<calc> => 0 (the default) will still return 10, as this is what is listed in the database itself.

The method returns C<undef> if the word is not listed as a cue in the database.

=cut

sub setsize {
    my $val = word_stat(shift, shift, 'SSZ', @_);
    return looks_like_number($val) ? $val : undef;
}
*ssz = \&setsize;
*assoc_count = \&setsize;
*act = \&setsize;
*set_size = \&setsize;

=head3 connectivity

 $val = $usf->connectivity($word);

I<Alias>: C<cnm>

Returns, as listed, the mean connectivity, which describes the breadth of association among the strongest associates of the target. It is obtained by counting the number of connections among the associates in the set, and dividing by the size of the set (minus any potential L<mediators|mediator_strength>). So MILK has associates COW, DRINK, WHITE, et al., and the number of associates cued by each of COW, DRINK, WHITE, et al., are summed, and then the sum is divided by the number of MILK's associates, so expressing MILK's associative connectivity.

=cut

sub connectivity {
    my $val = word_stat(shift, shift, 'CNM', @_);
    return looks_like_number($val) ? $val : undef;
}
*cnm = \&connectivity;

=head3 resonance_mean

 $val = $usf->resonance_mean($word);

I<Alias>: C<rsm>

Returns, as listed, the probability that each associate in the word's set itself produces the (normed) cue as an associate, based on the number of associates in the set that produce the cue as an associate, and dividing by set-size (minus any potential L<mediators|mediator_strength>); a resonant connection between a word and one of its associates means there's a connection in both directions.

=cut

sub resonance_mean {
    my $val = word_stat(shift, shift, 'RSM', @_);
    return looks_like_number($val) ? $val : undef;
}
*rsm = \&resonance_mean;

=head3 resonance_strength

 $val = $usf->resonance_strength($word);

I<Alias>: C<rss>

Returns, as listed, the resonance strength, which is based on cross-multiplying forward and backward associative strengths (i.e., cue-to-associate strength by associate-to-cue strength) for each associate in the word's set, and then summing the result.

=cut

sub resonance_strength {
    my $val = word_stat(shift, shift, 'RSS', @_);
    return looks_like_number($val) ? $val : undef;
}
*rss = \&resonance_strength;

=head3 cue_count

 $val = $usf->cue_count('word');

I<Alias>: cct

Returns, as calculated, the total number of forward associates I<to> the given word. This is the number of elements returned by L<list_cues|list_cues>. The method returns C<undef> if the word is not listed as an associate (or "target"); or it returns a value greater-than zero (0) if, being listed, it is given as an associate to as many cues.

=cut

sub cue_count {
    my $val = word_stat(shift, shift, 'CCT', @_);
    return looks_like_number($val) ? $val : undef;
}
*cct = \&cue_count;
*csz = \&cue_count;

=head3 orthon_count

 $val = $usf->orthon_count('word');
 $val = $usf->orthon_count('word', cues_only => BOOL);

I<Alias>: onc

Returns, as calculated, the number of orthographic neighbors of a given word, as defined in L<list_orthons|list_orthons>. Accepts the argument B<cues_only> and others as defined in L<list_orthons|list_orthons>.

=cut

sub orthon_count {
    my $val = word_stat(shift, shift, 'ONC', @_);
    return looks_like_number($val) ? $val : undef;
}
*onc = \&orthon_count;

=head3 frequency

 $val = $usf->frequency($word);

I<Alias>: C<frq>

Returns the print-frequency of the word. This is derived from the Kucera & Francis (1967) norms included in the USF database.

=cut

sub frequency {
    my $val = word_stat(shift, shift, 'FRQ', @_);
    return looks_like_number($val) ? $val : undef;
}
*frq = \&frequency;

=head3 concreteness

 $val = $usf->concreteness($word);

I<Alias>: C<ccn>

Returns the word's rating for concreteness on a scale of 1 (low, abstract) to 7 (high, concrete). The ratings are as listed in the USF database, which, if not based on original USF research, are derived from Paivio et al.(1968) and Toglia and Battig (1978) norms.

=cut

sub concreteness {
    my $val = word_stat(shift, shift, 'CCN', @_);
    return looks_like_number($val) ? $val : undef;
}
*ccn = \&concreteness;

=head3 homograph

 $val = $usf->homograph($word);

I<Alias>: C<hmg>

Returns 1 if the word is a homograph: more than one meaning indicated by the look (not merely the sound) of a single string of letters, such as BANK or PALM. This is derived from several sources, including Twilley et al. (1994). If the word has not been scored as a homograph in these sources, the method returns 0. Note that this is no guarantee that the word is not a homograph; only that it has not been scored as such. Also, this does not concern homophones (i.e., words like SAIL and SALE that sound the same but mean different things).

=cut

sub homograph {
    #my $val = word_stat(@_, 'HMG');
    #return hascontent($val) and $val =~ /^[A-Z]$/ ? 1 : undef;
    return hascontent(word_stat(shift, shift, 'HMG', @_)); # as boolean (no undef)
}
*hmg = \&homograph;

=head3 partofspeech

 $bool = $usf->partofspeech($word);

I<Alias>: C<psp>

If the word has been categorized for its part-of-speech, the method returns the following indicative values; otherwise it returns C<undef>. The categorization is the part-of-speech of the first meaning of the word according to the I<The American Heritage Dictionary of the English Language>. Each part-of-speech is indicated by a 2-letter code (slightly different to what is in the raw DB files):

 NN Noun
 VB Verb
 AJ Adjective
 AV Adverb
 PN Pronoun
 PP Preposition
 CJ Conjunction
 IJ Interjection

(Note that ADJ and PRP also occur in the raw DB files, but these are translated to the more common AJ and PP upon installation. Single letters codes have all been extended.)

=cut

sub partofspeech {
    my $val = word_stat(shift, shift, 'PSP', @_);
    if (hascontent($val) and $val =~ /^[A-Z]+$/) {
        return $val;
    }
    else {
        return undef;
    }
}
*psp = \&partofspeech;

=head3 word_stat

 $val = $usf->word_stat('aword', 'FRQ'); # get a single statistic
 @ari = $usf->word_stat('aword', [qw/FRQ CCN/]); # get an open flat list of statistics (default if n. requested stats > 1)
 @ari = $usf->word_stat('aword', [qw/FRQ CCN/], {ref => 0}); # same as above
 $aref = $usf->word_stat('aword', [qw/FRQ CCN/], {ref => 1}); # get a referenced flat list of statistics
 $href = $usf->word_stat('aword', [qw/FRQ CCN/], {ref => 2}); # get a referenced hash of statistics

The above methods calling particular statistics are simply wrappers to this method. Here, you give the word to look up, and then specify the statistic required, using the 3-letter upper-case codes given above. You can get all the statistics you want for a single word in one call by sending a referenced list of statistics - e.g., [qw/FRQ CCN/]. You'll get back a flat list (referenced or not) with values in the order requested, or a hash-ref of values, the keys being the stats you requested. Specify what you want returned by giving a value for B<ref> as the third (named) argument, as shown above. This does not actually have to be a I<referenced> argument list (at present).

For some stats, the returned values here are more specific than what you get from calling it by its own method-name (this might change). Unlike calling each specific statistical method (L<frequency|frequency>, etc.), this method will not necessarily return C<undef> if there is no statistical value; it will return the blank character or whatever else might be in that field in the database. This also means you can get the exact letter-code for homographs as they appear in the database; i.e., you get G, T or other for the source of rating the word as a homograph (not just 1 if it has been rated by any source), and C<undef> if there is no rating.

If any stat-code given does not correspond to a valid code, the method L<carp:Carp::carp>s and dies.

=cut

sub word_stat {
    my ($self, $word, $fields, @ari) = @_;
    _return_stat_values($self, $fields, _get_single_entry($self, $word), _set_args(@ari));
}   

=head3 is_normed

 $bool = $usf->is_normed($word);

I<Alias>: is_cue

Returns 1 or zero to indicate if the word has served as a cue, having at least one associate.

=cut

sub is_normed {
    return list_associates(shift, shift, {depth => 1, data => 0}) ? 1 : 0;
}
*is_cue = \&is_normed;

=head3 is_word

 $bool = $usf->is_word($word);

I<Alias>: is_listed

Returns 1 or zero to indicate if the word has served as a cue I<or> a target. This might be seen as function to determine if a string is a real word - at least one likely to be used or cued in free word association tests. As this is not comprehensive, it is probably best referred to as "is_listed".

=cut

sub is_word {
    return _get_single_entry(shift, shift) ? 1 : 0;
}

=head2 Statistics for word-pairs

These methods return statistics corresponding to the association between two specific words. The database provides the following for each association it includes:

 FSG Forward strength
 BSG Backward strength
 MSG Mediator strength
 OSG Overlap strength
 MCT Count of mediators
 OCT Count of overlaps (shared associates)

Additionally the module calculates 

 NSG Net strength
 MCX Count of mediators, unrelated
 OCX Count of overlaps (shared associates), unrelated
 ASD Association depth ("related or not")

There are some other statistics listed in the DB that are not accessible here, such as the size of the sample giving these associations; these can be got by getting the entire row in the database for an association, using L<assoc_data|assoc_data>.

=head3 forward_strength

 $val = $usf->forward_strength('word1', 'word2');

I<Alias>: fsg

Returns, as listed, the forward-strength between two words - i.e., the probability that the second word will be given as an associate of the first word. If there is an entry for the first word, but it is not associated with the second word the method returns 0. If there is no entry for the first word, the method returns C<undef>.

=cut

sub forward_strength { return assoc_stat(shift, shift, shift, 'FSG', @_);} *fsg = \&forward_strength;

=head3 backward_strength

 $val = $usf->backward_strength('word1', 'word2');

I<Alias>: bsg

Returns, as listed, the backward-strength between two words - i.e., the probability that the first word will be given as an associate of the second word. If the words are not backward associated, the method returns 0; and C<undef> is returned if the first word is not listed as a cue.

=cut

sub backward_strength {
    my ($self, $w1exp, $w2exp) = @_;
    my $val = assoc_stat($self, $w1exp, $w2exp, 'BSG');
    return undef if ! defined $val;
    $val = assoc_stat($self, $w2exp, $w1exp, 'FSG') if ! $val; # check if they are associated in reverse, even if not having a forward assoc
    return $val;
}
*bsg = \&backward_strength;

=head3 mediator_strength

 $val = $usf->mediator_strength('word1', 'word2');

I<Alias>: msg

Returns, as listed, the mediated strength between two words, calculated by cross-multiplying the forward-strength the first word has for any other words that cue the second word, and summing the results across each such link. A cue and one of its associates have a mediator if any other word associated with the cue itself cues that associate. So DOG directly cues CAT, but DOG also has a mediator to CAT in that DOG also cues ANIMAL, which itself cues CAT.

If the words are not associated, or the first word has no association with other words that cue the second word, the method returns 0; and C<undef> is returned if the first word is not listed as a cue. Note that this statistic does not include "mediators" unless the two words are themselves directly related (in the order given); e.g., the fact that CAT can cue CHEESE by way of MOUSE"is ignored as CAT and CHEESE do not themselves have a forward association.

=cut

sub mediator_strength { return assoc_stat(@_, 'MSG');} *msg = \&mediator_strength;

=head3 overlap_strength

 $val = $usf->overlap_strength('word1', 'word2');

I<Alias>: osg

Returns, as listed, the overlap strength between two words, calculated by cross-multiplying and then summing the forward-strengths that the two words, if they are associated themselves, both have to any other words. If the words are not associated, or they have no words to which they are both associated, the method returns 0; and C<undef> is returned if the first word is not listed as a cue. Note that this statistic does not include "overlaps" unless the two words are themselves directly related (in the order given); e.g., the fact that CLOWN and GAME both cue FUN is ignored as CLOWN and GAME do not themselves have a forward association.

=cut

sub overlap_strength { return assoc_stat(@_, 'OSG');} *osg = \&overlap_strength;

=head3 net_strength

 $val = $usf->net_strength('word1', 'word2', mweight => 1);

I<Alias>: nsg

Returns, as calculated, the net associative strength between two words. This is the measure of "net pre-existing strength" defined by Douglas et al. (1997); see L<REFERENCES|/"REFERENCES">. It measures the association between two words on the basis of both the forward and backward associative strengths between them, and all the L<mediator strengths|mediator_strength> between them, i.e., the forward-strengths between (i) word-1 and words it cues that are associated to word-2, and (ii) word-2 and words it cues that are associated to word-1. A multiplicative weight can be applied to the sum of these "indirect" mediator strengths; the default value of this B<mweight> is 1.

=cut

sub net_strength { return assoc_stat(shift, shift, shift, 'NSG', @_);} *nsg = \&net_strength;

=head3 transitional_strength

 $val =  $usf->transitional_strength('word1', 'word2');

I<Alias>: tsg

This is the probability of a transition between the two words, defined as their forward associative strength divided by the product of this strength and word2's set-size.

=cut

sub transitional_strength { return assoc_stat(shift, shift, shift, 'TSG', @_);} *tsg = \&transitional_strength;

=head3 mediator_count

 $val = $usf->mediator_count('word1', 'word2');

I<Alias>: mct

Returns, as listed, the number of mediated connections linking the first and second words - i.e., where the first word cues a word that itself cues the second word - should the first and second words have a forward association. Note that this value is as listed in the USF database, and only involves mediators where the two words are themselves directly associated. Use L<mediator_count_x|mediator_count_x> for all mediators regardless of the association between the first and second words.

=cut

sub mediator_count { return assoc_stat(@_, 'MCT');} *mct = \&mediator_count;

=head3 mediator_count_x

 $val = $usf->mediator_count_x('word1', 'word2');

I<Alias>: mcx

Returns, as calculated, the number of associates of the first word that cue the second word, I<regardless of whether or not the first and second words are themselves directly associated>. For example, GRAPH and WORLD are not directly associated (L<are_related|assoc_depth (are_associated), are_related> returns 0), but MAP is cued by GRAPH and itself cues WORLD. This is the type of "mediated association" studied, for example, by McNamara (1992). It is simply the number of elements returned by L<list_mediators|list_mediators>. If you only want the count where the two words I<are> associated, use L<mediator_count|mediator_count>, which is listed in the USF database itself.

=cut

sub mediator_count_x { return assoc_stat(shift, shift, shift, 'MCX', @_);} *mcx = \&mediator_count_x;

=head3 overlap_count

 $val = $usf->overlap_count('word1', 'word2');

I<Aliases>: oct, shared_count, sct

Returns, as listed, the number of associates shared by the first and second words - i.e., the number of words that both the first and second words cue - should the first and second words have a forward association. Note that this value is as listed in the USF database, and only involves shared associates where the two words are themselves directly associated. Use L<overlap_count_x|overlap_count_x> for all shared associates regardless of the association between the first and second words.

=cut

sub overlap_count { return assoc_stat(@_, 'OCT');} *oct = \&overlap_count; *shared_count = \&overlap_count; *sct = \&overlap_count;

=head3 overlap_count_x

 $val = $usf->overlap_count_x('word1', 'word2');

I<Alias>: ocx, shared_count_x, sctx

Returns, as calculated, the number of associates that are to the first word and second word, I<regardless of whether or not the first and second words are themselves directly associated>. For example, GRAPH and WORLD are not directly associated (L<are_related|assoc_depth (are_associated), are_related> returns 0), but MAP is cued by both GRAPH and WORLD. This is simply the number of elements returned by L<list_overlaps|list_overlaps>. If you only want the count where the two words I<are> associated, use L<overlap_count|overlap_count>, which is listed in the USF database itself.

=cut

sub overlap_count_x { return assoc_stat(shift, shift, shift, 'OCX', @_);} *ocx = \&overlap_count_x; *shared_count_x = \&overlap_count_x; *sctx = \&overlap_count_x;

=head3 orthon_level (are_orthons)

 $bool = $usf->orthon_level('word1', 'word2'); # simple single-substitution "Coltheart" orthon
 $bool = $usf->orthon_level('word1', 'word2', list => $aref); # same, but with the list provided (option to be deprecated!)
 $val = $usf->orthon_level('word1', 'word2', stat => 'levd'); # level proper by Levenshtein statistic
 $val = $usf->orthon_level('word1', 'word2', stat => 'levd', weights => [0, 1, 1]); # same, with equality, addi/deletion and subst. weights

I<Aliases>: orthon_level, onl

Returns 1 or 0 to indicate, respectively, if the two words sent are orthographic neighbors or not - a boolean as to whether they a simple single-substitution ("Coltheart") orthons; see <list_orthons|list_orthons/Substitution (Coltheart) orthons>. In the subsequent hash array of arguments, an array reference of (letter-)strings to check for the presence of the second word can be sent. In this case, the method is simply an "any"-type function. This is simply a convenience if you are trying to find if any number of words are among the first word's orthons - so you would get the L<list of orthons|list_orthons> first, and send it to this method as an array reference. (This option is unlikely to survive early versions - see L</TODO>.)

If the value of B<stat> is equal to "levd", the Levenshtein distance is between the two words is returned. This can be qualified by a further argument: B<weights>, which specify, in a 3-element referenced flat array, the "costs" for (respectively) equality, addition/deletion and substitution.  So the list [0, 1, 2] prescribes a cost of 2 units for an inequality, requiring substitution, and 1 unit for addition/deletion. The default is [0, 1, 1] - in which case the B<weight> argument might as well be neglected, as these are the default weights for the Levenshtein distance. The modules L<Text::Levenshtein|Text::Levenshtein> and L<Text::WagnerFischer|Text::WagnerFischer> perform the calculations.

If the value of B<stat> is equal to "mui," the Myers-Ukkonen edit index is used to calculate orthographic similarity, as calculated by the module L<String::Similarity|String::Similarity>. 

=cut

sub orthon_level {
    my ($self, $w1, $w2, @ari) = @_;
    my $args = _set_args(@ari);
    my $val;
    if ($args->{'list'}) {
        $val = any { $_ eq $w2} @{$args->{'list'}};
    }
    else {
        if ($args->{'stat'} and $args->{'stat'} ne 'single_subst') {
            $val = _calc_orthon_level($w1, $w2, $args->{'stat'}, $args->{'weights'});
        }
        else { 
            $val = any { $_ eq $w2} $self->list_orthons($w1);
        }
    }
    return $val;
}
*are_orthons = \&orthon_level;
*onl = \&orthon_level;

=head3 assoc_depth (are_associated, are_related)

 $val = $usf->assoc_depth('word1', 'word2');

I<Aliases>: are_related, are_associated, related, associated

Returns, as calculated, the depth at which a forward association is found between two words, or 0 if they have no forward association, or undef if the first word itself was not found. For example, 'cookie' is the 5th most common  associate of 'milk', so the method returns '5'. The method can simply be used to test if two words are, indeed, directly related. If L<exclusions|Excluding words and word-pairs> have been loaded, then any pairs thus excluded will not figure in the count of associative depth. For example, MAN is normed as the 4th associate of BOY but will appear as the 3rd associate if the "ex_cultural" list is loaded, given that BOY->GEORGE - normed as the 3rd most common association - is excluded on this basis.

=cut

sub assoc_depth { return assoc_stat(shift, shift, shift, 'ASD', @_);} *are_related = \&assoc_depth; *related = \&assoc_depth; *are_associated = \&assoc_depth; *associated = \&assoc_depth;

=head3 assoc_stat

 $val = $usf->assoc_stat('word1', 'word2', 'FSG'); # get a single value, see above for other stat codes, optionally provide 'data' arg as below
 @vals = $usf->assoc_stat('word1', 'word2', [qw/FSG BSG/]); # get a flat list of values, assumes 'ref => 0'
 $aref = $usf->assoc_stat('word1', 'word2', [qw/FSG BSG/], {ref => 1}); # get a referenced flat list of values (one or more)
 $href = $usf->assoc_stat('word1', 'word2', [qw/FSG BSG/], {ref => 2}); # get a reference to a keyed list of values (one or more)

Grab any of the above statistics for pairs of words directly, with more than one statistic, if desired, in a single call. Use the codes given above. If the code does not correspond to a valid code, the method L<carp:Carp::carp>s and dies. By default, returns a single scalar value if only one statistic has been requested, or an open array of values if more than one statistic has been requested. If the value I<ref> is set to one, a flat array reference is returned, otherwise a non-zero value for I<ref> returns a hash-ref, with each requested statistical value being named by the code with which it was requested, e.g., 'FSG => 0.02'.

=cut

sub assoc_stat {
    my ($self, $w1exp, $w2exp, $fields, @ari) = @_; # checking of words will be done in subsequent subs
    $fields = [$fields] if ! ref $fields;
    my $args = _set_args(@ari);
    my (@vals, @invals) = ();
    foreach my $field(@{$fields}) {
        given($field){
            when(/^NSG/) {
                push @vals, _calc_nsg($self, [$w1exp, $w2exp], $args);
            }
            when(/^TSG/) {
                push @vals, _calc_tsg($self, [$w1exp, $w2exp], $args);
            }
            when (/^MCX/) {
                push @vals, scalar($self->list_mediators($w1exp, $w2exp, depth => 100, data => 0, ref => 0));
            }
            when(/^OCX/) {
                push @vals, scalar($self->list_overlaps($w1exp, $w2exp, depth => 100, data => 0, ref => 0));
            }
            when(/^ASD/) {
                my $aref = _get_pair_entry($self, $w1exp, $w2exp);
                push @vals, ref $aref ? $aref->[-1] : $aref;
            }
            default {
                if ( _is_valid_fieldname($self, $_) ) {  # ensure the keys are valid
                    push @vals, _get_val_by_aref($self, _get_pair_entry($self, $w1exp, $w2exp), $_, $args); # make list of requested stat values
                }
                else {
                    push(@invals, $_);
                }
            }
        }
    }
    croak 'Invalid statistic(s): ' . join(',', @invals) if scalar(@invals);
    if (!$args->{'ref'}) { # return unreferenced list or single value
       return scalar(@vals) > 1 ? @vals : $vals[0];
    }
    elsif ($args->{'ref'} == 1) { # return flat referenced list
       return \@vals;
    }
    else { # return keyed list
       return { mesh(@{$fields}, @vals) }; # make list of key => value pairs naming each stat value (e.g., FSG => 0.20)
    }
}

=head3 assoc_data

 @vals = $usf->assoc_data('word1', 'word2'); # assumes ref => 0
 @vals = $usf->assoc_data('word1', 'word2', {ref => 0});
 $aref = $usf->assoc_data('word1', 'word2', {ref => 1});
 $href = $usf->assoc_data('word1', 'word2', {ref => 2});

Returns reference to I<all> the data for the association between two words (if one was found). This is the complete line in the database for this pair of words, split into an array. This is returned as an open array if B<ref> is undefined, or defined as 0; a flat referenced array if B<ref> equals 1; or a non-zero value for B<ref> obtains a list of key-value pairs, with each statistical value named. The final value of the list is the L<association depth|/"assoc_depth (are_associated), are_related">, keyed as B<ASD>. Returns undef if the two words were not found as forward associates.

=cut

sub assoc_data {
    my ($self, $w1exp, $w2exp, @ari) = @_; # checking of words will be done in find_pair()
    my $args = _set_args(@ari);
    my $aref = _get_pair_entry($self, $w1exp, $w2exp);
    if (ref($aref)) {
        if ($args->{'ref'}) {
            return $args->{'ref'} == 1 ? $aref : _hashify_line($aref);
        }
        else {
            return @{$aref};
        }
    }
    else {
        return undef;
    }
}

=head2 Retrieving words and word-pairs

All these "list" methods return, by default, references to lists of words or data (keyed as a hash or or not).

=head3 list_associates

 $aref = $usf->list_associates('milk'); # returns aref of all 18 associates of 'MILK', from 'COW' to 'SKIM' (assumes ref=>1, data=>0, depth=>undef)
 $aref = $usf->list_associates('milk', depth => 4, data => 0); # returns aref of the first 4 associates of MILK
 $aref = $usf->list_associates('milk', depth => 4, data => 1); # returns aref of 4 arefs, comprising the data for the first 4 associates of 'MILK'
 $aref = $usf->list_associates('milk', depth => 4, data => 2); # returns aref of 4 hash-refs, each with their values labelled
 $aref = $usf->list_associates('milk', {depth => 4, data => 2}); # as above, but a neater way of sending the args
 @ari = $usf->list_associates('milk', ref => 0); # returns open array of 'MILK''s 18 associates, from 'COW' to 'SKIM'

I<Alias>: associates

Returns a reference to a list of words that are the direct associates of the given word; e.g., for 'MILK', you get a list of 18 elements, from 'COW' to 'SKIM'.  If there are no associates to the given word, it returns a reference to an empty list.

Optionally, you can send an direct or referenced hash of arguments, with the keys B<depth> and B<data>. If B<depth> is defined and non-zero, only the associates up to this depth are returned. If B<data> is 1 (rather than zero or undefined), the list returned is one of all the array reference for each associate of the word. If B<data> is 2, you get back a I<hash>-ref, with each value keyed by the field-codes noted above for the word and association statistics.

Presently, an additional argument B<ref> can be named, with a value of 0 to return an open array rather than the default array-reference.

If the word is not listed as a cue, the method returns C<undef>.

=cut

sub list_associates {
	my ($self, $word, @ari) = @_;
    croak __PACKAGE__, ' needs a word for searching associates' if nocontent($word);
    $word = uc($word);
    my $args = _set_args(@ari);
	$args->{'depth'} = 100 if nocontent($args->{'depth'});

	my @lines = ();
	my $path = _path2table4letter($self, substr($word, 0, 1)) || return undef;

    open(F, '<' . $path) or die $! . " $path";
    SEARCH:
	while (<F>) {
       	next SEARCH if $. == 1; # ignore header row
    	chomp;
    	if (/^\Q$word\E,/ && scalar(@lines) < $args->{'depth'} ) { # is word first in line? if so, list no more than first DEPTH associates
            unless (_cull_ex_words($self, $word, _this_assoc()) ) {
                push @lines, !$args->{'data'} ? _this_assoc() : $args->{'data'} == 1 ? [split/,/] : _hashify_line([split/,/]); # scalar, flat list or hash?
            }
        }
		else {
			last SEARCH if scalar(@lines); # cue must already have been found if there are lines, but since not any more, or enough, found
		}
	}
	close(F);
	return exists $args->{'ref'} && $args->{'ref'} == 1 ? \@lines : @lines;
}
*associates = \&list_associates;

=head3 list_cues

 @ari = $usf->list_cues('milk'); # returns array of 41 words ("cues") that yield the response 'MILK'
 @ari = $usf->list_cues('milk', depth => 4, data => 0); # returns array of MILK's cues, if MILK is among the first 4 associates of that cue.
 $aref = $usf->list_cues('milk', depth => 4, data => 1); # returns array of 4 arefs of the data for cues where MILK is among its first 4 associates
 $aref = $usf->list_cues('milk', depth => 4, data => 2); # as above, but returns array of 4 hash-refs, each with their values labelled
 $aref = $usf->list_cues('milk', {depth => 4, data => 2}); # as above, but neater
 @ari = $usf->list_cues('milk', ref => 0,1,2); # returns open array (default), ref. to array of array, or of keyed data hashes (mix with above)

I<Alias>: cues

Returns (by default) a referenced array listing all the words that cue the given word, or C<undef> if there are no cues to this word. Other than getting back a list of words, you can also get the complete row (as a referenced array or field-keyed hash) that has all the USF data in it - with the B<depth> of the forward association between the found word and the searched word as the last element.

=cut

sub list_cues {
	my ($self, $w2exp, @ari) = @_;
    croak __PACKAGE__, ' needs a word for looking up its cues' if nocontent($w2exp);
    $w2exp = uc($w2exp);
    my $args = _set_args(@ari);
	$args->{'depth'} = 100 if nocontent($args->{'depth'});

	my (@lines, $letter, $w1obs, %sel) = ();
    my $last_cue = '';

    foreach $letter('A' .. 'Z') { # loop through all the database tables
        my $path = _path2table4letter($self, $letter) || return undef;
	    open F, '<' . $path or die $! . " $path for $w2exp";
	    my $fnd = 0;
    	SEARCH:
    	while (<F>) {
	    	next SEARCH if $. == 1;
            chomp;
            /^([^,]+)/; # isolate first word (cue) in line
            $w1obs = $1;
            if ($w1obs eq $last_cue) {
                $fnd++;
            }
            else {
                $fnd = 0;
                $last_cue = $w1obs;
            }
            if (/^[^,]+,\Q$w2exp\E,/ and $fnd <= $args->{'depth'}) { # is the target word the second word from the start in this line?
                unless (_cull_ex_words($self, $w1obs, $w2exp) ) {
                    push @lines, ! $args->{'data'} ? $w1obs : $args->{'data'} == 1 ? [split(/,/), $fnd + 1] : _hashify_line([split(/,/), $fnd + 1]);
                    #$sel{$w1obs} = ! $args->{'data'} ? 1 : $args->{'data'} == 1 ? [split/,/] : _hashify_line([split/,/]);
                }
		    }
            next SEARCH;
        }
        close (F);
     }
     #_return_list(\%sel, $args->{'data'}, $args->{'ref'}, $args->{'sortby'});
     return exists $args->{'ref'} && $args->{'ref'} == 1 ? \@lines : @lines;
}
*cues = \&list_cues;

=head3 list_mediators

 $aref = $usf->list_mediators('word1', 'word2'); # returns aref of words cued by word1 that cue word2
 $aref = $usf->list_mediators('word1', 'word2', depth => 4, data => 0); # as above, but only up to the first 4 cues of word1
 $aref = $usf->list_mediators('word1', 'word2', depth => 4, data => 1); # as above, but aref of 4 arefs of data for word1's assoc. with word2's cue
 $aref = $usf->list_mediators('word1', 'word2', depth => 4, data => 2); # as above, but returns aref of 4 hash-refs, each with their values labelled
 $aref = $usf->list_mediators('word1', 'word2', {depth => 4, data => 2}); # as above, but a neater way of sending the args
 @ari = $usf->list_mediators('word1', 'word2', ref => 0); # returns open array of of words cued by word1 that cue word2

I<Alias>: mediators

Returns a reference to a list that contains all the associates of the first word that cue the second word; so between LOVE and HATE, there is also LIKE, CARE and ADORE - cued by LOVE, themselves cueing HATE. Note that the two words (here, LOVE and HATE) do not have to be themselves associated (unlike the basis of the L<mediator_strength|mediator_strength> and L<mediator_count|mediator_count> statistics). If that is what you want, first check that the two words L<are_related|assoc_depth (are_associated), are_related>.

You can specify that the mediators must be at least the I<n>th associate of the I<first> word by setting this as the B<depth> value. For example, if B<depth> is specified as 6, only LIKE and CARE are returned, for ADORE is only the 8th associate of LOVE. CARE is the 6th associate of LOVE, so will also be knocked out if B<depth> is less than 6. You might also want to control the associative depth between the first word's associates and the second word, so between LIKE and HATE, for instance. This can be done by setting the argument (peculiar to this method) named B<depth_med>. Now HATE is the 12th associate of CARE, so it will only appear in the list of mediators between LOVE and HAVE if B<depth_med> is less than or equal to 12.

You can get back a flat list of the mediators themselves (the default), the lines of data for each mediator, or keyed lines of data for each mediator, by setting B<data> as either 0, 1 or 2, respectively. The final value in a list of data for each mediator is the associative depth between the mediator and the second word.

=cut

sub list_mediators {
    my $self = shift;
    my ($w1exp, $w2exp) = _set_word_pairs(shift, shift);
    my $args = _set_args(@_);
	$args->{'depth'} = 100 if nocontent($args->{'depth'});
    $args->{'depth_med'} = 100 if nocontent($args->{'depth_med'});

    my $aref = $self->list_associates($w1exp, depth => $args->{'depth'}, data => 0, ref => 1); # get all the associates of word1
    # lookup each associate and see if it cues word2:
    my ($w2obs, $depth_med, @lines) = ();
    foreach $w2obs(@{$aref}) {
        if ($depth_med = $self->assoc_depth($w2obs, $w2exp)) { # are the associate and word2 related?
            if ($depth_med <= $args->{'depth_med'}) {
                push @lines, ! $args->{'data'} ? $w2obs : $args->{'data'} == 1 ? [split(/,\s*/), $depth_med] : _hashify_line([split(/,\s*/), $depth_med]);
            }
        }
    }
	return exists $args->{'ref'} && $args->{'ref'} == 1 ? \@lines : @lines;
}
*mediators = \&list_mediators;

=head3 list_overlaps

 $aref = $usf->list_overlaps('word1', 'word2'); # returns aref of words cued by both word1 and word2
 $aref = $usf->list_overlaps('word1', 'word2', depth => 4, data => 0); # as above, but only up to the first 4 cues of word1
 $aref = $usf->list_overlaps('word1', 'word2', depth => 4, data => 1); # as above, but aref of 4 arefs of data for word1's assoc. with shared cue
 $aref = $usf->list_overlaps('word1', 'word2', depth => 4, data => 2); # as above, but returns aref of 4 hash-refs, each with their values labelled
 $aref = $usf->list_overlaps('word1', 'word2', {depth => 4, data => 2}); # as above, but a neater way of sending the args
 @ari = $usf->list_overlaps('word1', 'word2', ref => 0); # returns open array of of words cued by word1 that cue word2

I<Alias>: overlaps

Returns a reference to a list that contains all the associates of the first word that are also associates of the second word; e.g., in the way that LOVE and HATE both cue LIKE. Note that the two words (here, LOVE and HATE) do not have to be themselves associated (unlike the basis of the L<overlap_strength|overlap_strength> and L<overlap_count|overlap_count> statistics). If that is what you want, first check that the two words L<are_related|assoc_depth (are_associated), are_related>. Two optional named arguments can be sent:

You can specify that the overlaps (shared associates) must be at least the I<n>th associate of the first I<and> second words by setting this as the B<depth> value. For example, COURT and JURY have 2 shared associates: both independently cue JUDGE and CASE. However, if B<depth> => 1, no shared associates are found, because the first associate of COURT is JUDGE and the first associate of JURY is COURT: so COURT and JURY have no associates in common at this depth of first associations. However, if B<depth> => 2, JUDGE is returned as a shared associate because it is the 2nd associate of JURY, as well as the 1st associate of COURT. CASE, however, is only COURT's 10th associate, and it is JURY's 9th associate; only if B<depth> is equal to 10 or more will both JUDGE and CASE be returned. By default, no such restrictions are applied; both words would be returned.

You can get back a flat list of the overlapping/shared associates themselves (the default), the lines of data for each of the I<first> word's (only) association with the shared associate (here, between COURT and JUDGE, and COURT and CASE), or these lines with keyed values, by setting B<data> as either 0, 1 or 2, respectively. Use the L<list_associates|list_associates> method to get data for the association of the I<second> word with these "overlaps," (here, between JURY and JUDGE, and JURY and CASE).

=cut

sub list_overlaps {
    my $self = shift;
    my ($w1exp, $w2exp) = _set_word_pairs(shift, shift);
    my $args = _set_args(@_);
	$args->{'depth'} = 100 if nocontent($args->{'depth'});
    
    my (@lines, %count) = ();
    $count{$_}++ for $self->list_associates($w1exp, depth => $args->{'depth'}, data => 0, ref => 0); # all assocs of word1
    $count{$_}++ for $self->list_associates($w2exp, depth => $args->{'depth'}, data => 0, ref => 0); # all assocs of word2
    
    foreach my $assoc(keys %count) {
        if ($count{$assoc} >= 2) { # common to both arrays
            push @lines, ! $args->{'data'} ? $assoc : $args->{'data'} == 1 ? $self->assoc_data($w1exp, $assoc, ref => 1) : $self->assoc_data($w1exp, $assoc, ref => 2);
        }
    }
    return exists $args->{'ref'} && $args->{'ref'} == 1 ? \@lines : @lines;
}
*overlaps = \&list_overlaps;

#---------------------------------------------------------------------
#vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

=head3 list_orthons

 $usf->list_orthons($word, {
  plus      => 0,
  minus     => 0,
  permutes  => 0,
  #classify  => 1, # boolean: returns as hash of type of orthons
  inclusive => 0, # boolean: all orthons up to the plus/minus specified, or only at these levels
  sortby    => 'alpha', # alphabetical order or as retrieved
  splits_ok => 1, # if plus/minus > 1, then additions/deletions need not be consecutive, e.g. not only PAT -> PAINT but also -> SPATE
  subst     => n, # any value from 1 (default) to word-length minus 1.
  #syllables => [min, max],
  cues_only => 0,
  data      => 0,
  ref       => 0,
 ); # these are the default param values and returns only substitutive orthons

The module identifies orthographic neighbours among the words in the USF norms. The traditional definition of an orthographic neighbour - or I<orthon> as it is here called - is that two words are equal apart from a single substitution of a letter. This involves no transpositions of letters, and no additions or eliminations of letters, i.e, no altering of the number of characters between neighbours. This is the traditional "Coltheart" definition of an orthon. It is a matter of some psycholinguistic research to test the psychological breadth of orthonic relations. With this exploration in mind, the module permits retrieving orthons based on permutations, additions and deletions of letters. The following defines these possiblities.

=over 2

=item Substitution (Coltheart) orthons

Returns a list of orthographic neighbors ("orthons"), or of their data lines, for a single letter string. For two words to be orthons of each other, they must share all the same letters, in the same positions, except for one letter substitution. So the orthons of PAT returned by this method are B<B>AT	B<C>AT,	PAB<D>,	PAB<N> (among others, in alphabetical order). This satisfies the traditional "Coltheart" definition of orthons (see L</"REFERENCES">), where only I<one> substitution is permitted.

=item Permutation orthons

Orthographic neighbours might also involve all possible perumutations of a word's characters. Setting B<permutes> to 1 includes orthographic neighbors that are identical to the tested word in that they share exactly the same letters, but in different positions; so, PAT would also yield APT and TAP as orthons; DOME, usually having 9 "Coltheart" orthons, has 10 with the inclusion of MODE. These do not have to be of equal length with the tested word; addition or substraction of letters is supported, with or without permutations, as below.

=item Insertion and deletion orthons

You can also specify that the orthons could comprise words involving one or more letter I<insertions> or I<deletions>, as well as single substitutions. This is offered by the options B<plus> and B<minus>. So for PIN, the single substitutive, 1-mismatch (Coltheart) orthons include B<S>IN, PB<A>N and PIB<G>. But if B<plus> => 1, you get plus-orthons that might start or end with any other letter (e.g., B<S>PIN, PINB<K>), or contain another letter within the string (e.g., PAB<I>N). Setting B<plus> => 2, extends the word by this much on either side or both sides: you now also get, e.g., B<S>PINB<E> and PINB<CH>), and within the word (PB<LA>IN), or both within and about the word (e.g., PB<A>INB<T> and PIB<A>NB<O>). The opposite applies for setting the value of B<minus>, e.g., among PAINT's minus-orthons are, if B<minus> => 1, PAIN (taking away T) and PANT (taking away I), or, if B<minus> => 2, PIN (taking away A and T) and PAT (taking away I and N).

=back

The argument B<cues_only> restricts the search for orthons to words that have been used as cues in the USF norms, excluding words that only appeared among the associates given by participants. The default value is 0 (all words, not only cues, will be searched).

The argument B<inclusive> operates (only) when B<plus> or B<minus> is greater than zero. It simply commands that either what is returned is a list of I<all> orthons I<including> substitution (same-size) orthons, and one or more insertion or deletion orthons (depending, respectively, on the values of I<plus> and I<minus>), I<or> only those orthons that are equal to the size of the word plus or minus the value of B<plus> or B<minus>.

The arguments B<data> and B<ref> specify what you want returned. By default, this is simply an open list of words: the orthons themselves (which could be empty if the word has no orthons). If B<ref> => 1, then this is a referenced list. If B<data> => 1, then you get not a list of words but an array of array references, being the lines of data for each orthon. If B<data> => 2, you get an array of hash references, where the lines of data for each orthon are returned as key => value pairs, the keys being the stat-codes (e.g., SSZ for set-size, FRQ for frequency ...); see the L<single word statistics|/"Statistics for single words">. Again, the value of B<ref> specifies whether these arrays are returned openly or as references themselves.

=cut

sub list_orthons {
	my ($self, $word, @ari) = @_;

    # Assign arguments:
    croak __PACKAGE__, ' needs a word for finding its orthographic neighbours' if nocontent($word);
    $word = uc($word);
    my $args = _set_args(@ari);
    $args->{'data'} ||= 0;
    $args->{'plus'} = 0 if ! $args->{'plus'};
    $args->{'minus'} = 0 if ! $args->{'minus'};
    my $want_lendiff = ($args->{'plus'} || $args->{'minus'}) ? 1 : 0;
    require Algorithm::Combinatorics;
    # Build models:
    my (%w_lines, $letter, $i, @minus_strs, @plus_strs, @permutes, %tested, %sel) = ();
    my $wrd_len = length($word);
    my @chars = split//, $word;
    
    # init an array of sub-strings of the word, removing one letter at a time from the word, e.g., PAGE yields AGE, PGE, PAE and PAG
    unless ($wrd_len == 1 || !$want_lendiff) { # no point just getting a list of single letters; & number of additions/deletions must be > 0.
        if ($args->{'plus'}) {
            my $start_val;
            if ($args->{'plus'} > 1 && $args->{'inclusive'}) {
                $start_val = 1;
            } 
            else {
                $start_val = $args->{'plus'};
            }
            my $end_val = $args->{'plus'};
            my ($i, @addon, @ari);
            for ($i = $start_val; $i <= $end_val; $i++) { # will do only once if not "inclusive"
                @addon = (repeat('*', $i));
                if ($args->{'splits_ok'}) {
                    @addon = split//, $addon[0];
                }
                @ari = Algorithm::Combinatorics::permutations([@chars, @addon]);
                push @plus_strs, join('', @{$_}) foreach @ari;
                if (!$args->{'permutes'}) {
                    @plus_strs = _cull_permuted(\@chars, \@plus_strs);
                }
            }
        }
        if ($args->{'minus'}) {
            my $start_val;
            if ($args->{'minus'} > 1 && $args->{'inclusive'}) {
                $start_val = 1;
            } 
            else {
                $start_val = $args->{'minus'};
            }
            my $end_val = $args->{'minus'};

            for ($i = $start_val; $i <= $end_val; $i++) { # will do only once if not "inclusive"
                my @ari = ();
                if ($args->{'permutes'}) {
                    #if ($args->{'minus'} > 1 && !$args->{'splits_ok'}) { # makes no difference given permutations
                    #    # first get the permutations of characters, and then rem consecutive elements of length $i:
                    #    my @perms = Algorithm::Combinatorics::permutations(\@chars);
                    #    my ($j, @chars_cpy) = ();
                    #    foreach my $p_chars (@perms) {
                    #        for ($j = 0; $j < $wrd_len; $j++) {
                    #            @chars_cpy = @{$p_chars};
                    #            splice(@chars_cpy, $j, $i);
                    #            push @ari, \@chars_cpy;
                    #        }
                    #    }
                    #}
                    #else {
                        @ari = Algorithm::Combinatorics::variations(\@chars, $wrd_len - $i);
                    #}
                }
                else {
                    if ($args->{'minus'} > 1 && !$args->{'splits_ok'}) { # INT	PNT	PAT	PAI  >> PAT
                        my ($j, @chars_cpy) = ();
                        for ($j = 0; $j < $wrd_len; $j++) {
                            @chars_cpy = @chars;
                            splice(@chars_cpy, $j, $i);
                            push @ari, \@chars_cpy;
                        }
                    }
                    else {# PAI	PAN	PAT	PIN	PIT	PNT	AIN	AIT	ANT	INT  >> ANT	PAN	PAT	PIN	PIT
                        @ari = Algorithm::Combinatorics::combinations(\@chars, $wrd_len - $i);
                    }
                }
                push @minus_strs, join('', @{$_}) foreach @ari;
                @minus_strs = uniq(@minus_strs); # might have been duplicate letters in the orig. word
            }
        }
    }
    
    my $do_eqlencheck = ($args->{'plus'} || $args->{'minus'}) ? $args->{'inclusive'} ? 1 : 0 : 1;
    
    if ($args->{'permutes'} && $do_eqlencheck) {
        push @permutes, join('', @{$_}) foreach Algorithm::Combinatorics::permutations(\@chars);
    }

    #print "minus_strs = ", join"\t", @minus_strs, "\n";
    #print "plus_strs = ", join"\t", @plus_strs, "\n" if scalar @plus_strs;
    #print "permutes = ", join"\t", @permutes, "\n" if scalar @permutes;
    
    my ($is_word) = ();
    # Now check for being a word:
    #if (scalar @plus_strs) {
    #    foreach my $p(@plus_strs) {
    #        if (!$args->{'data'}) {
    #            $is_word = $self->is_word($p);
    #            if($is_word) {
    #                unless (_cull_ex_words($self, $p) ) {
    #                    $sel{$p} = 1;
    #                } 
    #            }       
    #        }
    #    }
                        #$sel{$str} = ! $args->{'data'} ? 1 : $args->{'data'} == 1 ? [split/,/] : _hashify_line([split/,/]);
    #}
    # _return_list(\%sel, $args->{'data'}, $args->{'ref'}, $args->{'sortby'});
    
    # Retrieve words:
    #require Lingua::Ortho;
    #my $ortho = Lingua::Ortho->new();
    my (@lines) = ();
    $tested{$word} = 1;
    foreach $letter('A' .. 'Z') {
        my $path = _path2table4letter($self, $letter) || next;
	    open LISTORTHONS_FH, '<' . $path || croak __PACKAGE__, " could not access database table '$path'";
	    my ($fnd, @strs) = (0);
    	LINE:
    	while (<LISTORTHONS_FH>) {
	    	next LINE if $. == 1;
            /^([^,]+),\s*([^,]+),/; # isolate the cue and associate
            @strs = ($1);
            push @strs, $2 unless $args->{'cues_only'};
            STR:
            foreach my $str(@strs) {
                next STR if $tested{$str}; # must not be the same word or one already accepted or rejected
                $tested{$str} = 1;                 #next STR if $w_lines{$str}
                my $lendiff = $wrd_len - length($str);
                if( $want_lendiff == 0) { # simple Coltheart orthons
                    next STR if $lendiff;
                }
                else { # different lengths of orthons OK:
                    if($args->{'plus'} && $lendiff < 0) {# must still be only one more or less than word-length:
                        next STR if abs($lendiff) > $args->{'plus'};
                    }
                    elsif ($args->{'minus'} && $lendiff > 0) {
                        next STR if abs($lendiff) > $args->{'minus'};
                    }
                }
                
                if (
                    (
                    !$lendiff
                        and (
                            _index_identical($word, $str) == $wrd_len - 1 # Coltheart type eq length (substitution only)
                            ||
                            any { $_ eq $str} @permutes  # any full-length permutations
                        )
                    )
                    ||
                    ( any { $_ eq $str} @minus_strs ) # single internal deletion, O(AT)= A,  O(FLOOD) = FOOD
                    ||
                    ( any { _index_identical($_, $str) == $wrd_len} @plus_strs ) # e.g., O(IN) = PIN, O(PAT) = {PATE, PART, ...}
                    ) {
                    #print "accepting $str\n";
                    #use String::Compare; my $percentage = compare($word, $str); print "$percentage\n";
                    unless (_cull_ex_words($self, $str) ) {
                        #push @lines, ! $args->{'data'} ? $str : $args->{'data'} == 1 ? [split/,/] : _hashify_line([split/,/]);
                        $sel{$str} = ! $args->{'data'} ? 1 : $args->{'data'} == 1 ? [split/,/] : _hashify_line([split/,/]);
                    }
                }
            }# end foreach of cue and target
        }
        close (LISTORTHONS_FH);
     }
     _return_list(\%sel, $args->{'data'}, $args->{'ref'}, $args->{'sortby'});
     #return exists $args->{'ref'} && $args->{'ref'} == 1 ? \@lines : @lines;
}

sub _cull_permuted {
    my ($chars_aref, $strs_aref) = @_;
    my ($i, $j, @pchars, %plusses) = ();
    #print "pre-culled:\n";
    #print join"\t",  @{$strs_aref};
    #print "\n";
    $plusses{$_} = 1 foreach @{$strs_aref};
    PLUS_STRS:
    foreach (@{$strs_aref}) {
        @pchars = split//;
        $j = 0;
        for ($i = 0; $i < scalar(@pchars); $i++) {
            if ($pchars[$i] ne '*') {
                if ($pchars[$i] ne $chars_aref->[$j]) {
                    delete $plusses{$_};
                    next PLUS_STRS;
                }
                $j++;
            }
        }
    }
    return keys %plusses;
}

sub _index_identical {
    my ($w1, $w2, $n, $i) = (shift, shift, 0); # BENCHMARK: ~10%-25% faster than by list and separate decs
    for ($i = 0; $i < length($w1); $i++) { # BENCHMARK: usually a few % faster than declaring outside
         $n++ if substr($w1, $i, 1) eq (substr($w2, $i, 1) or last); # BENCHMARK: ~10%-20% faster than or by ||
    }
    return $n;
}

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#---------------------------------------------------------------------

=head3 list_words

 $aref = $usf->list_words(
  SSZ       => [1, 10],     # set-size min/max
  CNM       => [0, 1.5],    # connectivity-mean min/max
  RSM       => [0, .50],    # resonance probability min/max
  RSS       => [0, .05],    # resonance strength min/max
  FRQ       => [0, 100],    # print-frequency min/max
  CCN       => [1, 7],      # concreteness min/max: range = 1 (abstract) .. 7 (concrete)
  HMG       => 0,           # homograph: 0 or 1
  PSP       => [qw/N V AJ AD PP P C I/], # part-of-speech (one or more character codes)
  CCT       => [1, 5],      # cue-count min/max
  ONC       => [1, 5],      # orthographic neighbourhood size min/max
  syllables => [1, 10],     # no. of syllables min/max
  chars     => [4, 7],      # no. of characters min/max
  first     => [qw/A B/],   # only for words starting with these (1 .. 26) letters; default is A .. Z.
  cues_only => 1,           # default lists "targets" as well as the cues
  skipnull  => 1,           # don't assume empty (or invalid char) cells meet criteria, skip them (default)
  data      => 0,           # 0 (words only) or 1 (whole line of data) or 2 (same but keyed)
  ref       => 0,           # whether to return reference to list or open list
  sortby    => 'alpha',
 );

Returns a list of words satisfying a set of criteria. The criteria are simply set as a keyed list (hash). The keys are the upper-case codes for the single word statistics (see L<word_stat|word_stat>) that are included in the USF database. The values are (for the most part) an array-reference that gives (in succession) the minimum and maximum value for this particular statistic. For example, B<FRQ> => [1, 10] specifies that only words with a print-frequency of at least one and no more than 10 will be returned in the list. The minima and maxima of the B<chars> of the word, and its number of B<syllables> (using L<Lingua::EN::Syllable|Lingua::EN::Syllable>), can be specified in the same way. Some fields are empty or have a non-alphanumerical character where data is not available.

Searches can be restricted to only the cues in the database by setting B<cues_only> => 1 (the default is 0). This will also make for faster searches. If B<first> is specified - either as a single letter or a referenced list of one or more letters - then only words starting with the letter(s) will be searched; amounting to only 1 table instead of all tables if B<cues_only> => 1. Searches will also be quite slower if any of the L<exclusion criteria|Excluding words and word-pairs> applying to the cue-target pairs are loaded I<and> B<cues_only> => 0, and this might also affect the number of retrievals. For example, with B<cues_only> => 0, L<list_words|list_words> returns 5339 words if, additionally, B<chars> is set to range from 2 to 6 and no exclusion criteria are set. If, however, B<ex_cultural> => 1, then (using the version .01 list) the method returns only 5324 words. Exclusions are only applied to word pairs, and only once. So, with cultural exclusions on, BUSH would be excluded when it appears with GOVERNOR, but it would naturally be picked up on some other basis. 

Presently, the method simply checks each criterion in succession, starting with character and syllable counts. The results, by default, are returned as a reference to a list of words. These can be alphabetically pre-sorted by specifying B<sortby> => 'alpha' (no other options in this version). Alternatively, if B<ref> => 0, the list is returned as an open array. 

=cut

sub list_words {
	my ($self, @ari) = @_;
    my $args = _set_args(@ari);
    my (@terms, %sel, %rej, $letter, @line, $term, $path) = ('');
    
    my @firsts = (); # BENCHMARK: ~2% faster search than by ref
    if (exists $args->{'first'}) {
        my @first_tmp = ref $args->{'first'} ? @{delete $args->{'first'}} : ($args->{'first'});
        foreach (@first_tmp) {
            push @firsts, uc($_) if /^[A-Z]/i;
        }
        @firsts = ('A' .. 'Z') if ! scalar @first_tmp;
    }
    else {
        @firsts = ('A' .. 'Z');
    }

    my $cues_only = delete $args->{'cues_only'} || 0;
    my @tables = $cues_only ? @firsts : ('A' .. 'Z');
    
    my $data = delete $args->{'data'} || 0;
    my $skipnull = defined $args->{'skipnull'} ? delete $args->{'skipnull'} : 1;
    my ($lett_ok, $dat) = ();
    my $fields = $self->{'_field_aref'};
    my @lenari = @{delete $args->{'chars'} || []};
    my @sylari = @{delete $args->{'syllables'} || []};
    require Lingua::EN::Syllable if @sylari;
    # Loop through database files:
	foreach $letter(@tables) {
		$path = _path2table4letter($self, $letter) || next;
        open(LISTWORDS_FH, '<' . $path) or die $! . " $path";
        SEARCH:
        while (<LISTWORDS_FH>) {
       	    next SEARCH if $. == 1; # ignore header row
    	    chomp;
     		@line = split/,\s*/;
            next SEARCH if $cues_only && $line[0] eq $terms[0];# if cues only, cues must not already be just tested
            @terms = ($line[0]);
            push @terms, $line[1] unless $cues_only;
            my ($i) = ();
            TERMS:
            for ($i = 0; $i < scalar(@terms); $i++) {
                # not rejected (in this file) or selected:
                # use grep: much faster than using 'any' (or regex); abt 2% faster in this eq order:
                next TERMS if $sel{$terms[$i]} || $rej{$terms[$i]} || !grep( {$_ eq substr($terms[$i], 0, 1)} @firsts);
                #print "testing $letter: $terms[$i]\n";
                if (@lenari) {
                    $dat = length($terms[$i]);
                    if ($dat > $lenari[1] || $dat < $lenari[0]) {
                        $rej{$terms[$i]} = 1;
                        next TERMS;
                    }
                }
                if (@sylari) {
                    $dat = Lingua::EN::Syllable::syllable($terms[$i]);
                    if ($dat > $sylari[1] || $dat < $sylari[0]) {
                        $rej{$terms[$i]} = 1;
                        next TERMS;
                    }
                }
                my ($reji, $j, $f) = (0);
                FIELDS:
                foreach $f(@{$fields}, qw/CCT ONC/) { # check minima and maxima of all fields, if specified in args: 
                    next FIELDS if nocontent($args->{$f}); # could equal a boolean, a single letter, or aref
                    $j = firstidx {$_ eq $f} @{$fields}; # get the index in the line for this field
                    $j += $i ? 14 : 5;
                    if (_isnullcell($line[$j]) && $skipnull) { # say "skipping field $f because cell is NULL";
                        $reji = 1;
                    } 
                    elsif (nocontent($line[$j]) || $line[$j] =~ /^[A-Z]/) { # say "cell is char: field is homograph or part-of-speech";
                        if (ref $args->{$f}) { # this should be for part-of-speech or particular HMG rating, e.g., ['N', 'AJ']
                            $reji = 1 unless any { $line[$j] eq $_ } @{$args->{$f}};
                        }
                        elsif ($args->{$f} =~ /^[A-Z]/) { # e.g. 'N'
                            $reji = 1 if $line[$j] ne $args->{$f}; 
                        }
                        else { # just 1 or 0 - special attention for HMG homograph field:
                            $reji = 1 if ( $args->{$f} == 0 && hascontent($line[$j]) ) || ($args->{$f} == 1 && nocontent($line[$j])) ;
                        }
                    }
                    else {#    say "assessing numerical stat $f with $j";
                        my $val;
                        if (_is_valid_fieldname($self, $f)) { # is a canned field:
                            $val = $line[$j];
                        }
                        else { # must be CCT or ONC, to derive:
                            $val = $self->word_stat($terms[$i], uc($f));
                            #print "CCT or ONC val $val\n";
                        }
                        if (defined $args->{$f}->[0] and $val < $args->{$f}->[0]) { # is valid by minimum?
                            $reji = 1;
                        }
                        elsif (defined $args->{$f}->[1] and $val > $args->{$f}->[1]) { # is valid by maximum?
                            $reji = 1;
                        }
                    }
                    do {$rej{$terms[$i]} = 1; last FIELDS;} if $reji;
                }
                unless ($reji) {
                    # exclusions: don't add to %rej; do last as longest; check each separately otherwise miss words ok by non-pairing                     
                    next TERMS if _cull_ex_words($self, $terms[$i]);
                    $sel{$terms[$i]} = !$data ? 1 : $data == 1 ? \@line : _hashify_line(\@line);
                }
            }
		}
		close LISTWORDS_FH;
        %rej = (); # only keep per file - no diff in speed however
    }
  
    _return_list(\%sel, $data, $args->{'ref'}, $args->{'sortby'});
}

#=head3 cue_iterate

#=cut

#sub cue_iterate {
#    my ($self) = @_;
#    my @cues = $self->list_cues(data => 0, ref => 0, sortby => 'alpha');
#    foreach my $cue(@cues) {
#        my @assocs = $self->find_associates(ref => 0);
#        foreach my $assoc(@assocs) {
#            my $assoc_data = $self->find_pair($cue, $assoc, data => 2, ref => 1);
#        }
#    }
#}

=head3 find_associate

 $word = $usf->find_associate($word, depth => 1, data => 0); # gets just the word that's $word's 1st associate
 $ari = $usf->find_associate($word, depth => 1, data => 1, ref => 0); # gets data as an open array for $word's 1st associate

Return the word or line of data for the I<n>th associate of a word, as specified by B<depth> (the default is the first associate). Other specifications of what should be returned can be set by the B<data> and B<ref> arguments; see above.

Returns undef if there is no associate for the word.

=cut

sub find_associate {
	my ($self, $word, $depth, $retline) = @_;
    return undef if nocontent($word);
    $word = uc($word);
    $depth ||= 1;
	
    my ($fnd, $path, @line) = (1);
    $path = _path2table4letter($self, substr($word, 0, 1)) || return undef;
	open(FINDASSOC_FH, '<', $path) or die $! . " $path";
	while (<FINDASSOC_FH>) {
		next if $. == 1;
        chomp;
		if (/^$word,/) {
			if ($fnd == $depth) { 
                @line = split/,/;
	    		last;
            }
            else {
                $fnd++;
                next;
            }
		}
	}
	close(FINDASSOC_FH);

    #return exists $args->{'ref'} && $args->{'ref'} == 1 ? \@lines : @lines;
    return scalar @line ? !$retline ? $line[1] : $retline == 1 ? \@line : _hashify_line(\@line) : undef;
	##return wantarray ? ($line[1], \@line) : $line[1];
}

=head3 find_pair

 $aref = $usf->find_pair($w1, $w2);

Given two words, finds if they are associated. If so, the default is to return true, otherwise false. Specifications can be made to return, instead, a list (or reference to a list) of all the association data, keyed by its field-codes as a hash, or simply as a flat array. The method is basically another way to check if two words are associated, alternative to L<assoc_depth|assoc_depth (are_associated), are_related> and L<assoc_data|assoc_data> that are more specific requests (and which use this method internally). Note that the pair will not be found if they appear among any pairs that have been set for L<exclusion|Excluding words and word-pairs>.

=cut

sub find_pair {
    my ($self, $w1exp, $w2exp) = @_;
    croak __PACKAGE__, ' needs two words for finding as pair' if nocontent($w1exp) || nocontent($w2exp);
	my ($w1_fnd, $pr_fnd, $path, @line) = (0, 0);
	$w1exp = uc(trim($w1exp));
    $w2exp = uc(trim($w2exp));
	my $w2obs = '';
    $path = _path2table4letter($self, substr($w1exp, 0, 1))  || return undef;
	
    open FINDPAIR_FH, '<' . $path or die $! . " $path";
	while (<FINDPAIR_FH>) {
		next if $. == 1;
		if (/^\Q$w1exp\E,/) {
			@line = split/,/;
            my $w2obs = $line[1];
            next if _cull_ex_words($self, $w1exp, $w2obs);
            $w1_fnd++; # word1 has been found
            if ($w2exp eq $w2obs) { # word2 has been found with word1
				$pr_fnd = 1;
				last;
			}
		}
		else {
			last if $w1_fnd; # have already found word1 - and not found word2 with it
		}
	}
	close FINDPAIR_FH;

	if ($pr_fnd) {
        push @line, $w1_fnd;
		return \@line;
	}
    elsif ($w1_fnd) {
        return 0;
    }
	else {
		return undef;
	}
}

=head3 random_word

 $word = $usf->random_word(); # assumes data => 0
 $aref = $usf->random_word(data => 1); # random line of data
 $href = $usf->random_word(data => 2); # random line of data, with the values named
 $word = $usf->random_word(cues_only => 0, first => 'A', nwords => 2); # extra options

I<Alias>: random_data

Picks out a random word from the database. You can limit this to only the cues by specifying B<cues_only> => 1, otherwise, the cue or (randomly) the associate of any randomly selected line will be returned. Sampling is with replacement. Choosing a word starting with a particular letter is randomly made from a discrete distribution (using L<Math::Random::Discrete|Math::Random::Discrete>) where the letters are weighted for their frequency of occurrence as first letters in English (using the Project Gutenberg corpus). Alternatively, you can also specify, optionally, the first letter as the value of B<first>. As in the above methods, you can also use the parameter B<data> to specify if you want the word itself, the actual line that was randomly selected, or the line with its values keyed.  Also control how many words you want with B<nwords>; if more than one, you get back an array-reference (of strings, arefs or hashrefs, depending on the value of B<data>). The method uses the default algorithm in the module L<File::RandomLine|File::RandomLine> to pick the lines once a file is randomly chosen.

=cut

sub random_word {
	my ($self, @ari) = @_;
    my $args = _set_args(@ari);
    my ($first, $undef) = ();
    if ($args->{'first'} and any { $_ eq uc($args->{'first'}) }('A' .. 'Z') ) {
    #if ($args->{'first'} and !grep( {$_ eq uc($args->{'first'})} ('A' .. 'Z')) ) {
        $first = uc(delete $args->{'first'}); # accept given first letter
        $undef = 0;
    }
    else {
        $first = undef;# pick random letter
        $undef = 1;
    }
    
    my $ntimes = $args->{'nwords'} || 1;
    require File::RandomLine;
    require Math::Random::Discrete;
    my ($path, $frand, @get_lines, @ret_lines) = ();
    
    my $letter_weights = $args->{'letter_weights'};
    $letter_weights ||= $self->{'_first_letter_w'};
    
    if (nocontent($first)) {
        my $rand_firstletter = Math::Random::Discrete->new($letter_weights, ['A'..'Z']);
        while (scalar @get_lines < $ntimes) {
            $first = $rand_firstletter->rand;
            $path = _path2table4letter($self, $first) || next;
            $frand = File::RandomLine->new($path, { algorithm => 'fast' });
            push @get_lines, grep {!/^CUE,TARGET,/} $frand->next(1);
            $first = undef if $undef;
        }
    }
    else {
        #foreach (1 .. $ntimes) {
            $path = _path2table4letter($self, $first) || next;
            $frand = File::RandomLine->new($path, { algorithm => 'fast' }); # need to specify the default or warning re '$algo' is given
            push @get_lines, grep {!/^CUE,TARGET,/} $frand->next($ntimes);
        #}
    }

    foreach (@get_lines) {
	    chomp;
	    my @line_i = split/\s*,\s*/;
        my $idx = $args->{'cues_only'} ? 0 : int(rand(2));
        push @ret_lines, ! $args->{'data'} ? $line_i[$idx] : $args->{'data'} == 1 ? \@line_i : _hashify_line(\@line_i);
    }
    return scalar @ret_lines > 1 ? \@ret_lines : $ret_lines[0];
}
*random_data = \&random_word;

=head3 list_by_pattern

 $aref = $usf->loop_thru(pattern => '$be', cues_only => 1|0, data => 0|1|2);  

I<Alias>: loop_thru, find_by_pattern

This method loops through all the files, and each line in each file, to obtain a list of words that bear a particular string pattern, as set in the argument B<pattern>. This is a string that takes the form of a Perl regular expression. So, to find all words starting with 'the' (like THEME, THEN and THEATRE), you would request:

 $usf->list_by_pattern(pattern => '^the')

To find those ending with 'able':

 $usf->list_by_pattern(pattern => 'able$')

Or to find those containing the vowels 'oe' together, anywhere in the string, simply:

 $usf->list_by_pattern(pattern => 'oe')

All searches are case-insensitive (given that all the words are databased in capital letters anyway).

If cues_only => 1 (the default), only the cues will be looked up. Otherwise, if you set this to false, both the cues and the targets in each line will be looked up.

=cut

sub list_by_pattern {
    my ($self, @ari) = @_;
    my $args = _set_args(@ari);
    my ($path, @lines, %sought_in_file) = ();
    my $pattern = delete $args->{'pattern'} or croak __PACKAGE__, " requires a pattern to be looked up";
    $pattern = qr/\Q$pattern/;
    my $cues_only = defined $args->{'cues_only'} ? $args->{'cues_only'} : 1;
    #print "got to search $pattern\n";
    foreach my $first('A' .. 'Z') {
        $path = _path2table4letter($self, $first) or next;
        %sought_in_file = ();
        open DB, '<' . $path or die $! . " $path";
	    while (<DB>) {
		    next if $. == 1;
    		chomp;
    		/^([^,]+),([^,]+),/i;
            my ($w1, $w2) = ($1, $2);
            next if $sought_in_file{$w1};
            $sought_in_file{$w1} = 1;
	    	if ($w1 =~ $pattern) {#print "found $w1\n";
                push @lines, $w1;
            }
            if (!$cues_only) { # also look up "targets" ($w2):
                next if $sought_in_file{$w2};
                $sought_in_file{$w2} = 1;
	    	    if ($w2 =~ $pattern) {#print "found $w2\n";
                    push @lines, $w2;
                }    
            }
    	}
    	close DB;
    }
    return \@lines;
    #if ($pr_fnd) {
    #    push @line, $w1_entry;
	#	return \@line;
	#}
    #elsif ($w1_entry) {
    #    return 0;
    #}
	#else {
	#	return undef;
	#}
}
*loop_thru = \&list_by_pattern;
*find_by_pattern =\&list_by_pattern;

=head2 Excluding words and word-pairs

Some words or associations can be excluded from retrieval so that they will not appear in the L<"list" methods|/"Retrieving words and word-pairs">, and will not be included in the statistics that are calculated on-the-fly from (rather than listed in) the database. These could be words or word-associations that are culturally peculiar, perhaps from the time when the norms were collected, or that are not associated in ways that you want, e.g., by only a structural or phrasal association. Also, some associations in the database are based on errors of reading, hearing or spelling. Here are some examples:

=over 2

=item ex_cultural

FREEDOM->USA, DUNE->MOVIE, BOY->GEORGE

=item ex_errors

ATTIC->DRUGS, SUPER->DINNER, MAGICIAN->TRUCK

=item ex_phrasal

INK->BLOT, POOR->TASTE, PRACTICE->PREACH

=item ex_structural

BOTHER->BROTHER, LAPEL->CHAPEL, HUE->HOE

=back

The excluded words or word-associations are read-in from five .csv files in the directory "Lingua/Norms/USF/excludes", wherever the module is installed. By default, these files are I<not> loaded, and no exclusions are applied. You have to manually set the attributes B<ex_cultural>, B<ex_errors>, B<ex_phrasal>, B<ex_structural> and B<ex_misc> to 1 if you want the list of words, or pair of words, in the associated file(s) to be ignored when "listing" words or deriving statistics. This can be simply done when calling L<new|new>:

 $usf = Lingua::Norms::USF->new(dbdir => '../db/is/in/here/', ex_errors => 1, ex_cultural => 0, ex_phrasal => 0, ex_structural => 0, ex_misc => 1);

or at any time by L<set_exclusions|set_exclusions>.

You can edit these files by adding words in the same format (as comma-separated pairs or single words per line), or by simply deleting a line or prepending it with a hash so as to I<not> exclude it. The file 'ex_misc.csv' is empty by default, awaiting user-defined, on-the-fly exclusions.

Note that once loading these files via L<new|new>, there will be a slow-down in returning from most of the methods for calculated stats and lists - but with very little overall effect on the actual stats or lists returned. For example, DIARY has a listed L<set-size|setsize> of 14 in the database. However, this includes its associations to COW and MILK; clearly errors, confused with DAIRY. So if L<setsize|setsize> is called with B<calc> => 1, and the "ex_errors" file has been loaded (in L<new|new> or L<set_exclusions|set_exclusions>), then this method will return 12. The number of exclusions affected in this way, however, is very small. 

Note that exclusions affect association depth. So MAN is normed as the fourth associate of BOY, but setting B<ex_cultural> => 1, it becomes the third associate as GEORGE, the normed third associate, is excluded. Methods that do not depend on immediate paired words, however, such as L<list_words|list_words>, will only be affected if the individual words being retrieved appear as a one-word line in the exclusions files; so, with  B<ex_cultural> => 1, FIG is ok to return even though it is lined up in the cultural exclusion list with NEWTON, but FBI will not as it appears in a line of its own.

Users of this module are naturally welcomed to submit any additions - via the author email, or the cpan forum. Naturally, any updated versions of these files installed with a newer version of the module will not respect any edits made directly to earlier versions of the files. 

=head3 set_exclusions

 $usf->set_exclusions(ex_cultural => 1|0, ex_errors => 1|0, ex_phrasal => 1|0, ex_misc => 1|0, ex_structural => 1|0);

One way to load exculsions is by the L<new|new> method, but this can also be directly done here once passing L<new|new>. Give one or more keyed booleans to specify which class of exclusions to include. The default for each type is zero/false, i.e., they won't be loaded/applied. To see what files are loaded for exclusion at any time, check the values of the files keyed under C<$self-E<gt>{'exclusions'}>.

=cut

sub set_exclusions {
    _init_ex_words(shift, _set_args(@_));
}

=head3 unset_exclusions

 $self->unset_exclusions();

Clears I<all> exclusions in one hit. Turn individual ones on/off with L<set_exclusions|set_exclusions>.

=cut

sub unset_exclusions {
    my $self = shift;
    $self->{$_} = undef for qw/_ex_pairs _ex_singles/;
    $self->{'exclusions'}->{$_} = 0 foreach qw/cultural errors misc phrasal structural/;
    $self->{'_any_exclusions_on'} = 0;
}

#------------------
# Private functions:
#------------------

# Returns a list of the headings of each column in the database, from CUE and TARGET, to TRSG and TUC. An additional final heading (ASD) is for the depth of an association (1st, 2nd, etc.). 
sub _headings_list {
    return (qw/CUE TARGET NORMED NG NP FSG BSG MSG OSG MN MMIA ON OMIA QSS QFR QCON QH QPS QMC QPR QRSG QUC TSS TFR TCON TH TPS TMC TPR TRSG TUC ASD/);
}

sub _cellval_ok {   
    my ($self, $gotvals, $wantfields, $term, $rejlist, $i, $skipnull) = @_;   
    my $val_ok = 1;          
    my ( $j, $f, $cellval) = (0);
    FIELDS:
    foreach $f(@{$self->{'_field_aref'}}, qw/CCT ONC/) { # check minima and maxima of all fields, if specified in args: 
        next FIELDS if nocontent($wantfields->{$f}); # could equal a boolean, a single letter, or aref
        $j = firstidx {$_ eq $f} @{$self->{'_field_aref'}}; # get the index in the line for this field
        $j += $i ? 14 : 5;
        $cellval = $gotvals->[$j];
        if (_isnullcell($cellval) && $skipnull) { # say "skipping field $f because cell is NULL";
            $val_ok = 0;
        } 
        elsif (nocontent($cellval) || $cellval =~ /^[A-Z]/) { # say "cell is char: field is homograph or part-of-speech";
            if (ref $wantfields->{$f}) { # this should be for part-of-speech or particular HMG rating, e.g., ['N', 'AJ']
                $val_ok = 0 unless any { $cellval eq $_ } @{$wantfields->{$f}};
                #$val_ok = 0 unless !grep( {$_ eq $cellval} @{$wantfields->{$f}});
            }
            elsif ($wantfields->{$f} =~ /^[A-Z]/) { # e.g. 'N'
                $val_ok = 0 if $cellval ne $wantfields->{$f}; 
            }
            else { # just 1 or 0 - special attention for HMG homograph field:
                $val_ok = 0 if ( $wantfields->{$f} == 0 && hascontent($cellval) ) || ($wantfields->{$f} == 1 && nocontent($cellval)) ;
            }
        }
        else {#    say "assessing numerical stat $f with $j";
            my $val;
            if (_is_valid_fieldname($self, $f)) { # is a canned field:
                $val = $cellval;
            }
            else { # must be CCT or ONC, to derive:
                $val = $self->word_stat($term, uc($f));
                #print "CCT or ONC val $val\n";
            }
            if (defined $wantfields->{$f}->[0] and $val < $wantfields->{$f}->[0]) { # is valid by minimum?
                $val_ok = 0;
            }
            elsif (defined $wantfields->{$f}->[1] and $val > $wantfields->{$f}->[1]) { # is valid by maximum?
                $val_ok = 0;
            }
        }
        do {$rejlist->{$term} = 1; last FIELDS;} if !$val_ok;
      }
      return $val_ok;
 }

# should get rid of this one; embed within word_stat:
sub _return_stat_values {
    my ($self, $fields_wanted, $data_line, $args) = @_;
    return undef if ! ref $data_line; # (should) return undef if word1 not found, or (for paired stats) 0 if word2 not found with word1
    $fields_wanted = [$fields_wanted] if ! ref $fields_wanted;
    my $old_ref = delete $args->{'ref'} || 0;
    $args->{'ref'} = 0;
    my (@vals, @invals) = ();
    foreach my $field(@{$fields_wanted}) { # sort out those that must be calculated from those that can be drawn from the data "$data_line"
        if ($args->{'calc'} && $field =~ /^SSZ/) {
            push @vals, scalar $self->list_associates($data_line->[0], $args) || 0;
        }
        elsif ($field =~ /^CCT/){
            push @vals, scalar $self->list_cues($data_line->[0], $args) || 0;
        }
        elsif ($field =~ /^ONC/) {
            push @vals, scalar $self->list_orthons($data_line->[0], $args) || 0;
        }
        else {
            if (_is_valid_fieldname($self, $field)) {  # ensure the keys are valid
                push @vals, _get_val_by_aref($self, $data_line, $field, $args); # make list of requested stat values
            }
            else {
                push(@invals, $field);
            }
        }
    }
    croak 'Invalid statistic: ', join(',', @invals) if scalar(@invals);
    if (!$old_ref) { # return unreferenced list or single value
       return scalar(@vals) > 1 ? @vals : $vals[0];
    }
    elsif ($old_ref == 1) { # return flat referenced list
       return \@vals;
    }
    else { # return keyed list
       return { mesh(@{$fields_wanted}, @vals) }; # make list of key => value pairs naming each stat value (e.g., FSG => 0.20)
    }
}

sub _return_list {
    my ($href, $data, $ref, $sortby) = @_;
    
    my $alpha_sort = $sortby && $sortby eq 'alpha' ? 1 : 0;
    
    if (!$data) { # only need the keys from %sel - but as a reference or not, and sorted or not?
        my @res = $alpha_sort ? sort{$a cmp $b} keys %{$href} : keys %{$href};
        return $ref ? \@res : @res;
    }
    elsif ($data == 1) { # expecting lines of data as an array - but as a reference or not, and sorted or not?
        my @res = ();
        if ($alpha_sort) {
            push @res, $href->{$_} foreach sort{$a cmp $b} keys %{$href};
        }
        else {
            @res = values %{$href};
        }
        return $ref ? \@res : @res;
    }
    else { # expecting a hash - but as a reference or not?
        return $ref ? $href : %{$href};
    }
}

sub _is_valid_fieldname {
    my ($self, $field) = @_;
    my $ok = 1;
    if ( none {$field eq $_} @{$self->{'_field_aref'}}) {
        #carp __PACKAGE__, " does not recognize the requested statistic '$field'";
        $ok = 0;
    }
    return $ok;
}

# Null fields are those that are NOT empty but do not have a valid character:
sub _isnullcell {
    return hascontent($_[0]) && (!looks_like_number($_[0]) && $_[0] !~ /[A-Z]/) ? 1 : 0;
}

# Try to find a word among the cues or the associates, returning a ref to the line, and an addon for the stat indices:
sub _get_single_entry {
    my ($self, $word, $data_line) = @_;
    if ($data_line = $self->find_associate($word, 1, 1)) {
        $self->{'_iadd'} = 5;
    }
    elsif ( my $cues = $self->list_cues($word, {data => 1, ref => 1}) ) {
        $data_line = $cues->[0]; #only need the first word to get the "target" data
        $self->{'_iadd'} = 14;
    }
    else {
        return undef; # word could not be found in the database
    }
    return $data_line;
}

sub _get_pair_entry {
    my ($self, $w1exp, $w2exp, $aref) = @_;
    $self->{'_iadd'} = 5 if $aref = $self->find_pair($w1exp, $w2exp);
    return $aref;
}

sub _get_val_by_aref {
    my ($self, $aref, $field) = @_;
    return 0 if ! $aref;
    return  $aref->[(firstidx {$_ eq $field} @{$self->{_field_aref}}) + $self->{'_iadd'}];
}

sub _calc_nsg {
    my ($self, $words, $args) = @_; 
    $args->{'mweight'} = 1 if nocontent($args->{'mweight'});
    $args->{'calc'} = 0 if nocontent($args->{'calc'});
    # Get "self-associative" strengths of the two words:
    my ($cue_11, $cue_22, $cue_21, $tgt_12, $med_sum, $nsg, @meds) = (1, 1, 0, 0, 0, 0); # assumed defaults
    # Get forward and backward strengths between the two words:
    $cue_21 = $self->assoc_stat($words->[1], $words->[0], 'FSG', {calc => $args->{'calc'}});
    $tgt_12 = $self->assoc_stat($words->[0], $words->[1], 'FSG', {calc => $args->{'calc'}});
    # Get cross-mediator strengths:
    @meds = $self->list_mediators($words->[0], $words->[1], data => 0, ref => 0);
    foreach my $m(@meds) {
        $med_sum +=  $self->forward_strength($words->[0], $m) * $self->forward_strength($words->[1], $m);
    }
    # Calc net-strength by addition of FSGs and MSGs:
    $nsg = $cue_21 + $tgt_12 + ( $med_sum * $args->{'mweight'} );
    return $nsg;
 }
 
 sub _calc_tsg {
    my ($self, $words, $args) = @_;
    my $fsg = $self->assoc_stat($words->[0], $words->[1], 'FSG', {calc => $args->{'calc'}});
    my $ssz2 = $self->word_stat($words->[1], 'SSZ', {calc => $args->{'calc'}});
    my $tsg = '';
    $tsg = $fsg / ($ssz2 * $fsg) if looks_like_number($fsg) and looks_like_number($ssz2) and $fsg > 0 and $ssz2 > 0;
    return $tsg;
}
 
 sub _calc_orthon_level {
    my ($w1, $w2, $statn, $weights, $val) = @_;
    given($statn){
        when(/^levd/i){
            if (ref $weights and scalar(@{$weights} == 3)) {
                require Text::WagnerFischer;
                $val = Text::WagnerFischer::distance($weights, $w1, ($w2)); 
            }
            else {
                require Text::LevenshteinXS;
                $val = Text::LevenshteinXS::distance($w1, $w2);
            }
        }
        when(/^mui/i){
            require String::Similarity;
            $val = String::Similarity::similarity($w1, $w2);
        }
        default{say "Don't know foo"; }
    }
    return $val;
}

sub _path2table4letter { # simple access to db files, assuming each is named by letter; 
    # used internally only by methods that guarantee $_[0] (after shift) is defined as a single uc letter to search
    # $self->{'dbdir'} already validated in new() so no need to check its value/existence or of the files within it
	my $self = shift;
    return undef if $_[0] !~ /^[A-Z]/;
    return File::Spec->catfile($self->{'dbdir'}, $_[0] . '.csv') || undef; # BENCHMARK: about 7% faster than regexp
}

sub _this_assoc { # isolate the associate in this line (as string) of data:
    return /^[^,]+,\s*([^,]+),/; # isolate the associate
}

sub _hashify_line {
    my ($line) = @_;
    my ($i, %hash) = ();
    my @fields = _headings_list();
    for ($i = 0; $i < scalar @fields; $i++) { # note: this is much faster than using "mesh" (which is 84% slower)
        $hash{$fields[$i]} = $line->[$i];
    }
    return \%hash;
}

# Initialize the array of single words ($self->{'_ex_singles'} and list of paired regexes ($self->{'_ex_pairs'}) that will be excluded from list and calc methods:
sub _init_ex_words {
    my ($self, $args) = @_;
    
    $self->{$_} = [] for qw/_ex_pairs _ex_singles/;
    $self->{'_any_exclusions_on'} = 0;
    my ($u, $v) = ();
    require File::Basename; # just to get to the next level up ...
    foreach my $file(qw/cultural errors misc phrasal structural/) {
        my $load = defined $args->{'ex_' . $file} ? $args->{'ex_' . $file} : ($self->{'exclusions'}->{$file} || 0);
        if ($load) {
            my $path = File::Spec->catfile($INC[0], 'excludes', 'ex_' . $file . '.csv'); # should catch on install
            $path = File::Spec->catfile($INC[0], 'Lingua', 'Norms', 'USF', $path) if !-e $path; # should catch once installed
            open F, '<' . $path or warn $! . " $path";
      	    while (<F>) {
                next if /^#/;
                chomp;
                next if nocontent($_);
         		($u, $v) = split/,/;
                if ( nocontent($v) ) { # a single word to be excluded regardless of how it is paired
                    push @{$self->{'_ex_singles'}}, qr/^$u$/i;
                }
                else {
      		        push @{$self->{'_ex_pairs'}}, [qr/^$u$/i, qr/^$v$/i];
                }
           	}
            close F;
            $self->{'_any_exclusions_on'}++;
        }
        $self->{'exclusions'}->{$file} = $load;

    }
    # tried but don't seem reliable:
    #require Text::Match::FastAlternatives;
    #$self->{'_ex_singles_tm'} = Text::Match::FastAlternatives->new(@{$self->{'_ex_singles'}});
    #$self->{'_ex_pairs_tm'} = Text::Match::FastAlternatives->new(@{$self->{'_ex_pairs'}});
    #require String::BlackWhiteList;
    #$self->{'exclusions_matcher'} = String::BlackWhiteList->new(
    #    blacklist => $self->{'_ex_singles'},
    #    whitelist => [],
    # )->update;
    # say $self->{'exclusions_matcher'}->black_re();
}

sub _cull_ex_words { # search is case-insensitive for pairs - as per _init_ex_words
    my ($self, $w1, $w2) = @_;
    return 0 if !$self->{'_any_exclusions_on'} || nocontent($w1);
    #if ($self->{'_ex_singles_tm'}->match($w1) or ( hascontent($w2) and $self->{'_ex_singles_tm'}->match($w2))) {
    #    return 1;
    #}
    
    #if (!$self->{'exclusions_matcher'}->valid($w1) or ( hascontent($w2) && !$self->{'exclusions_matcher'}->valid($w2) ) ) {
        #print "would return;\n"
    #    return 1;
    #}
    #if ($self->{'_ex_singles_tm'}->match($w1) or ( hascontent($w2) and $self->{'_ex_singles_tm'}->match($w2) )) { # say "rejecting $w1 given $_";
    #       return 1;
    #}

   foreach (@{$self->{'_ex_singles'}}) {
        if ($w1 =~ $_ or ( hascontent($w2) and $w2 =~ $_ )) { # say "rejecting $w1 given $_";
           return 1;
        }
    }

    if (hascontent($w2)) {
    	foreach (@{$self->{'_ex_pairs'}}) {# BENCHMARK: - cost only if using: study $w1
             return 1 if ($w1 =~ $_->[0] && $w2 =~ $_->[1]) || ($w1 =~ $_->[1] && $w2 =~ $_->[0]);
        }
	}
	return 0;
}

sub _set_word_pairs {
    croak __PACKAGE__, '::', (caller(1))[3], ' needs 2 words for associating' if nocontent($_[0]) || nocontent($_[1]); # flawed if $ari[0]
    return (uc($_[0]), uc($_[1]));
}

# Setting arguments as a referenced hash array:
# args can be sent as hashrefs, or open key=>value arrays:
sub _set_args {
    do { ref $_[0] ? return $_[0] : !(@_ % 2) ? return {@_} : croak( (caller(1))[3], " needs a valid list of arguments, not '@_'")};
}

package Lingua::Norms::USF::Statistics;

sub new {
    my ($class, %args) = @_;
	my $self = {};
	bless $self, $class;
	return $self;
}

return(1);

__END__

=head1 ERROR MESSAGES

All errors are responded to as L<croaks|Carp::croak>, dieing as a result. The basic text strings that are "croaked" are as follows.

=over 2

=item Invalid statistic

This croak can be raised when directly calling L<word_stat|word_stat> or L<assoc_stat|assoc_stat>. It means you haven't sent a valid statistic code to these methods. Valid codes are listed in the sections L</Statistics for single words> (e.g., 'SSZ', 'FRQ') and L</Statistics for word-pairs> (e.g., 'FSG', 'MSG'). Note that L<word_stat|word_stat> expects this code to be provided as a flat string (or as an array-reference of codes) after the word to look up, and L<assoc_stat|assoc_stat> expects this code or codes (as an array-reference) after giving the two words to associate. The particular invalid code or codes sent are firstly "carped" before this terminal croak.

=item requires a valid path to the database directory

=item found an empty directory where the database should be: '...

=item did not find all the required database tables in the directory '...

=item could not read the following database file(s) in the directory '...

These are all croaks that could arise at the start, when using L<new|new>, if the argument B<dbdir>, indicating where the database files are stored, is not specified, cannot be determined, is incomplete or contains unreadable files. Inspect the value of the given or tested path that is returned with the croak. Run the install script if necessary. 

=item cannot determine the database location

This arises when any of the list_, find_ and random_ methods are called directly, or from all the other methods that rely upon them, and the database directory has not been resolved, as above.

=item needs a word for searching associates/looking up its cues I<(or similar)>

This is a croak that follows when trying to list associates, cues, orthons and so on, and the word or words to look up has/have not been specified as the first (and second, after the class) argument.

=item needs a valid list of arguments, not '...'

This croak follows if the arguments sent to any method are not referenced (as an array or a hash, where relevant), or do not form one or more key=>value pairs of arguments that can form a valid hash-reference. The method that was called with these invalid arguments prepends this croak.

=back

=head1 EXAMPLES

 my $usf = Lingua::Norms::USF->new(dbdir => 'c:\perl\site\lib\lingua\norms\usf\db'); # likely location on Windows

=over 2

=item How many cues are there in the DB?

say scalar $usf->list_words(cues_only => 1); # should print 5018 - assumes ref => 0 and data => 0

=item How many words in the DB of 2-3 letters in length?
 
say scalar $usf->list_words(chars => [2, 3]); # prints "483\n" 
say scalar $usf->list_words(chars => [2, 3], cues_only => 1); # prints "312\n"
say $_ for $usf->list_words(chars => [2, 3], sortby => 'alpha'); # prints out each of the 483 words on a new line in alphabetical order

=item Make up your own functions:

Here's one to just give a yes/no answer as to whether two words have a bi-directional association.

$bool = is_bidirectional('horse', 'cow'); 

sub is_bidirectional {
    my ($w1, $w2) = @_;
    require Lingua::Norms::USF;
    if (Lingua::Norms::USF::are_related($w1, $w2) and Lingua::Norms::USF::bsg($w1, $w2)) {
        return 1;
    }
    else {
        return 0; 
    }
}

(or just see what Lingua::Norms::USF::assoc_stat($w1, $w2, 'BSG') returns).

=back

=head1 REFERENCES

Coltheart, M., Davelaar, E., Jonasson, J. T., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), I<Attention and performance> (Vol. 6, pp. 535-555). London, UK: Academic.

Kucera, M., & Francis, W. (1967). I<Computational analysis of present-day American English>. Providence, RI, US: Brown University Press.

McNamara, T. P. (1992). Theories of priming: I. Associative distance and lag. I<Journal of Experimental Psychology: Learning, Memory and Cognition>, I<18>, 1173-1190.

Nelson, D. L., Bennett, D. J., & Leibert, T. W. (1997). One step is not enough: Making better use of association norms to predict cued recall. I<Memory and Cognition>, I<25>, 785-796.

Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. I<Behavior Research Methods, Instruments, and Computers>, I<36>, 402-407.

Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 words. I<Journal of Experimental Psychology Monographs>, I<76>, 1-25.

Toglia, M. P., & Battig, W. F. (1978). I<Handbook of semantic word norms>. Hillsdale, NJ: Erlbaum.

Twilley, L. C., Dixon, P., Taylor, D., & Clark, K. (1994). University of Alberta norms of relative meaning frequency for 566 homographs. I<Memory and Cognition>, I<22>, 111-126.

=head1 SEE ALSO/DEPENDENCIES

L<File::RandomLine|File::RandomLine> - Used by L<random_word|random_word> to randomly retrieve one or more lines from a database table.

L<Lingua::EN::Syllable|Lingua::EN::Syllable> - Used by L<list_words|list_words> to determine syllable size, if specified. Note: this module's manpage states that "The syllable count provided in Lingua::EN::Syllable is about 90% accurate". It particularly fails to count syllables formed by suffixes such as "dle," "kle" and "tle" (e.g., "cradle" is given as 1 syllable), and when a "y" is followed and preceded by a vowel (e.g., "voyage" is given as 1 syllable).

=head1 AUTHOR

=over 2

=item Roderick Garton

rgarton AT cpan DOT org

=back

=head1 BUGS

Please report any bugs or feature requests to C<bug-lingua-norms-usf-0.01 at rt.cpan.org>, or through the web interface at L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-Norms-USF-0.01>. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

=head1 TODO

A cue-iterating next() method, returning each cue/line between iterations.

Cache last single word tested, and association and other single-word lists retrieved, until this test word changes in another call.

Partial out into separate packages: ... USF::Statistics, USF::List, USF::Database, etc..

Other orthographic neighbourhood rules, but transfer to a specific module

Setting other global arguments in C<new> or otherwise (as for B<dbdir>)?

Test performance with other file methods. 

C<isaword> function on top of use of hascontent and nocontent?

C<assoc_tree> method for dumpable list of one-, two-, etc. step associates, according to the "list_words" arguments?

Access to other USF databses?

Set::Light and similar to improve speed/memorycost?

Options for randomness source

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Lingua::Norms::USF

You can also look for information at:

=over 2

=item * RT: CPAN's request tracker

L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-Norms-USF-0.01>

=item * AnnoCPAN: Annotated CPAN documentation

L<http://annocpan.org/dist/Lingua-Norms-USF-0.01>

=item * CPAN Ratings

L<http://cpanratings.perl.org/d/Lingua-Norms-USF-0.01>

=item * Search CPAN

L<http://search.cpan.org/dist/Lingua-Norms-USF-0.01/>

=back

=head1 LICENSE AND COPYRIGHT

Copyright 2011 Roderick Garton.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

See http://dev.perl.org/licenses/ for more information.

=over 2

=item Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.

=back

=cut

1; # End of Lingua::Norms::USF
