cp target/release/libinstant_distance.so instant-distance-py/test/instant_distance.so
and it works. Built and running. The main tree was macOS only.
Here's resource consumption in a sample run.
Time: 4.49 s, Memory: 1552 MB.
Single word. Three langs including en.
cargo check and format time might also slightly improve!
Quick glance in this case: I took a snapshot of a couple of seconds on the Performance tab and saw a lot of React-related calls.
> Language: fr, Translation: bonjours
> Language: fr, Translation: bonsoir
> Language: fr, Translation: salutations
> Language: it, Translation: buongiorno
> Language: it, Translation: buonanotte
> Language: fr, Translation: rebonjour
> Language: it, Translation: auguri
> Language: fr, Translation: bonjour,
> Language: it, Translation: buonasera
> Language: it, Translation: chiamatemi
Is it just me, or are these machine translations worse than ... Google Translate?
The word vectors have been aligned in multiple languages. Using an approximate nearest neighbor search we are able to find the nearest vector to the input in multiple languages very quickly.
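The lookup described above can be illustrated with a minimal sketch. Note that this is an exact brute-force search by cosine similarity, standing in for the approximate index, and that the three-dimensional vectors and the names `TOY_VECTORS`, `nearest_words` and `hello_en` are made up for illustration; real aligned fastText vectors have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_words(query, vectors, k=2):
    """Return the k (language, word) keys whose vectors are most similar to query.

    Exact search: every stored vector is compared against the query.
    An approximate index avoids this full scan at the cost of occasional misses.
    """
    ranked = sorted(
        vectors,
        key=lambda key: cosine_similarity(query, vectors[key]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy data: because the vectors are aligned across languages,
# words with similar meanings sit close together regardless of language.
TOY_VECTORS = {
    ("fr", "bonjour"): [0.9, 0.1, 0.0],
    ("it", "buongiorno"): [0.8, 0.2, 0.1],
    ("fr", "fromage"): [0.0, 0.1, 0.9],
    ("it", "formaggio"): [0.1, 0.0, 0.9],
}

hello_en = [1.0, 0.0, 0.0]  # made-up vector for the English input word
result = nearest_words(hello_en, TOY_VECTORS, k=2)
for lang, word in result:
    print(f"Language: {lang}, Translation: {word}")
```

The full scan costs one comparison per stored word, which is why an approximate index becomes attractive once the vocabulary runs to hundreds of thousands of entries.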
To keep the example simple, we did not try to filter the data through hand-built language dictionaries. Instead, we simply drop words in other languages that also appear in the English .vec file. Words like "ciao" appear frequently enough in otherwise English sentences that the example code drops them from Italian, so they are not shown in the results:
% curl -s "https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki..." | grep -n ciao
50393:ciao 0.0120 ...
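A rough sketch of that overlap filter, assuming the usual fastText .vec text layout (a header line with the word count and dimension, then one word and its values per line). The function names and the tiny inline files are hypothetical, not taken from the example code.

```python
import io

def read_vec_words(handle):
    """Yield (word, vector) pairs from a fastText-style .vec text stream."""
    next(handle, None)  # skip the "<count> <dim>" header line
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        yield parts[0], [float(value) for value in parts[1:]]

def drop_english_overlap(english_words, other_vectors):
    """Keep only the words that do not also appear in the English vocabulary."""
    return {
        word: vector
        for word, vector in other_vectors.items()
        if word not in english_words
    }

# Hypothetical miniature .vec files; "ciao" appears in both vocabularies.
EN_VEC = "2 2\nhello 0.1 0.2\nciao 0.3 0.4\n"
IT_VEC = "2 2\nciao 0.5 0.6\nbuongiorno 0.7 0.8\n"

english = {word for word, _ in read_vec_words(io.StringIO(EN_VEC))}
italian = dict(read_vec_words(io.StringIO(IT_VEC)))
kept = drop_english_overlap(english, italian)
print(sorted(kept))
```

Swapping the `word not in english_words` test for a membership check against a curated per-language word list would turn this into an allow-list filter.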
One improvement would be to filter out any words that do not appear in a hand-curated dictionary, instead of filtering out words that already appear in English. We decided not to show how to do this because we'd already introduced a few concepts, like aligned word vectors and approximate nearest neighbour searches, and wanted to keep the example as simple as possible.
The full COVID-19 genome is available:
https://www.snapgene.com/resources/coronavirus-resources/?re...
It is probably the first thing that was done once the COVID-19 genome was made public. A quick search turned up this summary of the results: https://www.news-medical.net/health/How-Does-the-SARS-Virus-...
BLAST (Basic Local Alignment Search Tool) is one common version, and the NIH's NCBI has a variety of nice online tools here: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Note that it does take a little bit of background knowledge to interpret: some motifs are just really common, others are shared.
The short text and the fact that your application would tolerate or celebrate catchy neologisms play to fasttext's strengths.
Only as an adverb; otherwise it should be "rapide".