Recent supporters

Someone bought a coffee.
Thanks for your tutorial on the Python search engine!

Someone bought a coffee.

Kris Kulivnyk bought a coffee.

Someone bought a coffee.

Michel Laliberté bought 3 coffees.
Hi Bart, congratulations on your clever Python searchengine; I'm impressed. I thought indexers had to be written in a C-like compiled language for speed reasons. I wish to report a small error (and its correction) in the README.md. See the attached file searchengine_ipython_dump.txt {my comments are between curly brackets}. I had to install VB libraries and IPython to get useful output. It works all right, but I am kind of frustrated because I feel like an aboriginal hunter inspecting an opened iPhone.
A few words about myself: I am a retired professional translator AND a self-taught programmer of the worst kind, not the type software engineers usually like to associate with. Back in the Nineties, I ported Mark Zimmerman's KWIC (Keyword in Context) C source code from Macintosh to PC to develop a rustic DOS KWIC interface. That interface involved three different but related screens: 1) an alphabetic view of the indexed-word dictionary; 2) a listbox filled with a one-line-per-file view of all occurrences of any keyword or keyword combination, centered on the keyword(s); 3) a view of the full-text source file, centered on the keyword, for every file listed in that listbox. See examples of screens no. 2 (pink) and 3 (blue frame) in the attached file. The keyword "colonel" was grepped in a large corpus of text files.
I am now developing a pure Python editor with some word-processing features and a KWIC-like interface for multi-directory keyword searches. It uses a grep-like command to search for keywords in various unindexed text files (including HTML and XML files) and places the results in one-liner and full-text screens like 2) and 3) above. The speed is good enough for files up to 1 MB in size. For larger files or for Web scraping, an indexer would be welcome. Presently, your index hogs a large part of my 12 GB of RAM, and I suspect it wouldn't even load on older computers with 8 GB of RAM. I would very much like to run some tests on it without having to reindex the gz file after closing a session. Is there an easy way to save the index? Any help or advice would be appreciated.
Michel Laliberté, Montreal, mlibrt-at-gmail-dot-com
P.S.: Sorry, no way to upload the attachments, but I will if you send me an email address. I closed my Twitter account for personal reasons.
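
On the question of saving the index between sessions: the tutorial's actual index class is not shown on this page, so the following is only a minimal sketch of one common approach, assuming the index is an ordinary picklable Python object. The names `save_index`, `load_index`, and `build_toy_index` are hypothetical, and the toy dictionary index merely stands in for the real one.

```python
import gzip
import pickle


def save_index(index, path):
    """Write any picklable index object to a gzip-compressed file."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(index, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Read back an index previously written by save_index."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


def build_toy_index(documents):
    """Hypothetical stand-in for the real indexer: token -> set of doc ids."""
    index = {}
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "the colonel arrived", 2: "a letter for the colonel"}
    save_index(build_toy_index(docs), "index.pkl.gz")
    restored = load_index("index.pkl.gz")
    print(sorted(restored["colonel"]))
```

Pickling avoids re-parsing the source file but does not reduce memory use once the index is loaded, and pickle files should only be loaded when they come from a trusted source.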