python - Select a random row from Cassandra
I have the following table:
create table prosfiles (
    name_file text,
    beginpros timestamp,
    humandate timestamp,
    lastpros timestamp,
    originalname text,
    pros int,
    uploaded int,
    uploader text,
    primary key (name_file)
);
create index prosfiles_pros_idx on prosfiles (pros);
In this table I keep the location of several CSV files which are processed by a Python script. I have several scripts running at the same time processing the files, so I use this table to keep control and avoid two scripts starting to process the same file at the same time (in the 'pros' column, 0 means the file has not been processed, 1 means the file has been processed, and 1010 means the file is being processed by a script).
Each script runs the following query to pick a file to process:
"select name_file from prosfiles where pros = 0 limit 1"
But it always returns the first row of the files matching that condition.
I want to run a query that returns a random row from the ones with pros = 0.
In MySQL I've used "order by rand()", but in Cassandra I don't know how to randomly sort the results.
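It looks like you're using Cassandra as a queue, and that's not the best usage pattern for it; use RabbitMQ/SQS/any other queue service instead. Cassandra also does not support sorting arbitrary query results (ORDER BY only works on clustering columns within a partition), and this is by design:
- Sorting requires a lot of computation inside the database if you are trying to sort 1B rows.
- Sorting is not an easy task in a distributed environment: you have to ask all the nodes holding the data to perform it.
But if you know what you are doing, you can revisit your database schema to make it more suitable for this type of workload: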
- Split the source table into 2 different tables: the first one with the full file information, and the second one a queue containing the IDs of the files to process.
- Your worker process reads a random row from the queue table (see below how to read a ~random row from Cassandra by primary key).
- The worker deletes the target ID from the queue and updates the targets table with processing information.
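The steps above can be sketched with in-memory stand-ins for the two tables. This is only an illustration of the flow, not Cassandra code: the `files` dict, the `queue` set, and the `claim_next` helper are hypothetical names, and in a real cluster the three steps are separate statements, so they are not atomic.

```python
import random

# In-memory stand-ins for the two tables (hypothetical names).
files = {  # full file information, keyed by name_file
    "a.csv": {"pros": 0},
    "b.csv": {"pros": 0},
    "c.csv": {"pros": 0},
}
queue = set(files)  # only the ids of files still waiting to be processed


def claim_next(queue, files, rng=random):
    """Pick a random id from the queue, remove it, mark the file as in progress."""
    if not queue:
        return None
    name = rng.choice(sorted(queue))  # read a random row from the queue
    queue.remove(name)                # delete the target id from the queue
    files[name]["pros"] = 1010        # 1010 = being processed by a script
    return name
```

Calling `claim_next(queue, files)` repeatedly hands out each file once in random order and returns `None` when the queue is empty.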
This way of doing things can lead to errors:
- Multiple workers can get the same target at once.
- If you have a lot of workers and targets, Cassandra's compaction process will kill the performance of your DIY queue.
To read a pseudo-random row from a table by its primary key you can use this query:
select * from some_table where token(id_column) > some_random_long_value limit 1
It has its cons, too:
- If you have a small set of targets, it will sporadically return an empty result because some_random_long_value is higher than the token of every existing key.
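A minimal Python sketch of this token query, with a wrap-around retry for the empty-result case. It assumes the default Murmur3Partitioner, whose tokens are signed 64-bit integers. The `execute(query, params)` callable is a hypothetical wrapper that runs CQL and returns a list of rows; `pick_pseudo_random_row`, `random_token`, and the table/column arguments are illustrative names, not part of any driver.

```python
import random

# Token range of Murmur3Partitioner (assumed; it is Cassandra's default).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def random_token(rng=random):
    """Return a uniformly random signed 64-bit integer."""
    return rng.randint(MIN_TOKEN, MAX_TOKEN)


def pick_pseudo_random_row(execute, table, key_column, rng=random):
    """Return one pseudo-random row, or None if the table is empty.

    `execute(query, params)` is a hypothetical callable that runs CQL
    and returns a list of rows.
    """
    base = f"SELECT * FROM {table} WHERE token({key_column})"
    rows = execute(base + " > %s LIMIT 1", (random_token(rng),))
    if not rows:
        # The random value was above every existing token: wrap around
        # to the start of the ring instead of returning nothing.
        rows = execute(base + " >= %s LIMIT 1", (MIN_TOKEN,))
    return rows[0] if rows else None
```

With the DataStax Python driver, `execute` could be something like `lambda q, p: list(session.execute(q, p))`. Note that the pick is not uniform: a row that follows a large gap in the token ring is chosen more often than one that follows a small gap.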