java - Regex: Find first occurrence and map to canonical value
I have input data like this:
1996 caterpiller d6 dozer sale (john deere , komatsu too!)
I want to match the first brand name found and map it to a canonical value.
Here's the map:
canonical   regex
komatsu     \bkomatsu\b
cat         \bcat(erpill[ae]r)?\b
deere       \b(john )?deere?\b
I can test whether a brand is in the string:
/\b(cat(erpill[ae]r)?|(john )?deere?|komatsu)\b/i.exec(...) != null
or find out what the first match was:
/\b(cat(erpill[ae]r)?|(john )?deere?|komatsu)\b/i.exec(...)[0]; //caterpiller
But is there a fast or convenient way to map the first match to the real value I want?
caterpiller => cat
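Do I need to find the first match, then test it against the patterns in the map?
I need to run 10,000+ inputs against 10,000+ brands :D
I could loop over the map, testing each pattern against the input value, but that finds the first value that appears in the map, not the first one in the input.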
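To make that pitfall concrete, here is a minimal sketch. The names `brandMap`, `firstBrandInMapOrder` and `firstBrandByPosition` are made up for illustration. The first function returns whichever brand comes first in the map; the second compares `match.index` across all patterns and keeps the earliest one in the input. Both run every pattern against every input, so this is probably too slow for 10,000+ inputs against 10,000+ brands.

```javascript
// Illustrative map only: [canonical name, pattern] pairs taken from the question.
var brandMap = [
  ['komatsu', /\bkomatsu\b/i],
  ['cat', /\bcat(erpill[ae]r)?\b/i],
  ['deere', /\b(john )?deere?\b/i]
];

// Naive loop: returns the first brand in MAP order that occurs anywhere in the text.
function firstBrandInMapOrder(text) {
  for (var i = 0; i < brandMap.length; i++) {
    if (brandMap[i][1].test(text)) return brandMap[i][0];
  }
  return null;
}

// Position-aware loop: returns the brand whose match starts earliest in the text.
function firstBrandByPosition(text) {
  var best = null;
  var bestIndex = Infinity;
  for (var i = 0; i < brandMap.length; i++) {
    var m = brandMap[i][1].exec(text);
    if (m !== null && m.index < bestIndex) {
      bestIndex = m.index;
      best = brandMap[i][0];
    }
  }
  return best;
}

var sample = '1996 caterpiller d6 dozer sale (john deere , komatsu too!)';
console.log(firstBrandInMapOrder(sample)); // komatsu
console.log(firstBrandByPosition(sample)); // cat
```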
One idea is to associate the number of each capture group with an index in an array of canonical names. Each brand must have its own group number:
// index 0 is unused, so indices 1..3 line up with capture groups 1..3
var can = ['', 'komatsu', 'cat', 'deere'];
// group 1: komatsu, group 2: cat/caterpillar/caterpiller, group 3: (john) deere
var re = /\b(?:(komatsu)|(cat(?:erpill[ae]r)?)|((?:john )?deere))\b/ig;
var text = '1996 caterpiller d6 dozer sale (john deere , komatsu too!)';
var res;
while ((res = re.exec(text)) !== null) {
    for (var i = 1; i < 4; i++) { // test each group until one is defined
        if (res[i] !== undefined) {
            console.log(can[i] + "\t" + res[0]);
            break;
        }
    }
}
// result:
// cat	caterpiller
// deere	john deere
// komatsu	komatsu
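With thousands of brands, writing the alternation and the index array by hand is not practical, so the same idea could be generated from the map. The following is only a sketch under some assumptions: `buildBrandMatcher` and `matchBrand` are hypothetical names, the map is given as `[canonical, regexSource]` pairs, and the patterns contain no backreferences (renumbering the groups would break them). It counts the capture groups inside each pattern so that patterns like `\bcat(erpill[ae]r)?\b` can be used unchanged, and it omits the `g` flag because only the first match is wanted.

```javascript
// Hypothetical helper, not from the original answer.
// brands: array of [canonicalName, regexSource] pairs.
function buildBrandMatcher(brands) {
  var groups = []; // [{ index: capture group number, canonical: name }]
  var parts = [];
  var nextGroup = 1;
  brands.forEach(function (entry) {
    var canonical = entry[0];
    var source = entry[1];
    groups.push({ index: nextGroup, canonical: canonical });
    parts.push('(' + source + ')');
    // Matching '' against "source|" always succeeds (at worst via the empty
    // alternative), so the result length tells us how many capture groups
    // the source itself contains.
    var inner = new RegExp(source + '|').exec('').length - 1;
    nextGroup += 1 + inner;
  });
  var re = new RegExp(parts.join('|'), 'i'); // no "g" flag: first match only
  return function (text) {
    var m = re.exec(text);
    if (m === null) return null;
    // Find which brand's outer group took part in the match.
    for (var i = 0; i < groups.length; i++) {
      if (m[groups[i].index] !== undefined) return groups[i].canonical;
    }
    return null;
  };
}

var matchBrand = buildBrandMatcher([
  ['komatsu', '\\bkomatsu\\b'],
  ['cat', '\\bcat(erpill[ae]r)?\\b'],
  ['deere', '\\b(john )?deere?\\b']
]);

console.log(matchBrand('1996 caterpiller d6 dozer sale (john deere , komatsu too!)')); // cat
```

The combined regex is built once and reused for every input, so each input costs one `exec` call plus a scan over the group table. Whether a 10,000-way alternation is fast enough in a given engine would need to be measured.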