Escepticisimo Ilustrado: Curiosities of Java's split

Hi everyone!

The other day I was wrestling with Java's split method, and I ran into a situation that had never given me trouble... until it did :P

So I decided to dig into it properly, and I found a solution I'd like to share with you.

The problem itself is that when we split a String in Java, our "needle" (the text we split on) disappears from the result. That is:

Needle: "word"
Haystack: "I have a word, two words, three words"

When we split, we get back a String[] equal to {"I have a ", ", two ", "s, three ", "s"}.

But... what if I want to get my original needle back?

You might think this is trivial: the needle is "word", so it's enough to put it back between the pieces that split returns... but it turns out there's a small catch!
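For a fixed needle, putting it back really is enough. A minimal sketch, assuming the needle is plain text: `Pattern.quote` makes `split` treat it literally, and `String.join` (Java 8+) inserts it between the pieces again. The class name `RejoinLiteral` is only for illustration.

```java
import java.util.regex.Pattern;

class RejoinLiteral {
    // Splits on a literal needle and puts it back between the pieces.
    // Pattern.quote makes split treat the needle as plain text, not a regexp.
    static String rejoin(String haystack, String needle) {
        String[] pieces = haystack.split(Pattern.quote(needle));
        return String.join(needle, pieces);
    }

    public static void main(String[] args) {
        String haystack = "I have a word, two words, three words";
        System.out.println(rejoin(haystack, "word").equals(haystack)); // true
        // split drops trailing empty strings, so a needle at the very end is lost
        System.out.println(rejoin("ends with word", "word").equals("ends with word")); // false
    }
}
```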

The catch is that Java's split takes a regular expression (regexp), not a plain string, so the text removed at each cut can be different every time.
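A quick illustration with a made-up haystack: splitting on the regexp `\\d+` removes a different run of digits at each cut, so there is no single needle to put back. The class and method names are illustrative.

```java
import java.util.Arrays;

class RegexNeedle {
    // "\\d+" is a regexp: each run of digits is a different "needle".
    static String[] splitOnDigits(String haystack) {
        return haystack.split("\\d+");
    }

    public static void main(String[] args) {
        // prints [a, b, c, d]: "1", "22" and "333" are gone, and the regexp
        // alone cannot say which digits sat between which pieces
        System.out.println(Arrays.toString(splitOnDigits("a1b22c333d")));
    }
}
```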

In my case, I was experimenting with C-style printf formats, so I decided to detect the following regexp:

"%'\\S+:\\S+'"

This matches every string of the form %'(text):(text)', where each (text) is a run of non-whitespace characters; for example:

%'i:int1'
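To see what the regexp picks up, here is a small sketch using `java.util.regex.Pattern` and `Matcher.find()`; the class `PlaceholderFinder` and its `find` method are illustrative names, not part of the post's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFinder {
    // The regexp from the post: %' + non-whitespace + : + non-whitespace + '
    static final Pattern PLACEHOLDER = Pattern.compile("%'\\S+:\\S+'");

    // Returns every placeholder found in the line, in order.
    static List<String> find(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(PLACEHOLDER.matcher("%'i:int1'").matches()); // true
        System.out.println(find("x = %'i:int1', y = %'s:name1'"));      // [%'i:int1', %'s:name1']
    }
}
```

Note that `\\S+` is greedy: two placeholders written with no space between them, such as `%'i:int1'%'s:name1'`, come back as a single match.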

In this case, I needed to split lines of text and also recover the data that had been dropped along the way, so that I could process it.

There is an alternative, but it only works with simple regexps: wrap the regexp in a look-behind, (?<=regexp), so that the removed text stays at the end of each piece. It did not work with my regexp, most likely because Java only accepts look-behinds with a bounded maximum length, and \\S+ has no upper bound.
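The look-around idea can be sketched with a simple, fixed-length delimiter such as a comma (an assumption for the example; the post's own regexp is not used here). `(?<=,)` splits right after each comma, so the comma stays at the end of the previous piece; adding the look-ahead `(?=,)` as an alternative also splits right before it, which returns the commas as pieces of their own. Names are illustrative and the code assumes Java 8 or later.

```java
class LookaroundSplit {
    // (?<=,) is a zero-width look-behind: it splits right after each comma,
    // so the comma stays at the end of the previous piece.
    static String[] keepAtEnd(String s) {
        return s.split("(?<=,)");
    }

    // Splitting right after OR right before each comma (look-behind or
    // look-ahead) returns the commas as pieces of their own.
    static String[] keepSeparate(String s) {
        return s.split("(?<=,)|(?=,)");
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", keepAtEnd("a,b,c")));    // a, | b, | c
        System.out.println(String.join(" | ", keepSeparate("a,b,c"))); // a | , | b | , | c
    }
}
```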

So I got to work and wrote a split that keeps the removed text.

[code lan=gral]
import java.util.ArrayList;
import java.util.List;

/**
 * SplittedString class
 *
 * Class used to have an alternative type of split in which the return contains
 * all the pieces, including the parts that match the given regexp.
 *
 * @author DoHITB
 */
public class SplittedString {
    /**
     * Split method.
     *
     * Main method. It does the split of the given string.
     * @param text the text to be split
     * @param raw the regexp to split on
     * @return an array containing all pieces, regexp matches included
     */
    public static String[] split(String text, String raw) {
        // make a "normal" split
        String[] pieces = text.split(raw);

        if (pieces.length == 0) {
            // the line only contains regexp matches: split returned nothing
            return new String[] { text };
        }
        if (pieces.length == 1 && pieces[0].equals(text)) {
            // the line contains text with no regexp match at all
            return new String[] { text };
        }
        // there is at least one match (text + regexp + text, regexp + text, ...)
        List<String> ret = treatPieced(pieces, text);
        return ret.toArray(new String[0]);
    }

    /**
     * TreatPieced method
     *
     * It searches the beginning and end of each "normal" split piece and then
     * retrieves the missing pieces (the regexp matches) from the original text.
     *
     * @param pieces array given by "normal" split
     * @param text the original String that was split
     * @return a List containing all pieces, in their original order
     */
    private static List<String> treatPieced(String[] pieces, String text) {
        // return var
        List<String> ret = new ArrayList<>();
        // position in text right after the last piece found
        int pos = 0;

        for (String piece : pieces) {
            // empty pieces (text starting with a match, or two matches in a row)
            // hold no text; the surrounding matches are recovered as one gap
            if (piece.isEmpty())
                continue;
            // search from the end of the previous piece, never from inside it,
            // so a short piece such as "s" is not found inside earlier text
            int l = text.indexOf(piece, pos);
            // whatever lies between two pieces is what split removed
            if (l > pos)
                ret.add(text.substring(pos, l));
            ret.add(piece);
            pos = l + piece.length();
        }
        // split drops trailing matches, so whatever is left is the last match
        if (pos < text.length())
            ret.add(text.substring(pos));

        return ret;
    }
}
[/code]

With this, we get a String[] containing every fragment of our split, including the fragments that a "normal" split would remove.
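Recovering the removed text by searching for each piece in the original string has one weak spot: if a piece's text also occurs inside a match, the search can stop there. For example, in "a%'b:c'b" the piece "b" also appears inside "%'b:c'". A different technique, not the one used in this post, avoids the search entirely by taking the match boundaries from `java.util.regex.Matcher` (`find()`, `start()`, `end()`). This is a sketch; the name `splitKeepingMatches` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSplit {
    // Hypothetical helper: splits text on regex but keeps every match as its
    // own element, in order. It uses the boundaries reported by Matcher, so it
    // never has to look for the pieces again.
    static String[] splitKeepingMatches(String text, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.start() > last) {
                out.add(text.substring(last, m.start())); // text before the match
            }
            if (m.end() > m.start()) {
                out.add(m.group()); // the match itself, which split would drop
            }
            last = m.end();
        }
        if (last < text.length()) {
            out.add(text.substring(last)); // text after the last match
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "Value %'i:int1' and %'s:name1'.";
        for (String part : splitKeepingMatches(line, "%'\\S+:\\S+'")) {
            System.out.println("\"" + part + "\"");
        }
    }
}
```

For the sample line this yields "Value ", "%'i:int1'", " and ", "%'s:name1'" and ".", and joining the parts gives the original line back. Empty matches are skipped, and an empty input returns an empty array rather than {""}.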

I hope it comes in handy at some point, or that it helps you with a similar idea... or simply serves as inspiration for something!!

See you!
